1. 28 6月, 2013 1 次提交
    • D
      xfs: don't do IO when creating an new inode · cca9f93a
      Dave Chinner 提交于
      When we are allocating a new inode, we read the inode cluster off
      disk to increment the generation number. We are already using a
      random generation number for newly allocated inodes, so if we are not
      using the ikeep mode, we can just generate a new generation number
      when we initialise the newly allocated inode.
      
      This avoids the need for reading the inode buffer during inode
      creation. This will speed up allocation of inodes in cold, partially
      allocated clusters as they will no longer need to be read from disk
      during allocation. It will also reduce the CPU overhead of inode
      allocation by not having the process the buffer read, even on cache
      hits.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      cca9f93a
  2. 06 6月, 2013 1 次提交
  3. 08 5月, 2013 1 次提交
    • D
      xfs: introduce CONFIG_XFS_WARN · 742ae1e3
      Dave Chinner 提交于
      Running a CONFIG_XFS_DEBUG kernel in production environments is not
      the best idea as it introduces significant overhead, can change
      the behaviour of algorithms (such as allocation) to improve test
      coverage, and (most importantly) panic the machine on non-fatal
      errors.
      
      There are many cases where all we want to do is run a
      kernel with more bounds checking enabled, such as is provided by the
      ASSERT() statements throughout the code, but without all the
      potential overhead and drawbacks.
      
      This patch converts all the ASSERT statements to evaluate as
      WARN_ON(1) statements and hence if they fail dump a warning and a
      stack trace to the log. This has minimal overhead and does not
      change any algorithms, and will allow us to find strange "out of
      bounds" problems more easily on production machines.
      
      There are a few places where assert statements contain debug only
      code. These are converted to be debug-or-warn only code so that we
      still get all the assert checks in the code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      742ae1e3
  4. 22 4月, 2013 2 次提交
    • C
      xfs: add version 3 inode format with CRCs · 93848a99
      Christoph Hellwig 提交于
      Add a new inode version with a larger core.  The primary objective is
      to allow for a crc of the inode, and location information (uuid and ino)
      to verify it was written in the right place.  We also extend it by:
      
      	a creation time (for Samba);
      	a changecount (for NFSv4);
      	a flush sequence (in LSN format for recovery);
      	an additional inode flags field; and
      	some additional padding.
      
      These additional fields are not implemented yet, but already laid
      out in the structure.
      
      [dchinner@redhat.com] Added LSN and flags field, some factoring and rework to
      capture all the necessary information in the crc calculation.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      93848a99
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  5. 18 12月, 2012 1 次提交
  6. 16 11月, 2012 6 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: add pre-write metadata buffer verifier callbacks · 612cfbfe
      Dave Chinner 提交于
      These verifiers are essentially the same code as the read verifiers,
      but do not require ioend processing. Hence factor the read verifier
      functions and add a new write verifier wrapper that is used as the
      callback.
      
      This is done as one large patch for all verifiers rather than one
      patch per verifier as the change is largely mechanical. This
      includes hooking up the write verifier via the read verifier
      function.
      
      Hooking up the write verifier for buffers obtained via
      xfs_trans_get_buf() will be done in a separate patch as that touches
      code in many different places rather than just the verifier
      functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      612cfbfe
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner 提交于
      Add an btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3d3e6f64
    • D
      xfs: verify inode buffers as they are read from disk · af133e86
      Dave Chinner 提交于
      Add an inode buffer verify callback function and pass it into the
      buffer read functions. Inodes are special in that the verbose checks
      will be done when reading the inode, but we still need to sanity
      check the buffer when that is first read. Always verify the magic
      numbers in all inodes in the buffer, rather than jus ton debug
      kernels.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      af133e86
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  7. 09 11月, 2012 2 次提交
  8. 08 11月, 2012 1 次提交
  9. 03 11月, 2012 1 次提交
    • C
      xfs: Update inode alloc comments · cd856db6
      Carlos Maiolino 提交于
      I found some out of date comments while studying the inode allocation
      code, so I believe it's worth to have these comments updated.
      
      It basically rewrites the comment regarding to "call_again" variable,
      which is not used anymore, but instead, callers of xfs_ialloc() decides
      if it needs to be called again relying only if ialloc_context is NULL or
      not.
      
      Also did some small changes in another comment that I thought to be
      pertinent to the current behaviour of these functions and some alignment
      on both comments.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      cd856db6
  10. 18 10月, 2012 2 次提交
  11. 30 7月, 2012 2 次提交
  12. 22 7月, 2012 2 次提交
  13. 15 6月, 2012 1 次提交
  14. 15 5月, 2012 8 次提交
    • D
      xfs: make XBF_MAPPED the default behaviour · 611c9946
      Dave Chinner 提交于
      Rather than specifying XBF_MAPPED for almost all buffers, introduce
      XBF_UNMAPPED for the couple of users that use unmapped buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      611c9946
    • D
      xfs: clean up xfs_bit.h includes · ad1e95c5
      Dave Chinner 提交于
      With the removal of xfs_rw.h and other changes over time, xfs_bit.h
      is being included in many files that don't actually need it. Clean
      up the includes as necessary.
      
      Also move the only-used-once xfs_ialloc_find_free() static inline
      function out of a header file that is widely included to reduce
      the number of needless dependencies on xfs_bit.h.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ad1e95c5
    • D
      xfs: move xfs_get_extsz_hint() and kill xfs_rw.h · 2a0ec1d9
      Dave Chinner 提交于
      The only thing left in xfs_rw.h is a function prototype for an inode
      function.  Move that to xfs_inode.h, and kill xfs_rw.h.
      
      Also move the function implementing the prototype from xfs_rw.c to
      xfs_inode.c so we only have one function left in xfs_rw.c
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      2a0ec1d9
    • D
      xfs: kill XBF_LOCK · a8acad70
      Dave Chinner 提交于
      Buffers are always returned locked from the lookup routines. Hence
      we don't need to tell the lookup routines to return locked buffers,
      on to try and lock them. Remove XBF_LOCK from all the callers and
      from internal buffer cache usage.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a8acad70
    • D
      xfs: pass shutdown method into xfs_trans_ail_delete_bulk · 04913fdd
      Dave Chinner 提交于
      xfs_trans_ail_delete_bulk() can be called from different contexts so
      if the item is not in the AIL we need different shutdown for each
      context.  Pass in the shutdown method needed so the correct action
      can be taken.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04913fdd
    • C
      xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig 提交于
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importably we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straight forward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwi_submit or
      xfs_buf_delwi_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      43ff2122
    • C
      xfs: do not write the buffer from xfs_iflush · 4c46819a
      Christoph Hellwig 提交于
      Instead of writing the buffer directly from inside xfs_iflush return it to
      the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_iflush that all non-blocking callers already
      implement and the now unused flags parameter.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4c46819a
    • C
      xfs: remove log item from AIL in xfs_iflush after a shutdown · 32ce90a4
      Christoph Hellwig 提交于
      If a filesystem has been forced shutdown we are never going to write inodes
      to disk, which means the inode items will stay in the AIL until we free
      the inode. Currently that is not a problem, but a pending change requires us
      to empty the AIL before shutting down the filesystem. In that case leaving
      the inode in the AIL is lethal. Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      32ce90a4
  15. 14 3月, 2012 2 次提交
  16. 18 1月, 2012 4 次提交
    • C
      xfs: remove the i_size field in struct xfs_inode · ce7ae151
      Christoph Hellwig 提交于
      There is no fundamental need to keep an in-memory inode size copy in the XFS
      inode.  We already have the on-disk value in the dinode, and the separate
      in-memory copy that we need for regular files only in the XFS inode.
      
      Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
      VFS inode i_size field for regular files.  Switch code that was directly
      accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
      we are limited to regular files direct access of the VFS inode i_size field.
      
      This also allows dropping some fairly complicated code in the write path
      which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
      that is getting updated inside ->write_end.
      
      Note that we do not bother resetting the VFS i_size when truncating a file
      that gets freed to zero as there is no point in doing so because the VFS inode
      is no longer in use at this point.  Just relax the assert in xfs_ifree to
      only check the on-disk size instead.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ce7ae151
    • C
      xfs: replace i_pin_wait with a bit waitqueue · f392e631
      Christoph Hellwig 提交于
      Replace i_pin_wait, which is only used during synchronous inode flushing
      with a bit waitqueue.  This trades off a much smaller inode against
      slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
      bytes in the XFS inode.
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f392e631
    • C
      xfs: replace i_flock with a sleeping bitlock · 474fce06
      Christoph Hellwig 提交于
      We almost never block on i_flock, the exception is synchronous inode
      flushing.  Instead of bloating the inode with a 16/24-byte completion
      that we abuse as a semaphore just implement it as a bitlock that uses
      a bit waitqueue for the rare sleeping path.  This primarily is a
      tradeoff between a much smaller inode and a faster non-blocking
      path vs faster wakeups, and we are much better off with the former.
      
      A small downside is that we will lose lockdep checking for i_flock, but
      given that it's always taken inside the ilock that should be acceptable.
      
      Note that for example the inode writeback locking is implemented in a
      very similar way.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      474fce06
    • C
      xfs: remove the if_ext_max field in struct xfs_ifork · 8096b1eb
      Christoph Hellwig 提交于
      We spent a lot of effort to maintain this field, but it always equals to the
      fork size divided by the constant size of an extent.  The prime use of it is
      to assert that the two stay in sync.  Just divide the fork size by the extent
      size in the few places that we actually use it and remove the overhead
      of maintaining it.  Also introduce a few helpers to consolidate the places
      where we actually care about the value.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      8096b1eb
  17. 14 1月, 2012 1 次提交
    • C
      xfs: remove xfs_itruncate_data · 673e8e59
      Christoph Hellwig 提交于
      This wrapper isn't overly useful, not to say rather confusing.
      
      Around the call to xfs_itruncate_extents it does:
      
       - add tracing
       - add a few asserts in debug builds
       - conditionally update the inode size in two places
       - log the inode
      
      Both the tracing and the inode logging can be moved to xfs_itruncate_extents
      as they are useful for the attribute fork as well - in fact the attr code
      already does an equivalent xfs_trans_log_inode call just after calling
      xfs_itruncate_extents.  The conditional size updates are a mess, and there
      was no reason to do them in two places anyway, as the first one was
      conditional on the inode having extents - but without extents we
      xfs_itruncate_extents would be a no-op and the placement wouldn't matter
      anyway.  Instead move the size assignments and the asserts that make sense
      to the callers that want it.
      
      As a side effect of this clean up xfs_setattr_size by introducing variables
      for the old and new inode size, and moving the size updates into a common
      place.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      673e8e59
  18. 04 1月, 2012 1 次提交
  19. 30 11月, 2011 1 次提交