1. 17 12月, 2013 2 次提交
    • D
      xfs: swalloc doesn't align allocations properly · 33177f05
      Dave Chinner 提交于
      When swalloc is specified as a mount option, allocations are
      supposed to be aligned to the stripe width rather than the stripe
      unit of the underlying filesystem. However, it does not do this.
      
      What the implementation does is round up the allocation size to a
      stripe width, hence ensuring that all allocations span a full stripe
      width. It does not, however, ensure that that allocation is aligned
      to a stripe width, and hence the allocations can span multiple
      underlying stripes and so still see RMW cycles for things like
      direct IO on MD RAID.
      
      So, if the swalloc mount option is set, change the allocation
      alignment in xfs_bmap_btalloc() to use the stripe width rather than
      the stripe unit.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      33177f05
    • D
      xfs: align initial file allocations correctly · 6e708bcf
      Dave Chinner 提交于
      The function xfs_bmap_isaeof() is used to indicate that an
      allocation is occurring at or past the end of file, and as such
      should be aligned to the underlying storage geometry if possible.
      
      Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
      behaviour of this function for empty files - it turned off
      allocation alignment for this case accidentally. Hence large initial
      allocations from direct IO are not getting correctly aligned to the
      underlying geometry, and that is cause write performance to drop in
      alignment sensitive configurations.
      
      Fix it by considering allocation into empty files as requiring
      aligned allocation again.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit f9b395a8)
      6e708bcf
  2. 12 12月, 2013 1 次提交
    • D
      xfs: align initial file allocations correctly · f9b395a8
      Dave Chinner 提交于
      The function xfs_bmap_isaeof() is used to indicate that an
      allocation is occurring at or past the end of file, and as such
      should be aligned to the underlying storage geometry if possible.
      
      Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
      behaviour of this function for empty files - it turned off
      allocation alignment for this case accidentally. Hence large initial
      allocations from direct IO are not getting correctly aligned to the
      underlying geometry, and that is cause write performance to drop in
      alignment sensitive configurations.
      
      Fix it by considering allocation into empty files as requiring
      aligned allocation again.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f9b395a8
  3. 18 11月, 2013 1 次提交
  4. 24 10月, 2013 4 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  5. 18 10月, 2013 1 次提交
  6. 10 9月, 2013 1 次提交
  7. 21 8月, 2013 2 次提交
  8. 13 8月, 2013 6 次提交
  9. 10 7月, 2013 1 次提交
    • D
      xfs: remove local fork format handling from xfs_bmapi_write() · f3508bcd
      Dave Chinner 提交于
      The conversion from local format to extent format requires
      interpretation of the data in the fork being converted, so it cannot
      be done in a generic way. It is up to the caller to convert the fork
      format to extent format before calling into xfs_bmapi_write() so
      format conversion can be done correctly.
      
      The code in xfs_bmapi_write() to convert the format is used
      implicitly by the attribute and directory code, but they
      specifically zero the fork size so that the conversion does not do
      any allocation or manipulation. Move this conversion into the
      shortform to leaf functions for the dir/attr code so the conversions
      are explicitly controlled by all callers.
      
      Now we can remove the conversion code in xfs_bmapi_write.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f3508bcd
  10. 28 4月, 2013 2 次提交
  11. 22 4月, 2013 2 次提交
    • D
      xfs: split out symlink code into it's own file. · 19de7351
      Dave Chinner 提交于
      The symlink code is about to get more complicated when CRCs are
      added for remote symlink blocks. The symlink management code is
      mostly self contained, so move it to it's own files so that all the
      new code and the existing symlink code will not be intermingled
      with other unrelated code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      19de7351
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  12. 15 3月, 2013 1 次提交
  13. 08 3月, 2013 1 次提交
    • D
      xfs: rearrange some code in xfs_bmap for better locality · 9e5987a7
      Dave Chinner 提交于
      xfs_bmap.c is a big file, and some of the related code is spread all
      throughout the file requiring function prototypes for static
      function and jumping all through the file to follow a single call
      path. Rearrange the code so that:
      
      	a) related functionality is grouped together; and
      	b) functions are grouped in call dependency order
      
      While the diffstat is large, there are no code changes in the patch;
      it is just moving the functionality around and removing the function
      prototypes at the top of the file. The resulting layout of the code
      is as follows (top of file to bottom):
      
      	- miscellaneous helper functions
      	- extent tree block counting routines
      	- debug/sanity checking code
      	- bmap free list manipulation functions
      	- inode fork format manipulation functions
      	- internal/external extent tree seach functions
      	- extent tree manipulation functions used during allocation
      	- functions used during extent read/allocate/removal
      	  operations (i.e. xfs_bmapi_write, xfs_bmapi_read,
      	  xfs_bunmapi and xfs_getbmap)
      
      This means that following logic paths through the bmapi code is much
      simpler - most of the code relevant to a specific operation is now
      clustered together rather than spread all over the file....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9e5987a7
  14. 15 2月, 2013 1 次提交
    • D
      xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b
      Dave Chinner 提交于
      When we are converting local data to an extent format as a result of
      adding an attribute, the type of data contained in the local fork
      determines the behaviour that needs to occur.
      
      xfs_bmap_add_attrfork_local() already handles the directory data
      case specially by using S_ISDIR() and calling out to
      xfs_dir2_sf_to_block(), but with verifiers we now need to handle
      each different type of metadata specially and different metadata
      formats require different verifiers (and eventually block header
      initialisation).
      
      There is only a single place that we add and attribute fork to
      the inode, but that is in the attribute code and it knows nothing
      about the specific contents of the data fork. It is only the case of
      local data that is the issue here, so adding code to hadnle this
      case in the attribute specific code is wrong. Hence we are really
      stuck trying to detect the data fork contents in
      xfs_bmap_add_attrfork_local() and performing the correct callout
      there.
      
      Luckily the current cases can be determined by S_IS* macros, and we
      can push the work off to data specific callouts, but each of those
      callouts does a lot of work in common with
      xfs_bmap_local_to_extents(). The only reason that this fails for
      symlinks right now is is that xfs_bmap_local_to_extents() assumes
      the data fork contains extent data, and so attaches a a bmap extent
      data verifier to the buffer and simply copies the data fork
      information straight into it.
      
      To fix this, allow us to pass a "formatting" callback into
      xfs_bmap_local_to_extents() which is responsible for setting the
      buffer type, initialising it and copying the data fork contents over
      to the new buffer. This allows callers to specify how they want to
      format the new buffer (which is necessary for the upcoming CRC
      enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
      useful for any type of data fork content.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com> 
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1e82379b
  15. 29 1月, 2013 1 次提交
  16. 18 1月, 2013 1 次提交
  17. 04 1月, 2013 1 次提交
  18. 16 11月, 2012 3 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner 提交于
      Add an btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3d3e6f64
  19. 15 11月, 2012 1 次提交
    • D
      xfs: remove xfs_flush_pages · 4bc1ea6b
      Dave Chinner 提交于
      It is a complex wrapper around VFS functions, but there are VFS
      functions that provide exactly the same functionality. Call the VFS
      functions directly and remove the unnecessary indirection and
      complexity.
      
      We don't need to care about clearing the XFS_ITRUNCATED flag, as
      that is done during .writepages. Hence is cleared by the VFS
      writeback path if there is anything to write back during the flush.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAndrew Dahl <adahl@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bc1ea6b
  20. 09 11月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · 1f3c785c
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1f3c785c
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 326c0355
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      326c0355
    • M
      xfs: zero allocation_args on the kernel stack · 408cc4e9
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      408cc4e9
  21. 19 10月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · e04426b9
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e04426b9
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 2455881c
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      2455881c
    • M
      xfs: zero allocation_args on the kernel stack · a0041684
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a0041684
  22. 15 6月, 2012 1 次提交