1. 15 3月, 2013 1 次提交
  2. 08 3月, 2013 1 次提交
    • D
      xfs: rearrange some code in xfs_bmap for better locality · 9e5987a7
      Dave Chinner 提交于
      xfs_bmap.c is a big file, and some of the related code is spread all
      throughout the file requiring function prototypes for static
      function and jumping all through the file to follow a single call
      path. Rearrange the code so that:
      
      	a) related functionality is grouped together; and
      	b) functions are grouped in call dependency order
      
      While the diffstat is large, there are no code changes in the patch;
      it is just moving the functionality around and removing the function
      prototypes at the top of the file. The resulting layout of the code
      is as follows (top of file to bottom):
      
      	- miscellaneous helper functions
      	- extent tree block counting routines
      	- debug/sanity checking code
      	- bmap free list manipulation functions
      	- inode fork format manipulation functions
      	- internal/external extent tree seach functions
      	- extent tree manipulation functions used during allocation
      	- functions used during extent read/allocate/removal
      	  operations (i.e. xfs_bmapi_write, xfs_bmapi_read,
      	  xfs_bunmapi and xfs_getbmap)
      
      This means that following logic paths through the bmapi code is much
      simpler - most of the code relevant to a specific operation is now
      clustered together rather than spread all over the file....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9e5987a7
  3. 15 2月, 2013 1 次提交
    • D
      xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b
      Dave Chinner 提交于
      When we are converting local data to an extent format as a result of
      adding an attribute, the type of data contained in the local fork
      determines the behaviour that needs to occur.
      
      xfs_bmap_add_attrfork_local() already handles the directory data
      case specially by using S_ISDIR() and calling out to
      xfs_dir2_sf_to_block(), but with verifiers we now need to handle
      each different type of metadata specially and different metadata
      formats require different verifiers (and eventually block header
      initialisation).
      
      There is only a single place that we add and attribute fork to
      the inode, but that is in the attribute code and it knows nothing
      about the specific contents of the data fork. It is only the case of
      local data that is the issue here, so adding code to hadnle this
      case in the attribute specific code is wrong. Hence we are really
      stuck trying to detect the data fork contents in
      xfs_bmap_add_attrfork_local() and performing the correct callout
      there.
      
      Luckily the current cases can be determined by S_IS* macros, and we
      can push the work off to data specific callouts, but each of those
      callouts does a lot of work in common with
      xfs_bmap_local_to_extents(). The only reason that this fails for
      symlinks right now is is that xfs_bmap_local_to_extents() assumes
      the data fork contains extent data, and so attaches a a bmap extent
      data verifier to the buffer and simply copies the data fork
      information straight into it.
      
      To fix this, allow us to pass a "formatting" callback into
      xfs_bmap_local_to_extents() which is responsible for setting the
      buffer type, initialising it and copying the data fork contents over
      to the new buffer. This allows callers to specify how they want to
      format the new buffer (which is necessary for the upcoming CRC
      enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
      useful for any type of data fork content.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com> 
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1e82379b
  4. 29 1月, 2013 1 次提交
  5. 18 1月, 2013 1 次提交
  6. 04 1月, 2013 1 次提交
  7. 16 11月, 2012 3 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner 提交于
      Add an btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3d3e6f64
  8. 15 11月, 2012 1 次提交
    • D
      xfs: remove xfs_flush_pages · 4bc1ea6b
      Dave Chinner 提交于
      It is a complex wrapper around VFS functions, but there are VFS
      functions that provide exactly the same functionality. Call the VFS
      functions directly and remove the unnecessary indirection and
      complexity.
      
      We don't need to care about clearing the XFS_ITRUNCATED flag, as
      that is done during .writepages. Hence is cleared by the VFS
      writeback path if there is anything to write back during the flush.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAndrew Dahl <adahl@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bc1ea6b
  9. 09 11月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · 1f3c785c
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1f3c785c
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 326c0355
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      326c0355
    • M
      xfs: zero allocation_args on the kernel stack · 408cc4e9
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      408cc4e9
  10. 19 10月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · e04426b9
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e04426b9
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 2455881c
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      2455881c
    • M
      xfs: zero allocation_args on the kernel stack · a0041684
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a0041684
  11. 15 6月, 2012 1 次提交
  12. 21 5月, 2012 1 次提交
    • D
      xfs: fix delalloc quota accounting on failure · ea562ed6
      Dave Chinner 提交于
      xfstest 270 was causing quota reservations way beyond what was sane
      (ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
      in the error handling path of xfs_bmapi_reserve_delalloc() because
      xfs_trans_unreserve_quota_nblks() simple negates the value passed -
      which doesn't work for an unsigned variable. This causes
      reservations of close to 2^32 block instead of removing a
      reservation of a handful of blocks.
      
      Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
      callers where unsigned integer variables are used, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ea562ed6
  13. 15 5月, 2012 3 次提交
  14. 23 3月, 2012 1 次提交
    • K
      xfs: fix deadlock in xfs_rtfree_extent · 5575acc7
      Kamal Dasu 提交于
      To fix the deadlock caused by repeatedly calling xfs_rtfree_extent
      
       - removed xfs_ilock() and xfs_trans_ijoin() from xfs_rtfree_extent(),
         instead added asserts that the inode is locked and has an inode_item
         attached to it.
       - in xfs_bunmapi() when dealing with an inode with the rt flag
         call xfs_ilock() and xfs_trans_ijoin() so that the
         reference count is bumped on the inode and attached it to the
         transaction before calling into xfs_bmap_del_extent, similar to
         what we do in xfs_bmap_rtalloc.
      Signed-off-by: NKamal Dasu <kdasu.kdev@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5575acc7
  15. 16 3月, 2012 1 次提交
  16. 18 1月, 2012 2 次提交
    • C
      xfs: remove the i_size field in struct xfs_inode · ce7ae151
      Christoph Hellwig 提交于
      There is no fundamental need to keep an in-memory inode size copy in the XFS
      inode.  We already have the on-disk value in the dinode, and the separate
      in-memory copy that we need for regular files only in the XFS inode.
      
      Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
      VFS inode i_size field for regular files.  Switch code that was directly
      accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
      we are limited to regular files direct access of the VFS inode i_size field.
      
      This also allows dropping some fairly complicated code in the write path
      which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
      that is getting updated inside ->write_end.
      
      Note that we do not bother resetting the VFS i_size when truncating a file
      that gets freed to zero as there is no point in doing so because the VFS inode
      is no longer in use at this point.  Just relax the assert in xfs_ifree to
      only check the on-disk size instead.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ce7ae151
    • C
      xfs: remove the if_ext_max field in struct xfs_ifork · 8096b1eb
      Christoph Hellwig 提交于
      We spent a lot of effort to maintain this field, but it always equals to the
      fork size divided by the constant size of an extent.  The prime use of it is
      to assert that the two stay in sync.  Just divide the fork size by the extent
      size in the few places that we actually use it and remove the overhead
      of maintaining it.  Also introduce a few helpers to consolidate the places
      where we actually care about the value.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      8096b1eb
  17. 03 12月, 2011 1 次提交
    • D
      xfs: fix allocation length overflow in xfs_bmapi_write() · a99ebf43
      Dave Chinner 提交于
      When testing the new xfstests --large-fs option that does very large
      file preallocations, this assert was tripped deep in
      xfs_alloc_vextent():
      
      XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239
      
      The allocation was trying to allocate a zero length extent because
      the lower 32 bits of the allocation length was zero. The remaining
      length of the allocation to be done was an exact multiple of 2^32 -
      the first case I saw was at 496TB remaining to be allocated.
      
      This turns out to be an overflow when converting the allocation
      length (a 64 bit quantity) into the extent length to allocate (a 32
      bit quantity), and it requires the length to be allocated an exact
      multiple of 2^32 blocks to trip the assert.
      
      Fix it by limiting the extent lenth to allocate to MAXEXTLEN.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a99ebf43
  18. 12 10月, 2011 14 次提交