1. 07 11月, 2013 1 次提交
  2. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  3. 13 8月, 2013 1 次提交
  4. 21 5月, 2013 1 次提交
    • J
      xfs: Avoid pathological backwards allocation · 211d022c
      Jan Kara 提交于
      Writing a large file using direct IO in 16 MB chunks sometimes results
      in a pathological allocation pattern where 16 MB chunks of large free
      extent are allocated to a file in a reversed order. So extents of a file
      look for example as:
      
       ext logical physical expected length flags
         0        0        13          4550656
         1  4550656 188136807   4550668 12562432
         2 17113088 200699240 200699238 622592
         3 17735680 182046055 201321831   4096
         4 17739776 182041959 182050150   4096
         5 17743872 182037863 182046054   4096
         6 17747968 182033767 182041958   4096
         7 17752064 182029671 182037862   4096
      ...
      6757 45400064 154381644 154389835   4096
      6758 45404160 154377548 154385739   4096
      6759 45408256 252951571 154381643  73728 eof
      
      This happens because XFS_ALLOCTYPE_THIS_BNO allocation fails (the last
      extent in the file cannot be further extended) so we fall back to
      XFS_ALLOCTYPE_NEAR_BNO allocation which picks end of a large free
      extent as the best place to continue the file. Since the chunk at the
      end of the free extent again cannot be further extended, this behavior
      repeats until the whole free extent is consumed in a reversed order.
      
      For data allocations this backward allocation isn't beneficial so make
      xfs_alloc_compute_diff() pick start of a free extent instead of its end
      for them. That avoids the backward allocation pattern.
      
      See thread at http://oss.sgi.com/archives/xfs/2013-03/msg00144.html for
      more details about the reproduction case and why this solution was
      chosen.
      
      Based on idea by Dave Chinner <dchinner@redhat.com>.
      
      CC: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      211d022c
  5. 28 4月, 2013 1 次提交
    • D
      xfs: buffer type overruns blf_flags field · 61fe135c
      Dave Chinner 提交于
      The buffer type passed to log recvoery in the buffer log item
      overruns the blf_flags field. I had assumed that flags field was a
      32 bit value, and it turns out it is a unisgned short. Therefore
      having 19 flags doesn't really work.
      
      Convert the buffer type field to numeric value, and use the top 5
      bits of the flags field for it. We currently have 17 types of
      buffers, so using 5 bits gives us plenty of room for expansion in
      future....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      61fe135c
  6. 22 4月, 2013 2 次提交
    • C
      xfs: add CRC checks to the AGFL · 77c95bba
      Christoph Hellwig 提交于
      Add CRC checks, location information and a magic number to the AGFL.
      Previously the AGFL was just a block containing nothing but the
      free block pointers.  The new AGFL has a real header with the usual
      boilerplate instead, so that we can verify it's not corrupted and
      written into the right place.
      
      [dchinner@redhat.com] Added LSN field, reworked significantly to fit
      into new verifier structure and growfs structure, enabled full
      verifier functionality now there is a header to verify and we can
      guarantee an initialised AGFL.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      77c95bba
    • D
      xfs: add CRC checks to the AGF · 4e0e6040
      Dave Chinner 提交于
      The AGF already has some self identifying fields (e.g. the sequence
      number) so we only need to add the uuid to it to identify the
      filesystem it belongs to. The location is fixed based on the
      sequence number, so there's no need to add a block number, either.
      
      Hence the only additional fields are the CRC and LSN fields. These
      are unlogged, so place some space between the end of the logged
      fields and them so that future expansion of the AGF for logged
      fields can be placed adjacent to the existing logged fields and
      hence not complicate the field-derived range based logging we
      currently have.
      
      Based originally on a patch from myself, modified further by
      Christoph Hellwig and then modified again to fit into the
      verifier structure with additional fields by myself. The multiple
      signed-off-by tags indicate the age and history of this patch.
      Signed-off-by: NDave Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4e0e6040
  7. 08 3月, 2013 1 次提交
  8. 04 1月, 2013 1 次提交
  9. 16 11月, 2012 6 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: add pre-write metadata buffer verifier callbacks · 612cfbfe
      Dave Chinner 提交于
      These verifiers are essentially the same code as the read verifiers,
      but do not require ioend processing. Hence factor the read verifier
      functions and add a new write verifier wrapper that is used as the
      callback.
      
      This is done as one large patch for all verifiers rather than one
      patch per verifier as the change is largely mechanical. This
      includes hooking up the write verifier via the read verifier
      function.
      
      Hooking up the write verifier for buffers obtained via
      xfs_trans_get_buf() will be done in a separate patch as that touches
      code in many different places rather than just the verifier
      functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      612cfbfe
    • D
      xfs: verify AGFL blocks as they are read from disk · bb80c6d7
      Dave Chinner 提交于
      Add an AGFL block verify callback function and pass it into the
      buffer read functions.
      
      While this commit adds verification code to the AGFL, it cannot be
      used reliably until the CRC format change comes along as mkfs does
      not initialise the full AGFL. Hence it can be full of garbage at the
      first mount and will fail verification right now. CRC enabled
      filesystems won't have this problem, so leave the code that has
      already been written ifdef'd out until the proper time.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      bb80c6d7
    • D
      xfs: verify AGF blocks as they are read from disk · 5d5f527d
      Dave Chinner 提交于
      Add an AGF block verify callback function and pass it into the
      buffer read functions. This replaces the existing verification that
      is done after the read completes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5d5f527d
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  10. 09 11月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · 1f3c785c
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1f3c785c
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 326c0355
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      326c0355
    • M
      xfs: zero allocation_args on the kernel stack · 408cc4e9
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      408cc4e9
  11. 19 10月, 2012 3 次提交
    • D
      xfs: move allocation stack switch up to xfs_bmapi_allocate · e04426b9
      Dave Chinner 提交于
      Switching stacks are xfs_alloc_vextent can cause deadlocks when we
      run out of worker threads on the allocation workqueue. This can
      occur because xfs_bmap_btalloc can make multiple calls to
      xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
      return with the AGF locked in the current allocation transaction.
      
      If we then need to make another allocation, and all the allocation
      worker contexts are exhausted because the are blocked waiting for
      the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
      work completed to release the AGF.  Hence allocation effectively
      deadlocks.
      
      To avoid this, move the stack switch one layer up to
      xfs_bmapi_allocate() so that all of the allocation attempts in a
      single switched stack transaction occur in a single worker context.
      This avoids the problem of an allocation being blocked waiting for
      a worker thread whilst holding the AGF.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e04426b9
    • D
      xfs: introduce XFS_BMAPI_STACK_SWITCH · 2455881c
      Dave Chinner 提交于
      Certain allocation paths through xfs_bmapi_write() are in situations
      where we have limited stack available. These are almost always in
      the buffered IO writeback path when convertion delayed allocation
      extents to real extents.
      
      The current stack switch occurs for userdata allocations, which
      means we also do stack switches for preallocation, direct IO and
      unwritten extent conversion, even those these call chains have never
      been implicated in a stack overrun.
      
      Hence, let's target just the single stack overun offended for stack
      switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
      the caller can pass xfs_bmapi_write() to indicate it should switch
      stacks if it needs to do allocation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      2455881c
    • M
      xfs: zero allocation_args on the kernel stack · a0041684
      Mark Tinguely 提交于
      Zero the kernel stack space that makes up the xfs_alloc_arg structures.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a0041684
  12. 14 7月, 2012 4 次提交
  13. 22 6月, 2012 2 次提交
  14. 21 6月, 2012 1 次提交
    • J
      xfs: fix debug_object WARN at xfs_alloc_vextent() · 3b876c8f
      Jeff Liu 提交于
      Fengguang reports:
      
      [  780.529603] XFS (vdd): Ending clean mount
      [  781.454590] ODEBUG: object is on stack, but not annotated
      [  781.455433] ------------[ cut here ]------------
      [  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
      [  781.455433] Hardware name: Bochs
      [  781.455433] Modules linked in:
      [  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
      [  781.455433] Call Trace:
      [  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
      [  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
      [  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
      [  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
      [  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
      [  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5
      
      Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.
      Reported-by: NWu Fengguang <wfg@linux.intel.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3b876c8f
  15. 15 6月, 2012 1 次提交
    • J
      xfs: fix debug_object WARN at xfs_alloc_vextent() · 0f2cf9d3
      Jeff Liu 提交于
      Fengguang reports:
      
      [  780.529603] XFS (vdd): Ending clean mount
      [  781.454590] ODEBUG: object is on stack, but not annotated
      [  781.455433] ------------[ cut here ]------------
      [  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
      [  781.455433] Hardware name: Bochs
      [  781.455433] Modules linked in:
      [  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
      [  781.455433] Call Trace:
      [  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
      [  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
      [  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
      [  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
      [  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
      [  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5
      
      Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.
      Reported-by: NWu Fengguang <wfg@linux.intel.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0f2cf9d3
  16. 15 5月, 2012 4 次提交
  17. 28 3月, 2012 1 次提交
    • D
      xfs: fix fstrim offset calculations · a66d6363
      Dave Chinner 提交于
      xfs_ioc_fstrim() doesn't treat the incoming offset and length
      correctly. It treats them as a filesystem block address, rather than
      a disk address. This is wrong because the range passed in is a
      linear representation, while the filesystem block address notation
      is a sparse representation. Hence we cannot convert the range direct
      to filesystem block units and then use that for calculating the
      range to trim.
      
      While this sounds dangerous, the problem is limited to calculating
      what AGs need to be trimmed. The code that calcuates the actual
      ranges to trim gets the right result (i.e. only ever discards free
      space), even though it uses the wrong ranges to limit what is
      trimmed. Hence this is not a bug that endangers user data.
      
      Fix this by treating the range as a disk address range and use the
      appropriate functions to convert the range into the desired formats
      for calculations.
      
      Further, fix the first free extent lookup (the longest) to actually
      find the largest free extent. Currently this lookup uses a <=
      lookup, which results in finding the extent to the left of the
      largest because we can never get an exact match on the largest
      extent. This is due to the fact that while we know it's size, we
      don't know it's location and so the exact match fails and we move
      one record to the left to get the next largest extent. Instead, use
      a >= search so that the lookup returns the largest extent regardless
      of the fact we don't get an exact match on it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a66d6363
  18. 23 3月, 2012 1 次提交
    • D
      xfs: introduce an allocation workqueue · c999a223
      Dave Chinner 提交于
      We currently have significant issues with the amount of stack that
      allocation in XFS uses, especially in the writeback path. We can
      easily consume 4k of stack between mapping the page, manipulating
      the bmap btree and allocating blocks from the free list. Not to
      mention btree block readahead and other functionality that issues IO
      in the allocation path.
      
      As a result, we can no longer fit allocation in the writeback path
      in the stack space provided on x86_64. To alleviate this problem,
      introduce an allocation workqueue and move all allocations to a
      seperate context. This can be easily added as an interposing layer
      into xfs_alloc_vextent(), which takes a single argument structure
      and does not return until the allocation is complete or has failed.
      
      To do this, add a work structure and a completion to the allocation
      args structure. This allows xfs_alloc_vextent to queue the args onto
      the workqueue and wait for it to be completed by the worker. This
      can be done completely transparently to the caller.
      
      The worker function needs to ensure that it sets and clears the
      PF_TRANS flag appropriately as it is being run in an active
      transaction context. Work can also be queued in a memory reclaim
      context, so a rescuer is needed for the workqueue.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c999a223
  19. 12 10月, 2011 1 次提交
  20. 26 7月, 2011 1 次提交
  21. 09 7月, 2011 1 次提交