1. 16 11月, 2012 3 次提交
  2. 08 11月, 2012 1 次提交
    • D
      xfs: invalidate allocbt blocks moved to the free list · 4c05f9ad
      Dave Chinner 提交于
      When we free a block from the alloc btree tree, we move it to the
      freelist held in the AGFL and mark it busy in the busy extent tree.
      This typically happens when we merge btree blocks.
      
      Once the transaction is committed and checkpointed, the block can
      remain on the free list for an indefinite amount of time.  Now, this
      isn't the end of the world at this point - if the free list is
      shortened, the buffer is invalidated in the transaction that moves
      it back to free space. If the buffer is allocated as metadata from
      the free list, then all the modifications getted logged, and we have
      no issues, either. And if it gets allocated as userdata direct from
      the freelist, it gets invalidated and so will never get written.
      
      However, during the time it sits on the free list, pressure on the
      log can cause the AIL to be pushed and the buffer that covers the
      block gets pushed for write. IOWs, we end up writing a freed
      metadata block to disk. Again, this isn't the end of the world
      because we know from the above we are only writing to free space.
      
      The problem, however, is for validation callbacks. If the block was
      on old btree root block, then the level of the block is going to be
      higher than the current tree root, and so will fail validation.
      There may be other inconsistencies in the block as well, and
      currently we don't care because the block is in free space. Shutting
      down the filesystem because a freed block doesn't pass write
      validation, OTOH, is rather unfriendly.
      
      So, make sure we always invalidate buffers as they move from the
      free space trees to the free list so that we guarantee they never
      get written to disk while on the free list.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4c05f9ad
  3. 15 5月, 2012 4 次提交
  4. 13 7月, 2011 1 次提交
  5. 08 7月, 2011 1 次提交
  6. 25 5月, 2011 1 次提交
    • C
      xfs: do not discard alloc btree blocks · 55a7bc5a
      Christoph Hellwig 提交于
      Blocks for the allocation btree are allocated from and released to
      the AGFL, and thus frequently reused.  Even worse we do not have an
      easy way to avoid using an AGFL block when it is discarded due to
      the simple FILO list of free blocks, and thus can frequently stall
      on blocks that are currently undergoing a discard.
      
      Add a flag to the busy extent tracking structure to skip the discard
      for allocation btree blocks.  In normal operation these blocks are
      reused frequently enough that there is no need to discard them
      anyway, but if they spill over to the allocation btree as part of a
      balance we "leak" blocks that we would otherwise discard.  We could
      fix this by adding another flag and keeping these block in the
      rbtree even after they aren't busy any more so that we could discard
      them when they migrate out of the AGFL.  Given that this would cause
      significant overhead I don't think it's worthwile for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      55a7bc5a
  7. 29 4月, 2011 2 次提交
    • C
      xfs: exact busy extent tracking · 97d3ac75
      Christoph Hellwig 提交于
      Update the extent tree in case we have to reuse a busy extent, so that it
      always is kept uptodate.  This is done by replacing the busy list searches
      with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
      in case of a reuse.  This allows us to allow reusing metadata extents
      unconditionally, and thus avoid log forces especially for allocation btree
      blocks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      97d3ac75
    • C
      xfs: optimize AGFL refills · a870acd9
      Christoph Hellwig 提交于
      While we need to make sure we do not reuse busy extents, there is no need
      to force out busy extents when moving them between the AGFL and the
      freespace btree as we still take care of that when doing the real allocation.
      
      To avoid the log force when just moving extents from the different free
      space tracking structures, move the busy search out of
      xfs_alloc_get_freelist into the callers that need it, and move the busy
      list insert from xfs_free_ag_extent which is used both by AGFL refills
      and real allocation to xfs_free_extent, which is only used by the latter.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      a870acd9
  8. 19 10月, 2010 1 次提交
    • C
      xfs: remove the ->kill_root btree operation · c0e59e1a
      Christoph Hellwig 提交于
      The implementation os ->kill_root only differ by either simply
      zeroing out the now unused buffer in the btree cursor in the inode
      allocation btree or using xfs_btree_setbuf in the allocation btree.
      
      Initially both of them used xfs_btree_setbuf, but the use in the
      ialloc btree was removed early on because it interacted badly with
      xfs_trans_binval.
      
      In addition to zeroing out the buffer in the cursor xfs_btree_setbuf
      updates the bc_ra array in the btree cursor, and calls
      xfs_trans_brelse on the buffer previous occupying the slot.
      
      The bc_ra update should be done for the alloc btree updated too,
      although the lack of it does not cause serious problems.  The
      xfs_trans_brelse call on the other hand is effectively a no-op in
      the end - it keeps decrementing the bli_recur refcount until it hits
      zero, and then just skips out because the buffer will always be
      dirty at this point.  So removing it for the allocation btree is
      just fine.
      
      So unify the code and move it to xfs_btree.c.  While we're at it
      also replace the call to xfs_btree_setbuf with a NULL bp argument in
      xfs_btree_del_cursor with a direct call to xfs_trans_brelse given
      that the cursor is beeing freed just after this and the state
      updates are superflous.  After this xfs_btree_setbuf is only used
      with a non-NULL bp argument and can thus be simplified.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c0e59e1a
  9. 27 7月, 2010 2 次提交
  10. 24 5月, 2010 1 次提交
    • D
      xfs: Improve scalability of busy extent tracking · ed3b4d6c
      Dave Chinner 提交于
      When we free a metadata extent, we record it in the per-AG busy
      extent array so that it is not re-used before the freeing
      transaction hits the disk. This array is fixed size, so when it
      overflows we make further allocation transactions synchronous
      because we cannot track more freed extents until those transactions
      hit the disk and are completed. Under heavy mixed allocation and
      freeing workloads with large log buffers, we can overflow this array
      quite easily.
      
      Further, the array is sparsely populated, which means that inserts
      need to search for a free slot, and array searches often have to
      search many more slots that are actually used to check all the
      busy extents. Quite inefficient, really.
      
      To enable this aspect of extent freeing to scale better, we need
      a structure that can grow dynamically. While in other areas of
      XFS we have used radix trees, the extents being freed are at random
      locations on disk so are better suited to being indexed by an rbtree.
      
      So, use a per-AG rbtree indexed by block number to track busy
      extents.  This incures a memory allocation when marking an extent
      busy, but should not occur too often in low memory situations. This
      should scale to an arbitrary number of extents so should not be a
      limitation for features such as in-memory aggregation of
      transactions.
      
      However, there are still situations where we can't avoid allocating
      busy extents (such as allocation from the AGFL). To minimise the
      overhead of such occurences, we need to avoid doing a synchronous
      log force while holding the AGF locked to ensure that the previous
      transactions are safely on disk before we use the extent. We can do
      this by marking the transaction doing the allocation as synchronous
      rather issuing a log force.
      
      Because of the locking involved and the ordering of transactions,
      the synchronous transaction provides the same guarantees as a
      synchronous log force because it ensures that all the prior
      transactions are already on disk when the synchronous transaction
      hits the disk. i.e. it preserves the free->allocate order of the
      extent correctly in recovery.
      
      By doing this, we avoid holding the AGF locked while log writes are
      in progress, hence reducing the length of time the lock is held and
      therefore we increase the rate at which we can allocate and free
      from the allocation group, thereby increasing overall throughput.
      
      The only problem with this approach is that when a metadata buffer is
      marked stale (e.g. a directory block is removed), then buffer remains
      pinned and locked until the log goes to disk. The issue here is that
      if that stale buffer is reallocated in a subsequent transaction, the
      attempt to lock that buffer in the transaction will hang waiting
      the log to go to disk to unlock and unpin the buffer. Hence if
      someone tries to lock a pinned, stale, locked buffer we need to
      push on the log to get it unlocked ASAP. Effectively we are trading
      off a guaranteed log force for a much less common trigger for log
      force to occur.
      
      Ideally we should not reallocate busy extents. That is a much more
      complex fix to the problem as it involves direct intervention in the
      allocation btree searches in many places. This is left to a future
      set of modifications.
      
      Finally, now that we track busy extents in allocated memory, we
      don't need the descriptors in the transaction structure to point to
      them. We can replace the complex busy chunk infrastructure with a
      simple linked list of busy extents. This allows us to remove a large
      chunk of code, making the overall change a net reduction in code
      size.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ed3b4d6c
  11. 16 1月, 2010 1 次提交
  12. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  13. 19 1月, 2009 1 次提交
  14. 16 1月, 2009 1 次提交
  15. 30 10月, 2008 19 次提交