1. 07 12月, 2011 1 次提交
    • C
      xfs: fix the logspace waiting algorithm · 9f9c19ec
      Christoph Hellwig 提交于
      Apply the scheme used in log_regrant_write_log_space to wake up any other
      threads waiting for log space before the newly added one to
      log_regrant_write_log_space as well, and factor the code into readable
      helpers.  For each of the queues we have add two helpers:
      
       - one to try to wake up all waiting threads.  This helper will also be
         usable by xfs_log_move_tail once we remove the current opportunistic
         wakeups in it.
       - one to sleep on t_wait until enough log space is available, loosely
         modelled after Linux waitqueues.
       
      And use them to reimplement the guts of log_regrant_write_log_space and
      log_regrant_write_log_space.  These two function now use one and the same
      algorithm for waiting on log space instead of subtly different ones before,
      with an option to completely unify them in the near future.
      
      Also move the filesystem shutdown handling to the common caller given
      that we had to touch it anyway.
      
      Based on hard debugging and an earlier patch from
      Chandra Seetharaman <sekharan@us.ibm.com>.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Tested-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9f9c19ec
  2. 19 10月, 2011 1 次提交
  3. 12 10月, 2011 2 次提交
    • C
      xfs: optimize fsync on directories · 1da2f2db
      Christoph Hellwig 提交于
      Directories are only updated transactionally, which means fsync only
      needs to flush the log the inode is currently dirty, but not bother
      with checking for dirty data, non-transactional updates, and most
      importanly doesn't have to flush disk caches except as part of a
      transaction commit.
      
      While the first two optimizations can't easily be measured, the
      latter actually makes a difference when doing lots of fsync that do
      not actually have to commit the inode, e.g. because an earlier fsync
      already pushed the log far enough.
      
      The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
      for the prototype, but I'm not sure creating a common helper for the
      two is worth it given how simple the functions are.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      
      1da2f2db
    • C
      xfs: call xfs_buf_delwri_queue directly · 61551f1e
      Christoph Hellwig 提交于
      Unify the ways we add buffers to the delwri queue by always calling
      xfs_buf_delwri_queue directly.  The xfs_bdwrite functions is removed and
      opencoded in its callers, and the two places setting XBF_DELWRI while a
      buffer is locked and expecting xfs_buf_unlock to pick it up are converted
      to call xfs_buf_delwri_queue directly, too.  Also replace the
      XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
      to make the explicit queuing/dequeuing more obvious.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      
      61551f1e
  4. 13 8月, 2011 1 次提交
    • C
      xfs: remove subdirectories · c59d87c4
      Christoph Hellwig 提交于
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      amount of file rename the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c59d87c4
  5. 26 7月, 2011 1 次提交
  6. 21 7月, 2011 1 次提交
  7. 08 7月, 2011 3 次提交
    • C
      xfs: clean up buffer locking helpers · 0c842ad4
      Christoph Hellwig 提交于
      Rename xfs_buf_cond_lock and reverse it's return value to fit most other
      trylock operations in the Kernel and XFS (with the exception of down_trylock,
      after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val
      with an xfs_buf_islocked for use in asserts, or and opencoded variant in
      tracing.  remove the XFS_BUF_* wrappers for all the locking helpers.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      0c842ad4
    • C
      xfs: split xfs_itruncate_finish · 8f04c47a
      Christoph Hellwig 提交于
      Split the guts of xfs_itruncate_finish that loop over the existing extents
      and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs.
      Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
      which allows to simplify the latter a lot, by only letting it deal with
      the data fork.  As a result xfs_itruncate_finish is renamed to
      xfs_itruncate_data to make its use case more obvious.
      
      Also remove the sync parameter from xfs_itruncate_data, which has been
      unessecary since the introduction of the busy extent list in 2002, and
      completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
      made a no-op.
      
      I can't actually see why the xfs_attr_inactive needs to set the transaction
      sync, but let's keep this patch simple and without changes in behaviour.
      
      Also avoid passing a useless argument to xfs_isize_check, and make it
      private to xfs_inode.c.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      8f04c47a
    • C
      xfs: kill xfs_itruncate_start · 857b9778
      Christoph Hellwig 提交于
      xfs_itruncate_start is a rather length wrapper that evaluates to a call
      to xfs_ioend_wait and xfs_tosspages, and only has two callers.
      
      Instead of using the complicated checks left over from IRIX where we
      can to truncate the pagecache just call xfs_tosspages
      (aka truncate_inode_pages) directly as we want to get rid of all data
      after i_size, and truncate_inode_pages handles incorrect alignments
      and too large offsets just fine.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      857b9778
  8. 29 4月, 2011 3 次提交
    • C
      xfs: fix compiler warning in xfs_trace.h · 1a18a294
      Christoph Hellwig 提交于
      xfs_fsblock_t may be a 32-bit type on if XFS_BIG_BLKNOS is not set,
      make sure to cast a value of this type to an unsigned long long
      before using the ll printk qualifier.
      Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1a18a294
    • C
      xfs: exact busy extent tracking · 97d3ac75
      Christoph Hellwig 提交于
      Update the extent tree in case we have to reuse a busy extent, so that it
      always is kept uptodate.  This is done by replacing the busy list searches
      with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
      in case of a reuse.  This allows us to allow reusing metadata extents
      unconditionally, and thus avoid log forces especially for allocation btree
      blocks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      97d3ac75
    • C
      xfs: do not immediately reuse busy extent ranges · e26f0501
      Christoph Hellwig 提交于
      Every time we reallocate a busy extent, we cause a synchronous log force
      to occur to ensure the freeing transaction is on disk before we continue
      and use the newly allocated extent.  This is extremely sub-optimal as we
      have to mark every transaction with blocks that get reused as synchronous.
      
      Instead of searching the busy extent list after deciding on the extent to
      allocate, check each candidate extent during the allocation decisions as
      to whether they are in the busy list.  If they are in the busy list, we
      trim the busy range out of the extent we have found and determine if that
      trimmed range is still OK for allocation. In many cases, this check can
      be incorporated into the allocation extent alignment code which already
      does trimming of the found extent before determining if it is a valid
      candidate for allocation.
      
      Based on earlier patches from Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      e26f0501
  9. 12 1月, 2011 1 次提交
  10. 21 12月, 2010 4 次提交
    • D
      xfs: introduce new locks for the log grant ticket wait queues · 3f16b985
      Dave Chinner 提交于
      The log grant ticket wait queues are currently protected by the log
      grant lock.  However, the queues are functionally independent from
      each other, and operations on them only require serialisation
      against other queue operations now that all of the other log
      variables they use are atomic values.
      
      Hence, we can make them independent of the grant lock by introducing
      new locks just to protect the lists operations. because the lists
      are independent, we can use a lock per list and ensure that reserve
      and write head queuing do not contend.
      
      To ensure forced shutdowns work correctly in conjunction with the
      new fast paths, ensure that we check whether the log has been shut
      down in the grant functions once we hold the relevant spin locks but
      before we go to sleep. This is needed to co-ordinate correctly with
      the wakeups that are issued on the ticket queues so we don't leave
      any processes sleeping on the queues during a shutdown.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      3f16b985
    • D
      xfs: convert l_tail_lsn to an atomic variable. · 1c3cb9ec
      Dave Chinner 提交于
      log->l_tail_lsn is currently protected by the log grant lock. The
      lock is only needed for serialising readers against writers, so we
      don't really need the lock if we make the l_tail_lsn variable an
      atomic. Converting the l_tail_lsn variable to an atomic64_t means we
      can start to peel back the grant lock from various operations.
      
      Also, provide functions to safely crack an atomic LSN variable into
      it's component pieces and to recombined the components into an
      atomic variable. Use them where appropriate.
      
      This also removes the need for explicitly holding a spinlock to read
      the l_tail_lsn on 32 bit platforms.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      
      1c3cb9ec
    • D
      xfs: combine grant heads into a single 64 bit integer · a69ed03c
      Dave Chinner 提交于
      Prepare for switching the grant heads to atomic variables by
      combining the two 32 bit values that make up the grant head into a
      single 64 bit variable.  Provide wrapper functions to combine and
      split the grant heads appropriately for calculations and use them as
      necessary.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a69ed03c
    • D
      xfs: convert log grant ticket queues to list heads · 10547941
      Dave Chinner 提交于
      The grant write and reserve queues use a roll-your-own double linked
      list, so convert it to a standard list_head structure and convert
      all the list traversals to use list_for_each_entry(). We can also
      get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty()
      check to tell if the ticket is in a list or not.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      10547941
  11. 17 12月, 2010 2 次提交
  12. 19 10月, 2010 2 次提交
  13. 10 9月, 2010 1 次提交
  14. 10 8月, 2010 1 次提交
  15. 27 7月, 2010 5 次提交
    • T
      xfs: Fix build when CONFIG_XFS_POSIX_ACL=n · 0f1a932f
      Tony Luck 提交于
      When CONFIG_XFS_POSIX_ACL is not set "xfs_check_acl" is #defined
      to NULL - which breaks the code attempting to add a tracepoint
      on this function.
      
      Only define the tracepoint when the function exists.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0f1a932f
    • C
      xfs: split xfs_itrace_entry · cca28fb8
      Christoph Hellwig 提交于
      Replace the xfs_itrace_entry catchall with specific trace points.  For
      most simple callers we now use the simple inode class, which used to
      be the iget class, but add more details tracing for namespace events,
      which now includes the name of the directory entries manipulated.
      
      Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
      one, and the xfs_change_file_space trace point, which is immediately
      followed by the more specific alloc/free space trace points.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      cca28fb8
    • C
      xfs: some iget tracing cleanups / fixes · d2e078c3
      Christoph Hellwig 提交于
      The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced.
      Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining
      of the xfs_iget_cache_hit/miss functions.  Add a new xfs_iget_reclaim_fail
      tracepoint for the case where we fail to re-initialize a VFS inode,
      and add a second instance of the xfs_iget_skip tracepoint for the case
      of a failed igrab() call.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      d2e078c3
    • C
      xfs: simplify xfs_vm_writepage · 20cb52eb
      Christoph Hellwig 提交于
      The writepage implementation in XFS still tries to deal with dirty but
      unmapped buffers which used to caused by writes through shared mmaps.  Since
      the introduction of ->page_mkwrite these can't happen anymore, so remove the
      code dealing with them.
      
      Note that the all_bh variable which causes us to start I/O on all buffers on
      the pages was controlled by the count of unmapped buffers, which also
      included those not actually dirty.  It's now unconditionally initialized to
      0 but set to 1 for the case of small file size extensions.  It probably can
      be removed entirely, but that's left for another patch.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      20cb52eb
    • C
      xfs: simplify buffer pinning · 4d16e924
      Christoph Hellwig 提交于
      Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
      them in their only callers, just like we did for the inode pinning a while
      ago.  Also remove duplicate trace points - the bufitem tracepoints cover
      all the information that is present in a buffer tracepoint.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4d16e924
  16. 20 7月, 2010 1 次提交
    • D
      xfs: track AGs with reclaimable inodes in per-ag radix tree · 16fd5367
      Dave Chinner 提交于
      https://bugzilla.kernel.org/show_bug.cgi?id=16348
      
      When the filesystem grows to a large number of allocation groups,
      the summing of recalimable inodes gets expensive. In many cases,
      most AGs won't have any reclaimable inodes and so we are wasting CPU
      time aggregating over these AGs. This is particularly important for
      the inode shrinker that gets called frequently under memory
      pressure.
      
      To avoid the overhead, track AGs with reclaimable inodes in the
      per-ag radix tree so that we can find all the AGs with reclaimable
      inodes via a simple gang tag lookup. This involves setting the tag
      when the first reclaimable inode is tracked in the AG, and removing
      the tag when the last reclaimable inode is removed from the tree.
      Then the summation process becomes a loop walking the radix tree
      summing AGs with the reclaim tag set.
      
      This significantly reduces the overhead of scanning - a 6400 AG
      filesystea now only uses about 25% of a cpu in kswapd while slab
      reclaim progresses instead of being permanently stuck at 100% CPU
      and making little progress. Clean filesystems filesystems will see
      no overhead and the overhead only increases linearly with the number
      of dirty AGs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      16fd5367
  17. 29 5月, 2010 1 次提交
  18. 24 5月, 2010 1 次提交
    • D
      xfs: Improve scalability of busy extent tracking · ed3b4d6c
      Dave Chinner 提交于
      When we free a metadata extent, we record it in the per-AG busy
      extent array so that it is not re-used before the freeing
      transaction hits the disk. This array is fixed size, so when it
      overflows we make further allocation transactions synchronous
      because we cannot track more freed extents until those transactions
      hit the disk and are completed. Under heavy mixed allocation and
      freeing workloads with large log buffers, we can overflow this array
      quite easily.
      
      Further, the array is sparsely populated, which means that inserts
      need to search for a free slot, and array searches often have to
      search many more slots that are actually used to check all the
      busy extents. Quite inefficient, really.
      
      To enable this aspect of extent freeing to scale better, we need
      a structure that can grow dynamically. While in other areas of
      XFS we have used radix trees, the extents being freed are at random
      locations on disk so are better suited to being indexed by an rbtree.
      
      So, use a per-AG rbtree indexed by block number to track busy
      extents.  This incures a memory allocation when marking an extent
      busy, but should not occur too often in low memory situations. This
      should scale to an arbitrary number of extents so should not be a
      limitation for features such as in-memory aggregation of
      transactions.
      
      However, there are still situations where we can't avoid allocating
      busy extents (such as allocation from the AGFL). To minimise the
      overhead of such occurences, we need to avoid doing a synchronous
      log force while holding the AGF locked to ensure that the previous
      transactions are safely on disk before we use the extent. We can do
      this by marking the transaction doing the allocation as synchronous
      rather issuing a log force.
      
      Because of the locking involved and the ordering of transactions,
      the synchronous transaction provides the same guarantees as a
      synchronous log force because it ensures that all the prior
      transactions are already on disk when the synchronous transaction
      hits the disk. i.e. it preserves the free->allocate order of the
      extent correctly in recovery.
      
      By doing this, we avoid holding the AGF locked while log writes are
      in progress, hence reducing the length of time the lock is held and
      therefore we increase the rate at which we can allocate and free
      from the allocation group, thereby increasing overall throughput.
      
      The only problem with this approach is that when a metadata buffer is
      marked stale (e.g. a directory block is removed), then buffer remains
      pinned and locked until the log goes to disk. The issue here is that
      if that stale buffer is reallocated in a subsequent transaction, the
      attempt to lock that buffer in the transaction will hang waiting
      the log to go to disk to unlock and unpin the buffer. Hence if
      someone tries to lock a pinned, stale, locked buffer we need to
      push on the log to get it unlocked ASAP. Effectively we are trading
      off a guaranteed log force for a much less common trigger for log
      force to occur.
      
      Ideally we should not reallocate busy extents. That is a much more
      complex fix to the problem as it involves direct intervention in the
      allocation btree searches in many places. This is left to a future
      set of modifications.
      
      Finally, now that we track busy extents in allocated memory, we
      don't need the descriptors in the transaction structure to point to
      them. We can replace the complex busy chunk infrastructure with a
      simple linked list of busy extents. This allows us to remove a large
      chunk of code, making the overall change a net reduction in code
      size.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ed3b4d6c
  19. 19 5月, 2010 4 次提交
  20. 02 3月, 2010 2 次提交
  21. 02 2月, 2010 1 次提交
    • D
      xfs: Don't issue buffer IO direct from AIL push V2 · d808f617
      Dave Chinner 提交于
      All buffers logged into the AIL are marked as delayed write.
      When the AIL needs to push the buffer out, it issues an async write of the
      buffer. This means that IO patterns are dependent on the order of
      buffers in the AIL.
      
      Instead of flushing the buffer, promote the buffer in the delayed
      write list so that the next time the xfsbufd is run the buffer will
      be flushed by the xfsbufd. Return the state to the xfsaild that the
      buffer was promoted so that the xfsaild knows that it needs to cause
      the xfsbufd to run to flush the buffers that were promoted.
      
      Using the xfsbufd for issuing the IO allows us to dispatch all
      buffer IO from the one queue. This means that we can make much more
      enlightened decisions on what order to flush buffers to disk as
      we don't have multiple places issuing IO. Optimisations to xfsbufd
      will be in a future patch.
      
      Version 2
      - kill XFS_ITEM_FLUSHING as it is now unused.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d808f617
  22. 16 1月, 2010 1 次提交