1. 15 5月, 2012 1 次提交
  2. 14 3月, 2012 4 次提交
  3. 23 2月, 2012 1 次提交
  4. 18 1月, 2012 3 次提交
  5. 09 12月, 2011 1 次提交
  6. 09 11月, 2011 1 次提交
  7. 12 10月, 2011 2 次提交
  8. 13 7月, 2011 1 次提交
  9. 08 7月, 2011 1 次提交
  10. 07 7月, 2011 1 次提交
    • D
      xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner 提交于
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with the writing the underlying buffer back to disk.
      The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing a AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1316d4da
  11. 29 4月, 2011 1 次提交
  12. 08 4月, 2011 1 次提交
    • D
      xfs: fix extent format buffer allocation size · e828776a
      Dave Chinner 提交于
      When formatting an inode item, we have to allocate a separate buffer
      to hold extents when there are delayed allocation extents on the
      inode and it is in extent format. The allocation size is derived
      from the in-core data fork representation, which accounts for
      delayed allocation extents, while the on-disk representation does
      not contain any delalloc extents.
      
      As a result of this mismatch, the allocated buffer can be far larger
      than needed to hold the real extent list which, due to the fact the
      inode is in extent format, is limited to the size of the literal
      area of the inode. However, we can have thousands of delalloc
      extents, resulting in an allocation size orders of magnitude larger
      than is needed to hold all the real extents.
      
      Fix this by limiting the size of the buffer being allocated to the
      size of the literal area of the inodes in the filesystem (i.e. the
      maximum size an inode fork can grow to).
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      e828776a
  13. 26 3月, 2011 1 次提交
    • D
      xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Dave Chinner 提交于
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if it can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. This allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it and hence defuse the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      tries to next reclaim the inode as it is now marked stale.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      1bfd8d04
  14. 20 12月, 2010 1 次提交
    • D
      xfs: remove all the inodes on a buffer from the AIL in bulk · 30136832
      Dave Chinner 提交于
      When inode buffer IO completes, usually all of the inodes are removed from the
      AIL. This involves processing them one at a time and taking the AIL lock once
      for every inode. When all CPUs are processing inode IO completions, this causes
      excessive amount sof contention on the AIL lock.
      
      Instead, change the way we process inode IO completion in the buffer
      IO done callback. Allow the inode IO done callback to walk the list
      of IO done callbacks and pull all the inodes off the buffer in one
      go and then process them as a batch.
      
      Once all the inodes for removal are collected, take the AIL lock
      once and do a bulk removal operation to minimise traffic on the AIL
      lock.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      30136832
  15. 01 12月, 2010 1 次提交
    • D
      xfs: avoid moving stale inodes in the AIL · de25c181
      Dave Chinner 提交于
      When an inode has been marked stale because the cluster is being
      freed, we don't want to (re-)insert this inode into the AIL. There
      is a race condition where the cluster buffer may be unpinned before
      the inode is inserted into the AIL during transaction committed
      processing. If the buffer is unpinned before the inode item has been
      committed and inserted, then it is possible for the buffer to be
      released and hence processthe stale inode callbacks before the inode
      is inserted into the AIL.
      
      In this case, we then insert a clean, stale inode into the AIL which
      will never get removed by an IO completion. It will, however, get
      reclaimed and that triggers an assert in xfs_inode_free()
      complaining about freeing an inode still in the AIL.
      
      This race can be avoided by not moving stale inodes forward in the AIL
      during transaction commit completion processing. This closes the
      race condition by ensuring we never insert clean stale inodes into
      the AIL. It is safe to do this because a dirty stale inode, by
      definition, must already be in the AIL.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      de25c181
  16. 19 10月, 2010 1 次提交
    • D
      xfs: don't use vfs writeback for pure metadata modifications · dcd79a14
      Dave Chinner 提交于
      Under heavy multi-way parallel create workloads, the VFS struggles
      to write back all the inodes that have been changed in age order.
      The bdi flusher thread becomes CPU bound, spending 85% of it's time
      in the VFS code, mostly traversing the superblock dirty inode list
      to separate dirty inodes old enough to flush.
      
      We already keep an index of all metadata changes in age order - in
      the AIL - and continued log pressure will do age ordered writeback
      without any extra overhead at all. If there is no pressure on the
      log, the xfssyncd will periodically write back metadata in ascending
      disk address offset order so will be very efficient.
      
      Hence we can stop marking VFS inodes dirty during transaction commit
      or when changing timestamps during transactions. This will keep the
      inodes in the superblock dirty list to those containing data or
      unlogged metadata changes.
      
      However, the timstamp changes are slightly more complex than this -
      there are a couple of places that do unlogged updates of the
      timestamps, and the VFS need to be informed of these. Hence add a
      new function xfs_trans_ichgtime() for transactional changes,
      and leave xfs_ichgtime() for the non-transactional changes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      dcd79a14
  17. 27 7月, 2010 10 次提交
  18. 19 5月, 2010 3 次提交
  19. 02 3月, 2010 1 次提交
  20. 02 2月, 2010 1 次提交
    • D
      xfs: Don't issue buffer IO direct from AIL push V2 · d808f617
      Dave Chinner 提交于
      All buffers logged into the AIL are marked as delayed write.
      When the AIL needs to push the buffer out, it issues an async write of the
      buffer. This means that IO patterns are dependent on the order of
      buffers in the AIL.
      
      Instead of flushing the buffer, promote the buffer in the delayed
      write list so that the next time the xfsbufd is run the buffer will
      be flushed by the xfsbufd. Return the state to the xfsaild that the
      buffer was promoted so that the xfsaild knows that it needs to cause
      the xfsbufd to run to flush the buffers that were promoted.
      
      Using the xfsbufd for issuing the IO allows us to dispatch all
      buffer IO from the one queue. This means that we can make much more
      enlightened decisions on what order to flush buffers to disk as
      we don't have multiple places issuing IO. Optimisations to xfsbufd
      will be in a future patch.
      
      Version 2
      - kill XFS_ITEM_FLUSHING as it is now unused.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d808f617
  21. 06 2月, 2010 1 次提交
    • D
      xfs: Use delayed write for inodes rather than async V2 · c854363e
      Dave Chinner 提交于
      We currently do background inode flush asynchronously, resulting in
      inodes being written in whatever order the background writeback
      issues them. Not only that, there are also blocking and non-blocking
      asynchronous inode flushes, depending on where the flush comes from.
      
      This patch completely removes asynchronous inode writeback. It
      removes all the strange writeback modes and replaces them with
      either a synchronous flush or a non-blocking delayed write flush.
      That is, inode flushes will only issue IO directly if they are
      synchronous, and background flushing may do nothing if the operation
      would block (e.g. on a pinned inode or buffer lock).
      
      Delayed write flushes will now result in the inode buffer sitting in
      the delwri queue of the buffer cache to be flushed by either an AIL
      push or by the xfsbufd timing out the buffer. This will allow
      accumulation of dirty inode buffers in memory and allow optimisation
      of inode cluster writeback at the xfsbufd level where we have much
      greater queue depths than the block layer elevators. We will also
      get adjacent inode cluster buffer IO merging for free when a later
      patch in the series allows sorting of the delayed write buffers
      before dispatch.
      
      This effectively means that any inode that is written back by
      background writeback will be seen as flush locked during AIL
      pushing, and will result in the buffers being pushed from there.
      This writeback path is currently non-optimal, but the next patch
      in the series will fix that problem.
      
      A side effect of this delayed write mechanism is that background
      inode reclaim will no longer directly flush inodes, nor can it wait
      on the flush lock. The result is that inode reclaim must leave the
      inode in the reclaimable state until it is clean. Hence attempts to
      reclaim a dirty inode in the background will simply skip the inode
      until it is clean and this allows other mechanisms (i.e. xfsbufd) to
      do more optimal writeback of the dirty buffers. As a result, the
      inode reclaim code has been rewritten so that it no longer relies on
      the ambiguous return values of xfs_iflush() to determine whether it
      is safe to reclaim an inode.
      
      Portions of this patch are derived from patches by Christoph
      Hellwig.
      
      Version 2:
      - cleanup reclaim code as suggested by Christoph
      - log background reclaim inode flush errors
      - just pass sync flags to xfs_iflush
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c854363e
  22. 22 1月, 2010 2 次提交