1. 13 7月, 2011 1 次提交
  2. 08 7月, 2011 4 次提交
    • C
      xfs: byteswap constants instead of variables · 69ef921b
      Christoph Hellwig 提交于
      Micro-optimize various comparisms by always byteswapping the constant
      instead of the variable, which allows to do the swap at compile instead
      of runtime.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      69ef921b
    • C
      xfs: remove i_transp · f3ca8738
      Christoph Hellwig 提交于
      Remove the transaction pointer in the inode.  It's only used to avoid
      passing down an argument in the bmap code, and for a few asserts in
      the transaction code right now.
      
      Also use the local variable ip in a few more places in xfs_inode_item_unlock,
      so that it isn't only used for debug builds after the above change.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      f3ca8738
    • C
      xfs: split xfs_itruncate_finish · 8f04c47a
      Christoph Hellwig 提交于
      Split the guts of xfs_itruncate_finish that loop over the existing extents
      and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs.
      Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
      which allows to simplify the latter a lot, by only letting it deal with
      the data fork.  As a result xfs_itruncate_finish is renamed to
      xfs_itruncate_data to make its use case more obvious.
      
      Also remove the sync parameter from xfs_itruncate_data, which has been
      unessecary since the introduction of the busy extent list in 2002, and
      completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
      made a no-op.
      
      I can't actually see why the xfs_attr_inactive needs to set the transaction
      sync, but let's keep this patch simple and without changes in behaviour.
      
      Also avoid passing a useless argument to xfs_isize_check, and make it
      private to xfs_inode.c.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      8f04c47a
    • C
      xfs: kill xfs_itruncate_start · 857b9778
      Christoph Hellwig 提交于
      xfs_itruncate_start is a rather length wrapper that evaluates to a call
      to xfs_ioend_wait and xfs_tosspages, and only has two callers.
      
      Instead of using the complicated checks left over from IRIX where we
      can to truncate the pagecache just call xfs_tosspages
      (aka truncate_inode_pages) directly as we want to get rid of all data
      after i_size, and truncate_inode_pages handles incorrect alignments
      and too large offsets just fine.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      857b9778
  3. 25 5月, 2011 3 次提交
  4. 20 5月, 2011 1 次提交
  5. 10 5月, 2011 1 次提交
  6. 31 3月, 2011 1 次提交
  7. 26 3月, 2011 1 次提交
    • D
      xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Dave Chinner 提交于
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if it can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. This allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it and hence defuse the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      tries to next reclaim the inode as it is now marked stale.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      1bfd8d04
  8. 07 3月, 2011 4 次提交
  9. 23 2月, 2011 1 次提交
  10. 02 12月, 2010 1 次提交
  11. 17 12月, 2010 1 次提交
    • D
      xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d
      Dave Chinner 提交于
      With delayed logging greatly increasing the sustained parallelism of inode
      operations, the inode cache locking is showing significant read vs write
      contention when inode reclaim runs at the same time as lookups. There is
      also a lot more write lock acquistions than there are read locks (4:1 ratio)
      so the read locking is not really buying us much in the way of parallelism.
      
      To avoid the read vs write contention, change the cache to use RCU locking on
      the read side. To avoid needing to RCU free every single inode, use the built
      in slab RCU freeing mechanism. This requires us to be able to detect lookups of
      freed inodes, so enѕure that ever freed inode has an inode number of zero and
      the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit
      lookup path, but also add a check for a zero inode number as well.
      
      We canthen convert all the read locking lockups to use RCU read side locking
      and hence remove all read side locking.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      1a3e8f3d
  12. 19 10月, 2010 3 次提交
  13. 24 8月, 2010 1 次提交
    • D
      xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE · 5b3eed75
      Dave Chinner 提交于
      Under heavy load parallel metadata loads (e.g. dbench), we can fail
      to mark all the inodes in a cluster being freed as XFS_ISTALE as we
      skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
      When this happens and the inode cluster buffer has already been
      marked stale and freed, inode reclaim can try to write the inode out
      as it is dirty and not marked stale. This can result in writing th
      metadata to an freed extent, or in the case it has already
      been overwritten trigger a magic number check failure and return an
      EUCLEAN error such as:
      
      Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
      
      Fix this by ensuring that we hoover up all in memory inodes in the
      cluster and mark them XFS_ISTALE when freeing the cluster.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5b3eed75
  14. 27 7月, 2010 9 次提交
  15. 24 6月, 2010 2 次提交
  16. 03 6月, 2010 1 次提交
    • D
      xfs: fix race in inode cluster freeing failing to stale inodes · 5b257b4a
      Dave Chinner 提交于
      When an inode cluster is freed, it needs to mark all inodes in memory as
      XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes
      have a different life cycle to the buffer, and once the buffer is torn down
      during transaction completion, we must ensure none of the inodes get written
      back (which is what XFS_ISTALE does).
      
      Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being
      marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these
      inodes either during inode reclaim or tail pushing on the AIL.  The buffer is
      read back, but no longer contains inodes and so triggers assert failures and
      shutdowns. This was reproducable with at run.dbench10 invocation from xfstests.
      
      There are two main causes of xfs_ifree_cluster() failing. The first is simple -
      it checks in-memory inodes it finds in the per-ag icache to see if they are
      clean without holding the flush lock. if they are clean it skips them
      completely. However, If an inode is flushed delwri, it will
      appear clean, but is not guaranteed to be written back until the flush lock has
      been dropped. Hence we may have raced on the clean check and the inode may
      actually be dirty. Hence always mark inodes found in memory stale before we
      check properly if they are clean.
      
      The second is more complex, and makes the first problem easier to hit.
      Basically the in-memory inode scan is done with full knowledge it can be racing
      with inode flushing and AIl tail pushing, which means that inodes that it can't
      get the flush lock on might not be attached to the buffer after then in-memory
      inode scan due to IO completion occurring. This is actually documented in the
      code as "needs better interlocking". i.e. this is a zero-day bug.
      
      Effectively, the in-memory scan must be done while the inode buffer is locked
      and Io cannot be issued on it while we do the in-memory inode scan. This
      ensures that inodes we couldn't get the flush lock on are guaranteed to be
      attached to the cluster buffer, so we can then catch all in-memory inodes and
      mark them stale.
      
      Now that the inode cluster buffer is locked before the in-memory scan is done,
      there is no need for the two-phase update of the in-memory inodes, so simplify
      the code into two loops and remove the allocation of the temporary buffer used
      to hold locked inodes across the phases.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5b257b4a
  17. 29 5月, 2010 1 次提交
    • C
      xfs: fix access to upper inodes without inode64 · fb3b504a
      Christoph Hellwig 提交于
      If a filesystem is mounted without the inode64 mount option we
      should still be able to access inodes not fitting into 32 bits, just
      not created new ones.  For this to work we need to make sure the
      inode cache radix tree is initialized for all allocation groups, not
      just those we plan to allocate inodes from.  This patch makes sure
      we initialize the inode cache radix tree for all allocation groups,
      and also cleans xfs_initialize_perag up a bit to separate the
      inode32 logical from the general perag structure setup.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      fb3b504a
  18. 19 5月, 2010 1 次提交
  19. 02 3月, 2010 2 次提交
  20. 06 2月, 2010 1 次提交
    • D
      xfs: Use delayed write for inodes rather than async V2 · c854363e
      Dave Chinner 提交于
      We currently do background inode flush asynchronously, resulting in
      inodes being written in whatever order the background writeback
      issues them. Not only that, there are also blocking and non-blocking
      asynchronous inode flushes, depending on where the flush comes from.
      
      This patch completely removes asynchronous inode writeback. It
      removes all the strange writeback modes and replaces them with
      either a synchronous flush or a non-blocking delayed write flush.
      That is, inode flushes will only issue IO directly if they are
      synchronous, and background flushing may do nothing if the operation
      would block (e.g. on a pinned inode or buffer lock).
      
      Delayed write flushes will now result in the inode buffer sitting in
      the delwri queue of the buffer cache to be flushed by either an AIL
      push or by the xfsbufd timing out the buffer. This will allow
      accumulation of dirty inode buffers in memory and allow optimisation
      of inode cluster writeback at the xfsbufd level where we have much
      greater queue depths than the block layer elevators. We will also
      get adjacent inode cluster buffer IO merging for free when a later
      patch in the series allows sorting of the delayed write buffers
      before dispatch.
      
      This effectively means that any inode that is written back by
      background writeback will be seen as flush locked during AIL
      pushing, and will result in the buffers being pushed from there.
      This writeback path is currently non-optimal, but the next patch
      in the series will fix that problem.
      
      A side effect of this delayed write mechanism is that background
      inode reclaim will no longer directly flush inodes, nor can it wait
      on the flush lock. The result is that inode reclaim must leave the
      inode in the reclaimable state until it is clean. Hence attempts to
      reclaim a dirty inode in the background will simply skip the inode
      until it is clean and this allows other mechanisms (i.e. xfsbufd) to
      do more optimal writeback of the dirty buffers. As a result, the
      inode reclaim code has been rewritten so that it no longer relies on
      the ambiguous return values of xfs_iflush() to determine whether it
      is safe to reclaim an inode.
      
      Portions of this patch are derived from patches by Christoph
      Hellwig.
      
      Version 2:
      - cleanup reclaim code as suggested by Christoph
      - log background reclaim inode flush errors
      - just pass sync flags to xfs_iflush
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c854363e