1. 08 Jul, 2011 1 commit
    • xfs: kill xfs_itruncate_start · 857b9778
      Committed by Christoph Hellwig
      xfs_itruncate_start is a rather lengthy wrapper that evaluates to a call
      to xfs_ioend_wait and xfs_tosspages, and only has two callers.
      
      Instead of using the complicated checks left over from IRIX to decide
      whether we can truncate the pagecache, just call xfs_tosspages
      (aka truncate_inode_pages) directly, as we want to get rid of all data
      after i_size, and truncate_inode_pages handles incorrect alignments
      and too large offsets just fine.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  2. 25 May, 2011 3 commits
  3. 20 May, 2011 1 commit
  4. 10 May, 2011 1 commit
  5. 31 Mar, 2011 1 commit
  6. 26 Mar, 2011 1 commit
    • xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Committed by Dave Chinner
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_ifree_cluster. xfs_ifree_cluster locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if they can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. This allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it and hence defuse the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      next tries to reclaim the inode, as it is now marked stale.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
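The lock ordering fix described in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: `demo_inode`, `demo_buffer` and `demo_reclaim_attempt` are hypothetical names, and pthread mutexes stand in for the inode ilock and the cluster buffer lock. The reclaim side takes the ilock, only tries the buffer lock, and backs off completely if the buffer is busy so that the other path can make progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an inode (with its ilock) and a cluster buffer. */
struct demo_inode {
    pthread_mutex_t ilock;
    bool stale;             /* set by the cluster-free path */
};

struct demo_buffer {
    pthread_mutex_t lock;
};

/*
 * One reclaim attempt. Returns 0 when the inode has been dealt with and -1
 * when the buffer lock was busy and the caller should retry later.
 * *io_issued reports whether a (simulated) synchronous write was issued.
 */
static int demo_reclaim_attempt(struct demo_inode *ip, struct demo_buffer *bp,
                                bool *io_issued)
{
    assert(ip != NULL && bp != NULL && io_issued != NULL);
    *io_issued = false;

    pthread_mutex_lock(&ip->ilock);

    /* A stale inode needs no IO at all. */
    if (ip->stale) {
        pthread_mutex_unlock(&ip->ilock);
        return 0;
    }

    /* Try the buffer lock instead of blocking while holding the ilock. */
    if (pthread_mutex_trylock(&bp->lock) != 0) {
        /* Back off: drop the ilock so the buffer holder can take it. */
        pthread_mutex_unlock(&ip->ilock);
        return -1;
    }

    *io_issued = true;      /* the synchronous write would happen here */
    pthread_mutex_unlock(&bp->lock);
    pthread_mutex_unlock(&ip->ilock);
    return 0;
}
```

A caller would loop on the -1 result. By the time the retry runs, the path that held the buffer may already have marked the inode stale, in which case the retry returns without issuing IO.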
  7. 07 Mar, 2011 4 commits
  8. 23 Feb, 2011 1 commit
  9. 02 Dec, 2010 1 commit
  10. 17 Dec, 2010 1 commit
    • xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d
      Committed by Dave Chinner
      With delayed logging greatly increasing the sustained parallelism of inode
      operations, the inode cache locking is showing significant read vs write
      contention when inode reclaim runs at the same time as lookups. There are
      also a lot more write lock acquisitions than read lock acquisitions (4:1 ratio)
      so the read locking is not really buying us much in the way of parallelism.
      
      To avoid the read vs write contention, change the cache to use RCU locking on
      the read side. To avoid needing to RCU free every single inode, use the built
      in slab RCU freeing mechanism. This requires us to be able to detect lookups of
      freed inodes, so ensure that every freed inode has an inode number of zero and
      the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache
      hit lookup path, and now also add a check for a zero inode number.
      
      We can then convert all the read locking lookups to use RCU read side locking
      and hence remove all read side locking.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
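The freed-inode detection described in the commit above can be sketched as a validation step on every cache hit. This is a simplified illustration only: `demo_inode`, `DEMO_IRECLAIM`, `demo_mark_freed` and `demo_cache_hit_valid` are hypothetical names, and the RCU read-side section and per-inode lock that real code would hold around the check are omitted. Comparing against the wanted inode number is an assumption of this sketch; it covers the zero case because a valid lookup never asks for inode zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IRECLAIM 0x1u      /* hypothetical flag value */

struct demo_inode {
    uint64_t ino;               /* zero once the inode has been freed */
    unsigned int flags;
};

/* Leave a freed inode in a state that lookups can recognise. */
static void demo_mark_freed(struct demo_inode *ip)
{
    assert(ip != NULL);
    ip->ino = 0;
    ip->flags |= DEMO_IRECLAIM;
}

/*
 * Validate a cache hit. With slab-style RCU freeing, the memory found by a
 * lookup may belong to an inode that was freed (or reused) in the meantime,
 * so the hit only counts if the object still looks like the inode we want.
 */
static bool demo_cache_hit_valid(const struct demo_inode *ip, uint64_t wanted_ino)
{
    assert(ip != NULL && wanted_ino != 0);
    if (ip->ino == 0)
        return false;           /* freed */
    if (ip->flags & DEMO_IRECLAIM)
        return false;           /* being reclaimed */
    return ip->ino == wanted_ino;
}
```

A lookup that fails this check would treat the result as a cache miss and retry rather than use the object.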
  11. 19 Oct, 2010 3 commits
  12. 24 Aug, 2010 1 commit
    • xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE · 5b3eed75
      Committed by Dave Chinner
      Under heavy parallel metadata loads (e.g. dbench), we can fail
      to mark all the inodes in a cluster being freed as XFS_ISTALE as we
      skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
      When this happens and the inode cluster buffer has already been
      marked stale and freed, inode reclaim can try to write the inode out
      as it is dirty and not marked stale. This can result in writing the
      metadata to a freed extent, or in the case it has already
      been overwritten trigger a magic number check failure and return an
      EUCLEAN error such as:
      
      Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
      
      Fix this by ensuring that we hoover up all in memory inodes in the
      cluster and mark them XFS_ISTALE when freeing the cluster.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  13. 27 Jul, 2010 9 commits
  14. 24 Jun, 2010 2 commits
  15. 03 Jun, 2010 1 commit
    • xfs: fix race in inode cluster freeing failing to stale inodes · 5b257b4a
      Committed by Dave Chinner
      When an inode cluster is freed, it needs to mark all inodes in memory as
      XFS_ISTALE before marking the buffer as stale. This is needed because the inodes
      have a different life cycle to the buffer, and once the buffer is torn down
      during transaction completion, we must ensure none of the inodes get written
      back (which is what XFS_ISTALE does).
      
      Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being
      marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these
      inodes either during inode reclaim or tail pushing on the AIL.  The buffer is
      read back, but no longer contains inodes and so triggers assert failures and
      shutdowns. This was reproducible with a run.dbench10 invocation from xfstests.
      
      There are two main causes of xfs_ifree_cluster() failing. The first is simple -
      it checks in-memory inodes it finds in the per-ag icache to see if they are
      clean without holding the flush lock. If they are clean it skips them
      completely. However, if an inode is flushed delwri, it will
      appear clean, but is not guaranteed to be written back until the flush lock has
      been dropped. Hence we may have raced on the clean check and the inode may
      actually be dirty. Hence always mark inodes found in memory stale before we
      check properly if they are clean.
      
      The second is more complex, and makes the first problem easier to hit.
      Basically the in-memory inode scan is done with full knowledge it can be racing
      with inode flushing and AIL tail pushing, which means that inodes that it can't
      get the flush lock on might not be attached to the buffer after the in-memory
      inode scan due to IO completion occurring. This is actually documented in the
      code as "needs better interlocking". i.e. this is a zero-day bug.
      
      Effectively, the in-memory scan must be done while the inode buffer is locked
      and IO cannot be issued on it while we do the in-memory inode scan. This
      ensures that inodes we couldn't get the flush lock on are guaranteed to be
      attached to the cluster buffer, so we can then catch all in-memory inodes and
      mark them stale.
      
      Now that the inode cluster buffer is locked before the in-memory scan is done,
      there is no need for the two-phase update of the in-memory inodes, so simplify
      the code into two loops and remove the allocation of the temporary buffer used
      to hold locked inodes across the phases.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
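The first fix in the commit above, marking in-memory inodes stale before any clean check, can be sketched as follows. The types and names (`demo_inode`, `demo_stale_cluster`) are hypothetical and the locking is reduced to a comment: the caller is assumed to hold the cluster buffer lock for the whole scan, which is what makes the unconditional marking safe in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of one inode slot in a cluster. */
struct demo_inode {
    bool in_memory;     /* present in the in-memory inode cache */
    bool flush_locked;  /* another context holds the flush lock */
    bool looks_clean;   /* result of an unlocked, possibly racy clean check */
    bool stale;
};

/*
 * Mark every in-memory inode of a cluster stale. The caller is assumed to
 * hold the cluster buffer lock, so no IO can start on the buffer during the
 * scan. Returns the number of inodes marked.
 */
static size_t demo_stale_cluster(struct demo_inode *inodes, size_t count)
{
    size_t marked = 0;

    assert(inodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct demo_inode *ip = &inodes[i];

        if (!ip->in_memory)
            continue;

        /*
         * Do not skip inodes that merely look clean or that are flush
         * locked: a delwri-flushed inode can appear clean while its data
         * has not reached disk yet.
         */
        ip->stale = true;
        marked++;
    }
    return marked;
}
```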
  16. 29 May, 2010 1 commit
    • xfs: fix access to upper inodes without inode64 · fb3b504a
      Committed by Christoph Hellwig
      If a filesystem is mounted without the inode64 mount option we
      should still be able to access inodes not fitting into 32 bits, just
      not create new ones.  For this to work we need to make sure the
      inode cache radix tree is initialized for all allocation groups, not
      just those we plan to allocate inodes from.  This patch makes sure
      we initialize the inode cache radix tree for all allocation groups,
      and also cleans xfs_initialize_perag up a bit to separate the
      inode32 logic from the general perag structure setup.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  17. 19 May, 2010 1 commit
  18. 02 Mar, 2010 2 commits
  19. 06 Feb, 2010 2 commits
    • xfs: Use delayed write for inodes rather than async V2 · c854363e
      Committed by Dave Chinner
      We currently do background inode flush asynchronously, resulting in
      inodes being written in whatever order the background writeback
      issues them. Not only that, there are also blocking and non-blocking
      asynchronous inode flushes, depending on where the flush comes from.
      
      This patch completely removes asynchronous inode writeback. It
      removes all the strange writeback modes and replaces them with
      either a synchronous flush or a non-blocking delayed write flush.
      That is, inode flushes will only issue IO directly if they are
      synchronous, and background flushing may do nothing if the operation
      would block (e.g. on a pinned inode or buffer lock).
      
      Delayed write flushes will now result in the inode buffer sitting in
      the delwri queue of the buffer cache to be flushed by either an AIL
      push or by the xfsbufd timing out the buffer. This will allow
      accumulation of dirty inode buffers in memory and allow optimisation
      of inode cluster writeback at the xfsbufd level where we have much
      greater queue depths than the block layer elevators. We will also
      get adjacent inode cluster buffer IO merging for free when a later
      patch in the series allows sorting of the delayed write buffers
      before dispatch.
      
      This effectively means that any inode that is written back by
      background writeback will be seen as flush locked during AIL
      pushing, and will result in the buffers being pushed from there.
      This writeback path is currently non-optimal, but the next patch
      in the series will fix that problem.
      
      A side effect of this delayed write mechanism is that background
      inode reclaim will no longer directly flush inodes, nor can it wait
      on the flush lock. The result is that inode reclaim must leave the
      inode in the reclaimable state until it is clean. Hence attempts to
      reclaim a dirty inode in the background will simply skip the inode
      until it is clean and this allows other mechanisms (i.e. xfsbufd) to
      do more optimal writeback of the dirty buffers. As a result, the
      inode reclaim code has been rewritten so that it no longer relies on
      the ambiguous return values of xfs_iflush() to determine whether it
      is safe to reclaim an inode.
      
      Portions of this patch are derived from patches by Christoph
      Hellwig.
      
      Version 2:
      - cleanup reclaim code as suggested by Christoph
      - log background reclaim inode flush errors
      - just pass sync flags to xfs_iflush
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: Make inode reclaim states explicit · 777df5af
      Committed by Dave Chinner
      A.K.A.: don't rely on xfs_iflush() return value in reclaim
      
      We have gradually been moving checks out of the reclaim code because
      they are duplicated in xfs_iflush(). We've had a history of problems
      in this area, and many of them stem from the overloading of the
      return values from xfs_iflush() and interaction with inode flush
      locking to determine if the inode is safe to reclaim.
      
      With the desire to move to delayed write flushing of inodes and
      non-blocking inode tree reclaim walks, the overloading of the
      return value of xfs_iflush makes it very difficult to determine
      the correct thing to do next.
      
      This patch explicitly re-adds the checks to the inode reclaim code,
      removing the reliance on the return value of xfs_iflush() to
      determine what to do next. It also means that we can clearly
      document all the inode states that reclaim must handle and hence
      we can easily see that we handled all the necessary cases.
      
      This also removes the need for the xfs_inode_clean() check in
      xfs_iflush() as all callers now check this first (safely).
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
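The idea of checking inode states explicitly in reclaim, rather than inferring them from a flush return value, can be sketched as a small decision function. This is an illustration under assumptions, not the kernel's logic: the state fields, the action names and the order of the checks are all hypothetical, chosen to match the behaviour the two commits above describe (background reclaim never waits and skips dirty inodes, synchronous reclaim may flush).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions a reclaim walk could take for one inode. */
enum demo_reclaim_action {
    DEMO_RECLAIM_NOW,           /* safe to free immediately */
    DEMO_SKIP_FOR_NOW,          /* leave reclaimable, revisit later */
    DEMO_FLUSH_THEN_RECLAIM     /* write back synchronously, then free */
};

/* Hypothetical snapshot of the inode state that reclaim inspects. */
struct demo_inode_state {
    bool bad;           /* inode never fully set up */
    bool stale;         /* cluster was freed, must not be written */
    bool clean;         /* nothing to write back */
    bool flush_locked;  /* a flush is already in progress */
};

static enum demo_reclaim_action
demo_reclaim_decide(const struct demo_inode_state *s, bool sync_wait)
{
    assert(s != NULL);

    if (s->bad)
        return DEMO_RECLAIM_NOW;

    /* Background reclaim cannot wait on the flush lock. */
    if (s->flush_locked && !sync_wait)
        return DEMO_SKIP_FOR_NOW;

    if (s->stale || s->clean)
        return DEMO_RECLAIM_NOW;

    /* Dirty inode: only synchronous reclaim issues IO itself. */
    return sync_wait ? DEMO_FLUSH_THEN_RECLAIM : DEMO_SKIP_FOR_NOW;
}
```

Spelling the cases out this way makes it possible to check by inspection that every state has a defined outcome.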
  20. 22 Jan, 2010 2 commits
  21. 16 Jan, 2010 1 commit