1. 27 Jul 2010 (2 commits)
  2. 24 Jun 2010 (3 commits)
    • xfs: remove block number from inode lookup code · 7b6259e7
      Committed by Dave Chinner
      The block number comes from bulkstat based inode lookups to shortcut
      the mapping calculations. We are not able to trust anything from
      bulkstat, so drop the block number as well so that the correct
      lookups and mappings are always done.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED · 1920779e
      Committed by Dave Chinner
      Inode numbers may come from somewhere external to the filesystem
      (e.g. file handles, bulkstat information) and so are inherently
      untrusted. Rename the flag we use for these lookups to make it
      obvious we are doing a lookup of an untrusted inode number and need
      to verify it completely before trying to read it from disk.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
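
      A minimal sketch of the renamed flag, assuming the usual two-flag layout;
      the values shown are illustrative rather than copied from this tree:

          /* xfs_iget() lookup flags (illustrative values) */
          #define XFS_IGET_CREATE     0x1   /* allocate the inode if not found */
          #define XFS_IGET_UNTRUSTED  0x2   /* was XFS_IGET_BULKSTAT: the inode
                                             * number came from a file handle or
                                             * bulkstat, so verify it fully before
                                             * reading it from disk */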
    • xfs: validate untrusted inode numbers during lookup · 7124fe0a
      Committed by Dave Chinner
      When we decode a handle or do a bulkstat lookup, we are using an
      inode number we cannot trust to be valid. If we are deleting inode
      chunks from disk (default noikeep mode), then we cannot trust the on
      disk inode buffer for any given inode number to correctly reflect
      whether the inode has been unlinked, as neither the di_mode nor the
      generation number may have been updated on disk.
      
      This is due to the fact that when we delete an inode chunk, we do
      not write the clusters back to disk when they are removed - instead
      we mark them stale to avoid them being written back potentially over
      the top of something that has been subsequently allocated at that
      location. The result is that we can have locations on disk that look
      like they contain valid inodes but in reality do not. Hence we
      cannot simply convert the inode number to a block number and read
      the location from disk to determine if the inode is valid or not.
      
      As a result, an XFS_IGET_BULKSTAT lookup needs to actually look the
      inode up in the inode allocation btree to determine if the inode
      number is valid or not.
      
      It should be noted that even on ikeep filesystems, there is the
      possibility that blocks on disk may look like valid inode clusters,
      e.g. if there are filesystem images hosted on the filesystem. Hence
      even for ikeep filesystems we really need to validate that the inode
      number is valid before issuing the inode buffer read.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
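
      A self-contained sketch of the lookup-time validation described above.
      The names here (inobt_chunk, inobt_lookup_chunk, untrusted_ino_is_valid)
      are hypothetical stand-ins for the inode allocation btree code; the point
      is that an untrusted inode number is accepted only if an allocation
      record covers it and marks it in use:

          #include <stdbool.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Hypothetical, simplified inode allocation record: a chunk of 64 inodes. */
          struct inobt_chunk {
              uint64_t start_ino;     /* first inode number in the chunk */
              uint64_t alloc_mask;    /* bit i set => start_ino + i is in use */
          };

          /* Toy stand-in for the inode allocation btree. */
          static const struct inobt_chunk inobt[] = {
              { .start_ino = 128, .alloc_mask = 0x00000000000000ffULL },
              { .start_ino = 256, .alloc_mask = 0xffffffffffffffffULL },
          };

          static const struct inobt_chunk *inobt_lookup_chunk(uint64_t ino)
          {
              for (size_t i = 0; i < sizeof(inobt) / sizeof(inobt[0]); i++)
                  if (ino >= inobt[i].start_ino && ino < inobt[i].start_ino + 64)
                      return &inobt[i];
              return NULL;
          }

          /*
           * An untrusted inode number is valid only if the allocation index says
           * so; the on-disk buffer alone cannot be trusted because freed chunks
           * are only marked stale, never rewritten.
           */
          static bool untrusted_ino_is_valid(uint64_t ino)
          {
              const struct inobt_chunk *rec = inobt_lookup_chunk(ino);

              if (!rec)
                  return false;
              return rec->alloc_mask & (1ULL << (ino - rec->start_ino));
          }

          int main(void)
          {
              printf("ino 130: %d\n", untrusted_ino_is_valid(130)); /* 1: allocated */
              printf("ino 190: %d\n", untrusted_ino_is_valid(190)); /* 0: chunk exists, inode freed */
              printf("ino 999: %d\n", untrusted_ino_is_valid(999)); /* 0: no chunk record at all */
              return 0;
          }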
  3. 22 Jan 2010 (1 commit)
  4. 16 Jan 2010 (2 commits)
    • xfs: Replace per-ag array with a radix tree · 1c1c6ebc
      Committed by Dave Chinner
      The use of an array for the per-ag structures requires reallocation
      of the array when growing the filesystem. This requires locking
      access to the array to avoid use after free situations, and the
      locking is difficult to get right. To avoid needing to reallocate an
      array, change the per-ag structures to an allocated object per ag
      and index them using a tree structure.
      
      The AGs are always densely indexed (hence the use of an array), but
      the number supported is 2^32 and lookups tend to be random and hence
      indexing needs to scale. A simple choice is a radix tree - it works
      well with this sort of index.  This change also removes another
      large contiguous allocation from the mount/growfs path in XFS.
      
      The growing process now needs to change to only initialise the new
      AGs required for the extra space, and as such only needs to
      exclusively lock the tree for inserts. The rest of the code only
      needs to lock the tree while doing lookups, and hence this will
      remove all the deadlocks that currently occur on the m_perag_lock as
      it is now an innermost lock. The lock is also changed to a spinlock
      from a read/write lock as the hold time is now extremely short.
      
      To complete the picture, the per-ag structures will need to be
      reference counted to ensure that we don't free/modify them while
      they are still in use.  This will be done in a subsequent patch.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
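
      A kernel-style sketch (not the verbatim patch) of the pattern described
      above, using the generic Linux radix tree API. The struct layout is
      trimmed, xfs_perag_insert is a hypothetical helper standing in for the
      mount/growfs initialisation path, and the reference counting mentioned
      at the end is deliberately left out:

          struct xfs_mount {
                  struct radix_tree_root  m_perag_tree;  /* AG number -> struct xfs_perag */
                  spinlock_t              m_perag_lock;  /* protects m_perag_tree */
                  /* ... */
          };

          /* Lookup: random AG numbers scale well in a radix tree, and the
           * spinlock is held only for the duration of the tree walk. */
          struct xfs_perag *
          xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
          {
                  struct xfs_perag        *pag;

                  spin_lock(&mp->m_perag_lock);
                  pag = radix_tree_lookup(&mp->m_perag_tree, agno);
                  spin_unlock(&mp->m_perag_lock);
                  return pag;
          }

          /* Mount/growfs: only newly added AGs are allocated and inserted, so
           * growing no longer reallocates (and relocks) a whole array. */
          static int
          xfs_perag_insert(struct xfs_mount *mp, xfs_agnumber_t agno,
                           struct xfs_perag *pag)
          {
                  int     error;

                  if (radix_tree_preload(GFP_NOFS))
                          return -ENOMEM;
                  spin_lock(&mp->m_perag_lock);
                  error = radix_tree_insert(&mp->m_perag_tree, agno, pag);
                  spin_unlock(&mp->m_perag_lock);
                  radix_tree_preload_end();
                  return error;
          }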
    • xfs: convert remaining direct references to m_perag · 44b56e0a
      Committed by Dave Chinner
      Convert the remaining direct lookups of the per ag structures to use
      get/put accesses. Ensure that the loops across AGs and prior users
      of the interface balance gets and puts correctly.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
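
      A short sketch of the conversion pattern this describes: direct array
      dereferences become balanced get/put pairs, including in loops over all
      AGs (the per-AG work is elided):

          /* before: direct indexing into the per-ag array */
          pag = &mp->m_perag[agno];
          /* ... use pag ... */

          /* after: every lookup is paired with a put */
          pag = xfs_perag_get(mp, agno);
          /* ... use pag ... */
          xfs_perag_put(pag);

          /* loops across AGs must balance get/put on every iteration */
          for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
                  pag = xfs_perag_get(mp, agno);
                  /* ... per-AG work ... */
                  xfs_perag_put(pag);
          }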
  5. 12 Dec 2009 (1 commit)
  6. 30 Oct 2009 (1 commit)
    • xfs: free temporary cursor in xfs_dialloc · 3b826386
      Committed by Eric Sandeen
      Commit bd169565 seems
      to have a slight regression where this code path:
      
          if (!--searchdistance) {
              /*
               * Not in range - save last search
               * location and allocate a new inode
               */
              ...
              goto newino;
          }
      
      doesn't free the temporary cursor (tcur) that got dup'd in
      this function.
      
      This leaks an item in the xfs_btree_cur zone, and it's caught
      on module unload:
      
      ===========================================================
      BUG xfs_btree_cur: Objects remaining on kmem_cache_close()
      -----------------------------------------------------------
      
      It seems like maybe a single free at the end of the function might
      be cleaner, but for now put a del_cursor right in this code block
      similar to the handling in the rest of the function.
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
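
      A sketch of the kind of fix described, with the duplicated cursor freed
      directly in the early-exit branch quoted above; the elided lines of the
      original snippet stay as they are:

          if (!--searchdistance) {
                  /*
                   * Not in range - save last search
                   * location and allocate a new inode
                   */
                  ...
                  /* free the dup'd cursor before leaving this path */
                  xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
                  goto newino;
          }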
  7. 02 Sep 2009 (8 commits)
  8. 29 Mar 2009 (1 commit)
  9. 09 Feb 2009 (1 commit)
  10. 19 Jan 2009 (1 commit)
  11. 16 Jan 2009 (1 commit)
  12. 01 Dec 2008 (8 commits)
  13. 30 Oct 2008 (8 commits)
  14. 29 Apr 2008 (1 commit)
    • [XFS] Don't initialise new inode generation numbers to zero · 359346a9
      Committed by David Chinner
      When we allocate new inode chunks, we initialise the generation numbers
      to zero. This works fine until we delete a chunk and then reallocate it,
      resulting in the same inode numbers but with a reset generation count.
      This can result in inode/generation pairs of different inodes occurring
      relatively close together.
      
      Given that the inode/gen pair makes up the "unique" portion of an NFS
      filehandle on XFS, this can result in file handles cached on clients being
      seen on the wire from the server but referring to a different file. This
      causes .... issues for NFS clients.
      
      Hence we need a unique generation number initialisation for each inode to
      prevent reuse of a small portion of the generation number space. Use a
      random number to initialise the generation number so we don't need to keep
      any new state on disk whilst making the new number difficult to guess from
      previous allocations.
      
      SGI-PV: 979416
      SGI-Modid: xfs-linux-melb:xfs-kern:31001a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
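
      A self-contained illustration of the change described above, using a
      hypothetical chunk-init helper; in the kernel the random value would come
      from the kernel's own PRNG rather than the C library:

          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <time.h>

          /* hypothetical on-disk inode core, reduced to the field that matters here */
          struct dinode_core {
              uint32_t di_gen;    /* generation number, part of the NFS file handle */
          };

          /* Initialise a freshly allocated inode chunk.  Seeding di_gen with a
           * random value (instead of zero) means a chunk that is freed and later
           * reallocated at the same inode numbers will not reproduce the same
           * inode/generation pairs, so stale NFS handles stop matching. */
          static void ichunk_init(struct dinode_core *chunk, int ninodes)
          {
              for (int i = 0; i < ninodes; i++)
                  chunk[i].di_gen = (uint32_t)rand();
          }

          int main(void)
          {
              struct dinode_core chunk[4];

              srand((unsigned)time(NULL));
              ichunk_init(chunk, 4);
              for (int i = 0; i < 4; i++)
                  printf("inode %d gen %u\n", i, chunk[i].di_gen);
              return 0;
          }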
  15. 18 Apr 2008 (1 commit)
    • [XFS] Account for inode cluster alignment in all allocations · 75de2a91
      Committed by David Chinner
      At ENOSPC, we can get a filesystem shutdown due to cancelling a dirty
      transaction in xfs_mkdir or xfs_create. This is due to the initial
      allocation attempt not taking into account inode alignment and hence we
      can prepare the AGF freelist for allocation when it's not actually
      possible to do an allocation. This results in inode allocation returning
      ENOSPC with a dirty transaction, and hence we shut down the filesystem.
      
      Because the first allocation is an exact allocation attempt, we must tell
      the allocator that the alignment does not affect the allocation attempt.
      i.e. we will accept any extent alignment as long as the extent starts at
      the block we want. Unfortunately, this means that if the longest free
      extent is less than the length + alignment necessary for fallback
      allocation attempts but is long enough to attempt a non-aligned
      allocation, we will modify the free list.
      
      If we then have the exact allocation fail, all other allocation attempts
      will also fail due to the alignment constraint being taken into account.
      Hence the initial attempt needs to set the "alignment slop" field so that
      alignment, while not required, must be taken into account when determining
      if there is enough space left in the AG to do the allocation.
      
      That means if the exact allocation fails, we will not dirty the freelist
      if there is not enough space available for a subsequent allocation to
      succeed. Hence we get an ENOSPC error back to userspace without shutting
      down the filesystem.
      
      SGI-PV: 978886
      SGI-Modid: xfs-linux-melb:xfs-kern:30699a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
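
      A small, self-contained sketch of the space check implied above: the
      exact (start-at-this-block) attempt waives the alignment requirement but
      still reserves alignment slop, so the freelist is only dirtied when an
      aligned fallback could also succeed. Field and function names here are
      simplified stand-ins for the real allocator arguments:

          #include <stdio.h>

          /* simplified allocation request; names echo the allocator arguments */
          struct alloc_args {
              unsigned int minlen;        /* blocks needed */
              unsigned int alignment;     /* required alignment (1 = none) */
              unsigned int minalignslop;  /* extra blocks kept free for an aligned retry */
          };

          /* Only prepare (dirty) the freelist if a later aligned fallback could
           * also fit; otherwise fail cleanly with ENOSPC. */
          static int ag_has_space(unsigned int longest_free, const struct alloc_args *args)
          {
              return longest_free >= args->minlen + args->minalignslop;
          }

          int main(void)
          {
              /* exact attempt for a 4-block cluster with 4-block cluster alignment:
               * alignment itself is waived, but the slop still counts */
              struct alloc_args exact = { .minlen = 4, .alignment = 1, .minalignslop = 3 };

              printf("longest=5: %s\n", ag_has_space(5, &exact) ? "proceed" : "ENOSPC");
              printf("longest=8: %s\n", ag_has_space(8, &exact) ? "proceed" : "ENOSPC");
              return 0;
          }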