1. 18 Apr, 2008 3 commits
    • D
      [XFS] Remove the xfs_icluster structure · bad55843
      David Chinner committed
      Remove the xfs_icluster structure and replace with a radix tree lookup.
      
      We don't need to keep a list of inodes in each cluster around anymore as
      we can look them up quickly when we need to. The only time we need to do
      this now is during inode writeback.
      
      Factor the inode cluster writeback code out of xfs_iflush and convert it
      to use radix_tree_gang_lookup() instead of walking a list of inodes built
      when we first read in the inodes.
      
      This removes 3 pointers from each xfs_inode structure and the xfs_icluster
      structure per inode cluster. Hence we reduce the cache footprint of the
      xfs_inodes by 5-10%, depending on cluster sparseness.
      
      To be truly efficient we need a radix_tree_gang_lookup_range() call to
      stop searching once we are past the end of the cluster instead of trying
      to find a full cluster's worth of inodes.
      
      Before (ia64):
      
      $ cat /sys/slab/xfs_inode/object_size
      536
      
      After:
      
      $ cat /sys/slab/xfs_inode/object_size
      512
      
      SGI-PV: 977460
      SGI-Modid: xfs-linux-melb:xfs-kern:30502a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
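
The lookup pattern described in this commit can be sketched roughly as follows. This is a minimal Python model, not the kernel code: a sorted list stands in for the radix tree, `gang_lookup()` only loosely mimics the calling convention of radix_tree_gang_lookup(), and the helper names and the assumption that clusters are aligned to `inodes_per_cluster` are illustrative.

```python
from bisect import bisect_left


def gang_lookup(sorted_inums, first_index, max_items):
    """Return up to max_items cached inode numbers >= first_index, ascending.

    A sorted list stands in for the radix tree in this sketch.
    """
    start = bisect_left(sorted_inums, first_index)
    return sorted_inums[start:start + max_items]


def inodes_in_cluster(sorted_inums, inum, inodes_per_cluster):
    """Find the cached inodes that share an inode cluster with inum."""
    # Assumption: clusters are aligned to inodes_per_cluster.
    first = inum - (inum % inodes_per_cluster)
    last = first + inodes_per_cluster - 1
    found = gang_lookup(sorted_inums, first, inodes_per_cluster)
    # Without a range-limited lookup, the results can run past the end of
    # the cluster, so they have to be filtered after the fact.
    return [i for i in found if i <= last]
```

The filtering step is what a range-limited lookup, as wished for in the commit message, would make unnecessary: the search could stop as soon as it passed `last`.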
    • D
      [XFS] Don't block pdflush when writing back inodes · a3f74ffb
      David Chinner committed
      When pdflush is writing back inodes, it can get stuck on inode cluster
      buffers that are currently under I/O. This occurs when we write data to
      multiple inodes in the same inode cluster at the same time.
      
      Effectively, delayed allocation marks the inode dirty during the data
      writeback. Hence if the inode cluster was flushed during the writeback of
      the first inode, the writeback of the second inode will block waiting for
      the inode cluster write to complete before writing it again for the newly
      dirtied inode.
      
      We want to prevent this so that we don't block pdflush and slow down all
      of writeback. Hence we introduce a non-blocking async inode flush flag
      that pdflush uses. If this flag is set, we use non-blocking operations
      (e.g. try locks) wherever we can to avoid blocking or issuing extra I/O.
      
      SGI-PV: 970925
      SGI-Modid: xfs-linux-melb:xfs-kern:30501a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
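
The try-lock idea behind the non-blocking flush flag can be illustrated with a small Python sketch. It is only a model under stated assumptions: `ClusterBuffer`, `flush_inode()` and the `nonblocking` argument are hypothetical names, and a held `threading.Lock` stands in for an inode cluster buffer that is currently under I/O.

```python
import threading


class ClusterBuffer:
    """Hypothetical stand-in for an inode cluster buffer."""

    def __init__(self):
        self.lock = threading.Lock()  # held while the buffer is under I/O
        self.writes = 0


def flush_inode(buf, nonblocking):
    """Write the cluster buffer; return False if the flush was skipped."""
    if nonblocking:
        # Try-lock: give up immediately instead of waiting for the I/O.
        if not buf.lock.acquire(blocking=False):
            return False
    else:
        buf.lock.acquire()  # may block until the in-flight I/O completes
    try:
        buf.writes += 1  # stands in for issuing the cluster write
        return True
    finally:
        buf.lock.release()
```

A caller in the position of pdflush would pass `nonblocking=True` and simply move on to other inodes when the call returns False, leaving the skipped inode dirty for a later pass.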
    • D
      [XFS] Remove the xfs_refcache · 163d3686
      Donald Douwsma committed
      Remove the xfs_refcache; it was only needed while we were still
      building for 2.4 kernels.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30472a
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  2. 07 Feb, 2008 8 commits
  3. 16 Oct, 2007 7 commits
  4. 15 Oct, 2007 3 commits
    • D
      [XFS] Radix tree based inode caching · da353b0d
      David Chinner committed
      One of the perpetual scaling problems XFS has is indexing its incore
      inodes. We currently use hashes, and the default hash sizes chosen can
      only ever be a tradeoff between memory consumption and the maximum
      realistic size of the cache.
      
      As a result, anyone who has millions of inodes cached on a filesystem
      needs to tune the size of the cache via the ihashsize mount option to
      allow decent scalability with inode cache operations.
      
      A further problem is the separate inode cluster hash, whose size is based
      on the ihashsize but is smaller, and so under certain conditions (sparse
      cluster cache population) this can become a limitation long before the
      inode hash is causing issues.
      
      The following patchset removes the inode hash and cluster hash and
      replaces them with radix trees to avoid the scalability limitations of the
      hashes. It also reduces the size of the inodes by 3 pointers....
      
      SGI-PV: 969561
      SGI-Modid: xfs-linux-melb:xfs-kern:29481a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • C
      [XFS] dinode endianess annotations · 347d1c01
      Christoph Hellwig committed
      Biggest bit is duplicating the dinode structure so we have one annotated for
      native endianness and one for disk endianness. The other significant change
      is that xfs_xlate_dinode_core is split into one helper per direction to
      allow for proper annotations; everything else is trivial.
      
      As a side note, splitting out the incore dinode means we can move it into
      xfs_inode.h in a later patch, considerably reducing the include hell in
      xfs.
      
      SGI-PV: 968563
      SGI-Modid: xfs-linux-melb:xfs-kern:29476a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • C
      [XFS] split ondisk vs incore versions of xfs_bmbt_rec_t · a6f64d4a
      Christoph Hellwig committed
      Currently xfs_bmbt_rec_t is used both for ondisk extents as well as
      host-endian ones. This patch adds a new xfs_bmbt_rec_host_t for the native
      endian ones and cleans up the fallout. There have been various endianness
      issues in the tracing / debug printf code that are fixed by this patch.
      
      SGI-PV: 968563
      SGI-Modid: xfs-linux-melb:xfs-kern:29318a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
  5. 14 Jul, 2007 2 commits
    • D
      [XFS] Fix lockdep annotations for xfs_lock_inodes · 0f1145cc
      David Chinner committed
      SGI-PV: 967035
      SGI-Modid: xfs-linux-melb:xfs-kern:29026a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • D
      [XFS] Concurrent Multi-File Data Streams · 2a82b8be
      David Chinner committed
      In media spaces, video is often stored in a frame-per-file format. When
      dealing with uncompressed realtime HD video streams in this format, it is
      crucial that files do not get fragmented and that multiple files are placed
      contiguously on disk.
      
      When multiple streams are being ingested and played out at the same time,
      it is critical that the filesystem does not cross the streams and
      interleave them together as this creates seek and readahead cache miss
      latency and prevents both ingest and playout from meeting frame rate
      targets.
      
      This patch set introduces a "stream of files" concept into the allocator to
      place all the data from a single stream contiguously on disk so that RAID
      array readahead can be used effectively. Each additional stream gets
      placed in different allocation groups within the filesystem, thereby
      ensuring that we don't cross any streams. When an AG fills up, we select a
      new AG for the stream that is not in use.
      
      The core of the functionality is the stream tracking - each inode that we
      create in a directory needs to be associated with the directory's stream.
      Hence every time we create a file, we look up the directory's stream
      object and associate the new file with that object.
      
      Once we have a stream object for a file, we use the AG that the stream
      object points to for allocations. If we can't allocate in that AG (e.g. it
      is full) we move the entire stream to another AG. Other inodes in the same
      stream are moved to the new AG on their next allocation (i.e. lazy
      update).
      
      Stream objects are kept in a cache and hold a reference on the inode.
      Hence the inode cannot be reclaimed while there is an outstanding stream
      reference. This means that on unlink we need to remove the stream
      association and we also need to flush all the associations on certain
      events that want to reclaim all unreferenced inodes (e.g. filesystem
      freeze).
      
      SGI-PV: 964469
      SGI-Modid: xfs-linux-melb:xfs-kern:29096a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Barry Naujok <bnaujok@sgi.com>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Vlad Apostolov <vapo@sgi.com>
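
The stream tracking described in this commit message can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the kernel implementation: `Stream`, `StreamAllocator` and their methods are hypothetical names, AGs are reduced to free-block counters, and the reference counting and cache flushing mentioned in the last paragraph are left out.

```python
class Stream:
    """Hypothetical stream object: remembers which AG the stream uses."""

    def __init__(self, ag):
        self.ag = ag


class StreamAllocator:
    def __init__(self, ag_free_blocks):
        self.free = list(ag_free_blocks)  # free blocks per allocation group
        self.dir_stream = {}              # directory -> Stream
        self.file_stream = {}             # (directory, name) -> Stream

    def _pick_unused_ag(self, needed):
        """Pick an AG that no stream is using and that has enough space."""
        in_use = {s.ag for s in self.dir_stream.values()}
        for ag, free in enumerate(self.free):
            if ag not in in_use and free >= needed:
                return ag
        raise OSError("no unused allocation group with enough space")

    def create(self, directory, name):
        """Associate a new file with its directory's stream object."""
        stream = self.dir_stream.get(directory)
        if stream is None:
            stream = Stream(self._pick_unused_ag(1))
            self.dir_stream[directory] = stream
        self.file_stream[(directory, name)] = stream

    def allocate(self, directory, name, blocks):
        """Allocate from the stream's AG, moving the stream if it is full."""
        stream = self.file_stream[(directory, name)]
        if self.free[stream.ag] < blocks:
            # Moving the shared stream object moves every file in the
            # stream; the others notice on their next allocation.
            stream.ag = self._pick_unused_ag(blocks)
        self.free[stream.ag] -= blocks
        return stream.ag
```

Because all files in a directory share one stream object, the "lazy update" falls out naturally in this model: nothing is done for the other files when the stream moves, and they simply see the new AG the next time they allocate.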
  6. 08 May, 2007 3 commits
    • L
      [XFS] Add lockdep support for XFS · f7c66ce3
      Lachlan McIlroy committed
      SGI-PV: 963965
      SGI-Modid: xfs-linux-melb:xfs-kern:28485a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • L
      [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. · ba87ea69
      Lachlan McIlroy committed
      The problem that has been addressed is that of synchronising updates of
      the file size with writes that extend a file. Without the fix the update
      of a file's size, as a result of a write beyond eof, is independent of
      when the cached data is flushed to disk. Often the file size update would
      be written to the filesystem log before the data is flushed to disk. When
      a system crashes between these two events and the filesystem log is
      replayed on mount the file's size will be set but since the contents never
      made it to disk the file is full of holes. If some of the cached data was
      flushed to disk then it may just be a section of the file at the end that
      has holes.
      
      There are existing fixes to help alleviate this problem, particularly in
      the case where a file has been truncated, that force cached data to be
      flushed to disk when the file is closed. If the system crashes while the
      file(s) are still open then this flushing will never occur.
      
      The fix that we have implemented is to introduce a second file size,
      called the in-memory file size, that represents the current file size as
      viewed by the user. The existing file size, called the on-disk file size,
      is the one that gets written to the filesystem log and we only update it
      when it is safe to do so. When we write to a file beyond eof we only
      update the in-memory file size in the write operation. Later, when the I/O
      operation that flushes the cached data to disk completes, an I/O
      completion routine will update the on-disk file size. The on-disk file
      size will be updated to the maximum offset of the I/O or to the value of
      the in-memory file size if the I/O includes eof.
      
      SGI-PV: 958522
      SGI-Modid: xfs-linux-melb:xfs-kern:28322a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
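
The two-size scheme described above can be sketched as follows. This is an illustrative Python model only: the `Inode` class, the field names and the two helper functions are assumptions made for the example, and the refusal to shrink the on-disk size in the completion handler is a simplification of this sketch rather than something the commit message states.

```python
class Inode:
    """Hypothetical inode carrying the two sizes from the commit message."""

    def __init__(self):
        self.in_memory_size = 0  # file size as seen by the user
        self.on_disk_size = 0    # file size that gets logged


def write_beyond_eof(inode, offset, length):
    """A buffered write only moves the in-memory size."""
    inode.in_memory_size = max(inode.in_memory_size, offset + length)


def io_completion(inode, offset, length):
    """Called once the data for [offset, offset + length) is on disk."""
    # If the I/O spans eof, clamp to the in-memory size; otherwise use
    # the end offset of the I/O.
    io_end = min(offset + length, inode.in_memory_size)
    if io_end > inode.on_disk_size:
        inode.on_disk_size = io_end
```

In this model a crash between `write_beyond_eof()` and `io_completion()` leaves the logged size at its old value, so log replay cannot expose a region whose data never reached the disk.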
    • L
      [XFS] propogate return codes from flush routines · d3cf2094
      Lachlan McIlroy committed
      This patch handles error return values in fs_flush_pages and
      fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
      can propagate the errors and handle them at higher layers. I also modified
      xfs_itruncate_start so that it could propagate the error further.
      
      SGI-PV: 961990
      SGI-Modid: xfs-linux-melb:xfs-kern:28231a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Stewart Smith <stewart@flamingspork.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
  7. 11 Nov, 2006 1 commit
  8. 28 Sep, 2006 2 commits
  9. 09 Jun, 2006 2 commits
  10. 11 Apr, 2006 1 commit
  11. 17 Mar, 2006 2 commits
    • N
      [XFS] Fix an infinite loop issue in bulkstat when a corrupt inode is · b12dd342
      Nathan Scott committed
      detected.  Thanks to Roger Willcocks.
      
      SGI-PV: 951054
      SGI-Modid: xfs-linux-melb:xfs-kern:25477a
      Signed-off-by: Nathan Scott <nathans@sgi.com>
    • M
      [XFS] There are a few problems with the new · 8867bc9b
      Mandy Kirkconnell committed
      xfs_bmap_search_multi_extents() wrapper function that I introduced in mod
      xfs-linux:xfs-kern:207393a. The function was added as a wrapper around
      xfs_bmap_do_search_extents() to avoid breaking the top-of-tree CXFS
      interface. The idea of the function was basically to extract the target
      extent buffer (if in multi-level extent allocation mode), then call
      xfs_bmap_do_search_extents() with either a pointer to the first extent in
      the target buffer or a pointer to the first extent in the file, depending
      on which extent mode was being used.

      However, in addition to locating the target extent record for block bno,
      xfs_bmap_do_search_extents() also sets four parameters needed by the
      caller: *lastx, *eofp, *gotp, *prevp. Passing only the target extent
      buffer to xfs_bmap_do_search_extents() causes *eofp to be set incorrectly
      if the extent is at the end of the target list but there are actually more
      extents in the next er_extbuf. Likewise, if the extent is the first one in
      the buffer but NOT the first in the file, *prevp is incorrectly set to
      NULL. Adding the needed functionality to xfs_bmap_search_multi_extents()
      to re-set any incorrectly set fields is redundant and makes the call to
      xfs_bmap_do_search_extents() not make much sense when multi-level extent
      allocation mode is being used.

      This mod basically extracts the two functional components from
      xfs_bmap_do_search_extents(), with the intent of obsoleting/removing
      xfs_bmap_do_search_extents() after the CXFS multi-level in-core extent
      changes are checked in. The two components are: 1) The binary search to
      locate the target extent record, and 2) Setting the four parameters needed
      by the caller (*lastx, *eofp, *gotp, *prevp).

      Component 1: I created a new function in xfs_inode.c called
      xfs_iext_bno_to_ext(), which executes the binary search to find the
      target extent record. xfs_bmap_search_multi_extents() has been modified
      to call xfs_iext_bno_to_ext() rather than xfs_bmap_do_search_extents().

      Component 2: The parameter setting functionality has been added to
      xfs_bmap_search_multi_extents(), eliminating the need for
      xfs_bmap_do_search_extents().

      These changes make the removal of xfs_bmap_do_search_extents() trivial
      once the CXFS changes are in place. They also allow us to maintain the
      current XFS interface, using the new search function introduced in mod
      xfs-linux:xfs-kern:207393a.
      
      SGI-PV: 928864
      SGI-Modid: xfs-linux-melb:xfs-kern:207866a
      Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
      Signed-off-by: Nathan Scott <nathans@sgi.com>
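
The two components named in this commit message, a binary search for the target extent followed by filling in the four caller-visible values, can be sketched in Python. This is a simplified model under stated assumptions: extents are plain `(startoff, blockcount)` tuples, `search_extents()` is a hypothetical name, and the exact values the kernel stores in *lastx, *eofp, *gotp and *prevp are not reproduced here.

```python
from bisect import bisect_right


def search_extents(extents, bno):
    """Locate the extent for file block bno.

    extents is a list of (startoff, blockcount) tuples sorted by startoff.
    Returns (index, eof, got, prev), loosely modelled on the four
    out-parameters named in the commit message.
    """
    starts = [startoff for startoff, _ in extents]
    # Component 1: binary search for the last extent starting at or
    # before bno.
    i = bisect_right(starts, bno) - 1
    if i >= 0 and bno < extents[i][0] + extents[i][1]:
        idx = i        # bno falls inside extent i
    else:
        idx = i + 1    # bno is in a hole: use the next extent, if any
    # Component 2: derive the values the caller needs.
    eof = idx >= len(extents)
    got = None if eof else extents[idx]
    prev = extents[idx - 1] if idx > 0 else None
    return idx, eof, got, prev
```

Running the search over the whole file's extents rather than a single buffer is what keeps `eof` and `prev` right at buffer boundaries: in this model, `prev` is None only for the first extent in the file, and `eof` is True only past the last one.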
  12. 14 Mar, 2006 2 commits
    • M
      [XFS] 929045 567344 This mod introduces multi-level in-core file extent · 0293ce3a
      Mandy Kirkconnell committed
      functionality, building upon the new layout introduced in mod
      xfs-linux:xfs-kern:207390a. The new multi-level extent allocations are
      only required for heavily fragmented files, so the old-style linear extent
      list is used on files until the extents reach a pre-determined size of 4k.
      4k buffers are used because this is the system page size on Linux i386 and
      systems with larger page sizes don't seem to gain much, if anything, by
      using their native page size as the extent buffer size. Also, using 4k
      extent buffers everywhere provides a consistent interface for CXFS across
      different platforms. The 4k extent buffers are managed by an indirection
      array (xfs_ext_irec_t) which is basically just a pointer array with a bit
      of extra information to keep track of the number of extents in each buffer
      as well as the extent offset of each buffer.

      Major changes include:
      - Add multi-level in-core file extent functionality to the xfs_iext_
        subroutines introduced in mod xfs-linux:xfs-kern:207390a
      - Introduce 13 new subroutines which add functionality for multi-level
        in-core file extents:
          xfs_iext_add_indirect_multi()
          xfs_iext_remove_indirect()
          xfs_iext_realloc_indirect()
          xfs_iext_indirect_to_direct()
          xfs_iext_bno_to_irec()
          xfs_iext_idx_to_irec()
          xfs_iext_irec_init()
          xfs_iext_irec_new()
          xfs_iext_irec_remove()
          xfs_iext_irec_compact()
          xfs_iext_irec_compact_pages()
          xfs_iext_irec_compact_full()
          xfs_iext_irec_update_extoffs()
      
      SGI-PV: 928864
      SGI-Modid: xfs-linux-melb:xfs-kern:207393a
      Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
      Signed-off-by: Nathan Scott <nathans@sgi.com>
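
The indirection array described above, a list of fixed-size extent buffers with a per-buffer extent count and extent offset, can be modelled in Python as follows. This is an illustrative sketch, not the xfs_ext_irec_t code: `ExtentIndex` and its methods are hypothetical, it only supports appending, and the figure of 256 records per 4k buffer assumes 16-byte extent records.

```python
from bisect import bisect_right

# Assumption for this sketch: 16-byte extent records, so a 4k buffer
# holds 256 of them.
EXTENTS_PER_BUF = 4096 // 16


class ExtentIndex:
    """Hypothetical model of an indirection array over 4k extent buffers."""

    def __init__(self):
        self.bufs = []     # each entry is one buffer (a list of extents)
        self.offsets = []  # index of the first extent held by each buffer

    def count(self):
        """Total number of extents across all buffers."""
        if not self.bufs:
            return 0
        return self.offsets[-1] + len(self.bufs[-1])

    def append(self, extent):
        """Add an extent at the end, starting a new buffer when needed."""
        if not self.bufs or len(self.bufs[-1]) == EXTENTS_PER_BUF:
            self.offsets.append(self.count())
            self.bufs.append([])
        self.bufs[-1].append(extent)

    def get(self, idx):
        """Map a file-wide extent index to (buffer, slot) and fetch it."""
        if not 0 <= idx < self.count():
            raise IndexError(idx)
        b = bisect_right(self.offsets, idx) - 1  # buffer holding idx
        return self.bufs[b][idx - self.offsets[b]]
```

Keeping the extent offset alongside each buffer lets an index lookup binary-search the small offsets array instead of walking the buffers, which is the kind of job a helper such as xfs_iext_idx_to_irec() is named for.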
    • M
      [XFS] 929045 567344 This mod re-organizes some of the in-core file extent · 4eea22f0
      Mandy Kirkconnell committed
      code to prepare for an upcoming mod which will introduce multi-level
      in-core extent allocations. Although the in-core extent management is
      using a new code path in this mod, the functionality remains the same.

      Major changes include:
      - Introduce 10 new subroutines which reorganize the existing code but
        do NOT change functionality:
          xfs_iext_get_ext()
          xfs_iext_insert()
          xfs_iext_add()
          xfs_iext_remove()
          xfs_iext_remove_inline()
          xfs_iext_remove_direct()
          xfs_iext_realloc_direct()
          xfs_iext_direct_to_inline()
          xfs_iext_inline_to_direct()
          xfs_iext_destroy()
      - Remove 2 subroutines (functionality moved to new subroutines above):
          xfs_iext_realloc() - replaced by xfs_iext_add() and xfs_iext_remove()
          xfs_bmap_insert_exlist() - replaced by xfs_iext_insert()
          xfs_bmap_delete_exlist() - replaced by xfs_iext_remove()
      - Replace all hard-coded (indexed) extent assignments with a call to
        xfs_iext_get_ext()
      - Replace all extent record pointer arithmetic (ep++, ep--, base +
        lastx,..) with calls to xfs_iext_get_ext()
      - Update comments to remove the idea of a single "extent list" and
        introduce "extent record" terminology instead
      
      SGI-PV: 928864
      SGI-Modid: xfs-linux-melb:xfs-kern:207390a
      Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
      Signed-off-by: Nathan Scott <nathans@sgi.com>
  13. 11 Jan, 2006 2 commits
  14. 02 Nov, 2005 2 commits