1. 30 October 2008, 3 commits
  2. 13 August 2008, 8 commits
  3. 28 July 2008, 1 commit
  4. 29 April 2008, 2 commits
    • [XFS] remove manual lookup from xfs_rename and simplify locking · cfa853e4
      Committed by Christoph Hellwig
      ->rename already gets the target inode passed in if it exists. Pass it down
      to xfs_rename so that we can avoid looking it up again. Also simplify the
      locking, as the first lock section in xfs_rename can go away now: isdir is
      an invariant over the lifetime of the inode, and the new_parent and nlink
      checks are namespace-topology checks protected by i_mutex in the VFS. The
      projid check needs to move into the second lock section anyway to avoid
      being racy.
      
      Also kill the now-unused xfs_dir_lookup_int and remove the now-unused
      first_locked argument to xfs_lock_inodes. A sketch of the VFS-side
      hand-off follows this entry.
      
      SGI-PV: 976035
      SGI-Modid: xfs-linux-melb:xfs-kern:30903a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
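
      A minimal sketch of the VFS-side hand-off described above, assuming a
      hypothetical core routine example_xfs_rename() standing in for
      xfs_rename(); the real glue lives in the XFS Linux layer.

      #include <linux/fs.h>
      #include <linux/dcache.h>

      /* Hypothetical core routine standing in for xfs_rename(). */
      int example_xfs_rename(struct inode *src_dir, struct qstr *src_name,
                             struct inode *dst_dir, struct qstr *dst_name,
                             struct inode *dst_inode);

      /*
       * The VFS has already resolved the target dentry, so ->rename can hand
       * the (possibly NULL) target inode straight to the core routine instead
       * of performing a second directory lookup.
       */
      static int example_vn_rename(struct inode *old_dir, struct dentry *old_dentry,
                                   struct inode *new_dir, struct dentry *new_dentry)
      {
              struct inode *target = new_dentry->d_inode;  /* NULL if no target */

              return example_xfs_rename(old_dir, &old_dentry->d_name,
                                        new_dir, &new_dentry->d_name, target);
      }
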
    • [XFS] shrink mrlock_t · 579aa9ca
      Committed by Christoph Hellwig
      The writer field is not needed for non-DEBUG builds, so remove it. While
      we're at it, also clean up the interface for the "is locked" asserts to go
      through an xfs_iget.c helper with an interface like the xfs_ilock routines,
      to isolate the XFS codebase from mrlock internals. That way we can kill
      mrlock_t entirely once rw_semaphores grow an islocked facility. Also
      remove unused flags to the ilock family of functions. A sketch of such an
      assert helper follows this entry.
      
      SGI-PV: 976035
      SGI-Modid: xfs-linux-melb:xfs-kern:30902a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
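
      A minimal sketch of such an "is locked" assert helper, using simplified
      stand-in names (example_mrlock, example_isilocked); it assumes the lock
      wraps an rw_semaphore and only keeps the writer field in DEBUG builds.

      #include <linux/rwsem.h>

      /* Simplified stand-in for mrlock_t. */
      struct example_mrlock {
              struct rw_semaphore     mr_lock;
      #ifdef DEBUG
              int                     mr_writer;      /* set while held exclusively */
      #endif
      };

      #ifdef DEBUG
      /*
       * DEBUG-only assert helper in the style of the xfs_ilock routines, so
       * callers never have to look at the lock internals themselves.
       */
      static int example_isilocked(struct example_mrlock *mrp, int exclusive)
      {
              if (exclusive)
                      return mrp->mr_writer;                  /* held for write? */
              return rwsem_is_locked(&mrp->mr_lock);          /* held at all? */
      }
      #endif
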
  5. 18 April 2008, 3 commits
    • [XFS] Remove the xfs_icluster structure · bad55843
      Committed by David Chinner
      Remove the xfs_icluster structure and replace with a radix tree lookup.
      
      We don't need to keep a list of inodes in each cluster around anymore as
      we can look them up quickly when we need to. The only time we need to do
      this now is during inode writeback.
      
      Factor the inode cluster writeback code out of xfs_iflush and convert it
      to use radix_tree_gang_lookup() instead of walking a list of inodes built
      when we first read in the inodes.
      
      This removes 3 pointers from each xfs_inode structure and the xfs_icluster
      structure per inode cluster. Hence we reduce the cache footprint of the
      xfs_inodes by 5-10%, depending on cluster sparseness.
      
      To be truly efficient we need a radix_tree_gang_lookup_range() call to
      stop searching once we are past the end of the cluster, instead of trying
      to find a full cluster's worth of inodes. A sketch of the gang lookup
      follows this entry.
      
      Before (ia64):
      
      $ cat /sys/slab/xfs_inode/object_size
      536
      
      After:
      
      $ cat /sys/slab/xfs_inode/object_size
      512
      
      SGI-PV: 977460
      SGI-Modid: xfs-linux-melb:xfs-kern:30502a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
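
      A minimal sketch of the gang lookup mentioned above, assuming a tree
      indexed by inode number and an assumed EXAMPLE_CLUSTER_INODES batch size;
      radix_tree_gang_lookup() is the stock kernel primitive.

      #include <linux/radix-tree.h>

      #define EXAMPLE_CLUSTER_INODES  32      /* assumed inodes per cluster buffer */

      /*
       * Collect the in-core inodes that may share a cluster with 'first_ino'.
       * A plain gang lookup can return inodes past the end of the cluster, so
       * callers must still check each result -- which is exactly why the text
       * above asks for a radix_tree_gang_lookup_range() primitive.
       */
      static unsigned int example_cluster_gang_lookup(struct radix_tree_root *tree,
                                                      unsigned long first_ino,
                                                      void **batch)
      {
              return radix_tree_gang_lookup(tree, batch, first_ino,
                                            EXAMPLE_CLUSTER_INODES);
      }
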
    • [XFS] Don't block pdflush when writing back inodes · a3f74ffb
      Committed by David Chinner
      When pdflush is writing back inodes, it can get stuck on inode cluster
      buffers that are currently under I/O. This occurs when we write data to
      multiple inodes in the same inode cluster at the same time.
      
      Effectively, delayed allocation marks the inode dirty during the data
      writeback. Hence if the inode cluster was flushed during the writeback of
      the first inode, the writeback of the second inode will block waiting for
      the inode cluster write to complete before writing it again for the newly
      dirtied inode.
      
      Basically, we want to prevent this from happening so we don't block pdflush
      and slow down all of writeback. Hence we introduce a non-blocking async
      inode flush flag that pdflush uses. If this flag is set, we use
      non-blocking operations (e.g. trylocks) wherever we can to avoid blocking
      or issuing extra I/O. A sketch of this trylock pattern follows this entry.
      
      SGI-PV: 970925
      SGI-Modid: xfs-linux-melb:xfs-kern:30501a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
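
      A minimal sketch of the trylock pattern described above, with hypothetical
      names (EXAMPLE_FLUSH_ASYNC, example_ilock_nowait) standing in for the real
      flags and lock helpers.

      #include <linux/errno.h>

      struct example_inode;                           /* stand-in for xfs_inode */

      int  example_ilock_nowait(struct example_inode *ip);   /* trylock, hypothetical */
      void example_ilock(struct example_inode *ip);
      void example_iunlock(struct example_inode *ip);

      #define EXAMPLE_FLUSH_ASYNC     0x1             /* assumed pdflush flag */

      /* With the async flag set, never sleep on a busy inode or cluster buffer. */
      static int example_flush_inode(struct example_inode *ip, int flags)
      {
              if (flags & EXAMPLE_FLUSH_ASYNC) {
                      if (!example_ilock_nowait(ip))
                              return -EAGAIN;         /* skip it; pdflush retries later */
              } else {
                      example_ilock(ip);
              }
              /* ... write back, or piggy-back on the in-flight cluster I/O ... */
              example_iunlock(ip);
              return 0;
      }
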
    • [XFS] Remove the xfs_refcache · 163d3686
      Committed by Donald Douwsma
      Remove the xfs_refcache; it was only needed while we were still
      building for 2.4 kernels.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30472a
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  6. 07 February 2008, 8 commits
  7. 16 October 2007, 7 commits
  8. 15 October 2007, 3 commits
    • [XFS] Radix tree based inode caching · da353b0d
      Committed by David Chinner
      One of the perpetual scaling problems XFS has is indexing its incore
      inodes. We currently use hashes, and the default hash sizes chosen can
      only ever be a tradeoff between memory consumption and the maximum
      realistic size of the cache.
      
      As a result, anyone who has millions of inodes cached on a filesystem
      needs to tune the size of the cache via the ihashsize mount option to
      allow decent scalability with inode cache operations.
      
      A further problem is the separate inode cluster hash, whose size is based
      on the ihashsize but is smaller, and so under certain conditions (sparse
      cluster cache population) this can become a limitation long before the
      inode hash is causing issues.
      
      The following patchset removes the inode hash and cluster hash and
      replaces them with radix trees to avoid the scalability limitations of the
      hashes. It also reduces the size of the inodes by 3 pointers. A sketch of
      a radix tree based inode lookup follows this entry.
      
      SGI-PV: 969561
      SGI-Modid: xfs-linux-melb:xfs-kern:29481a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
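
      A minimal sketch of a radix-tree based inode lookup keyed by the
      AG-relative inode number, with a simplified stand-in type
      (example_perag); the real tree and lock hang off XFS's per-AG structure.

      #include <linux/radix-tree.h>
      #include <linux/spinlock.h>

      /* Simplified stand-in for the per-AG inode cache. */
      struct example_perag {
              rwlock_t                pag_ici_lock;   /* protects the tree */
              struct radix_tree_root  pag_ici_root;   /* indexed by AG inode number */
      };

      static void *example_icache_lookup(struct example_perag *pag,
                                         unsigned long agino)
      {
              void *ip;

              read_lock(&pag->pag_ici_lock);
              ip = radix_tree_lookup(&pag->pag_ici_root, agino);
              read_unlock(&pag->pag_ici_lock);
              return ip;                              /* NULL if not cached */
      }
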
    • [XFS] dinode endianess annotations · 347d1c01
      Committed by Christoph Hellwig
      The biggest bit is duplicating the dinode structure so we have one
      annotated for native endianness and one for disk endianness. The other
      significant change is that xfs_xlate_dinode_core is split into one helper
      per direction to allow for proper annotations; everything else is trivial.
      
      As a side note, splitting out the incore dinode means we can move it into
      xfs_inode.h in a later patch, greatly improving on the include hell in
      xfs. A sketch of the per-direction helpers follows this entry.
      
      SGI-PV: 968563
      SGI-Modid: xfs-linux-melb:xfs-kern:29476a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
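
      A minimal sketch of the one-helper-per-direction pattern, using trimmed
      stand-in structures with a single field (the real dinode core has many
      more members); be32_to_cpu()/cpu_to_be32() are the stock conversion
      helpers.

      #include <linux/types.h>
      #include <asm/byteorder.h>

      /* Trimmed stand-ins: one field each for the disk and in-core dinode cores. */
      struct example_dinode_disk {
              __be32  di_gen;                 /* disk endian, sparse-checkable */
      };

      struct example_dinode_host {
              __u32   di_gen;                 /* native endian */
      };

      /* One helper per direction keeps the endian annotations honest. */
      static void example_dinode_from_disk(struct example_dinode_host *to,
                                           struct example_dinode_disk *from)
      {
              to->di_gen = be32_to_cpu(from->di_gen);
      }

      static void example_dinode_to_disk(struct example_dinode_disk *to,
                                         struct example_dinode_host *from)
      {
              to->di_gen = cpu_to_be32(from->di_gen);
      }
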
    • [XFS] split ondisk vs incore versions of xfs_bmbt_rec_t · a6f64d4a
      Committed by Christoph Hellwig
      Currently xfs_bmbt_rec_t is used both for ondisk extents and for
      host-endian ones. This patch adds a new xfs_bmbt_rec_host_t for the
      native-endian ones and cleans up the fallout. There were various
      endianness issues in the tracing / debug printf code that are fixed by
      this patch. A sketch of the two record types follows this entry.
      
      SGI-PV: 968563
      SGI-Modid: xfs-linux-melb:xfs-kern:29318a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
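
      A minimal sketch of the split record types, with example_ prefixes; it
      assumes both records carry two 64-bit words (l0, l1) and differ only in
      endian annotation.

      #include <linux/types.h>
      #include <asm/byteorder.h>

      /* On-disk extent record: always big-endian, so sparse can check users. */
      typedef struct example_bmbt_rec {
              __be64  l0, l1;
      } example_bmbt_rec_t;

      /* In-core copy of the same record, already converted to host order. */
      typedef struct example_bmbt_rec_host {
              __u64   l0, l1;
      } example_bmbt_rec_host_t;

      /* Conversion is a straight byte swap of both words. */
      static inline void example_bmbt_disk_to_host(example_bmbt_rec_host_t *h,
                                                   example_bmbt_rec_t *d)
      {
              h->l0 = be64_to_cpu(d->l0);
              h->l1 = be64_to_cpu(d->l1);
      }
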
  9. 14 July 2007, 2 commits
    • [XFS] Fix lockdep annotations for xfs_lock_inodes · 0f1145cc
      Committed by David Chinner
      SGI-PV: 967035
      SGI-Modid: xfs-linux-melb:xfs-kern:29026a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • [XFS] Concurrent Multi-File Data Streams · 2a82b8be
      Committed by David Chinner
      In media spaces, video is often stored in a frame-per-file format. When
      dealing with uncompressed realtime HD video streams in this format, it is
      crucial that files do not get fragmented and that multiple files are
      placed contiguously on disk.
      
      When multiple streams are being ingested and played out at the same time,
      it is critical that the filesystem does not cross the streams and
      interleave them together as this creates seek and readahead cache miss
      latency and prevents both ingest and playout from meeting frame rate
      targets.
      
      This patch set introduces a "stream of files" concept in the allocator to
      place all the data from a single stream contiguously on disk so that RAID
      array readahead can be used effectively. Each additional stream gets
      placed in different allocation groups within the filesystem, thereby
      ensuring that we don't cross any streams. When an AG fills up, we select a
      new AG for the stream that is not in use.
      
      The core of the functionality is the stream tracking: each inode that we
      create in a directory needs to be associated with the directory's stream.
      Hence every time we create a file, we look up the directory's stream
      object and associate the new file with that object.
      
      Once we have a stream object for a file, we use the AG that the stream
      object points to for allocations. If we can't allocate in that AG (e.g. it
      is full) we move the entire stream to another AG. Other inodes in the same
      stream are moved to the new AG on their next allocation (i.e. lazy
      update). A sketch of this AG selection follows this entry.
      
      Stream objects are kept in a cache and hold a reference on the inode.
      Hence the inode cannot be reclaimed while there is an outstanding stream
      reference. This means that on unlink we need to remove the stream
      association and we also need to flush all the associations on certain
      events that want to reclaim all unreferenced inodes (e.g. filesystem
      freeze).
      
      SGI-PV: 964469
      SGI-Modid: xfs-linux-melb:xfs-kern:29096a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Barry Naujok <bnaujok@sgi.com>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Vlad Apostolov <vapo@sgi.com>
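
      A minimal sketch of the AG selection logic described above, using
      entirely hypothetical helpers (example_stream_lookup, example_pick_unused_ag);
      the real implementation caches stream objects that hold inode references,
      which is not shown here.

      /* Hypothetical stream object: one per parent directory. */
      struct example_stream {
              unsigned long   agno;           /* AG currently backing this stream */
      };

      struct example_stream *example_stream_lookup(unsigned long dir_ino);
      struct example_stream *example_stream_create(unsigned long dir_ino);
      int  example_ag_has_space(unsigned long agno);
      unsigned long example_pick_unused_ag(void);

      /* Pick the AG for a new allocation in a file created under dir_ino. */
      static unsigned long example_stream_pick_ag(unsigned long dir_ino)
      {
              struct example_stream *fs = example_stream_lookup(dir_ino);

              if (!fs)
                      fs = example_stream_create(dir_ino);
              if (!example_ag_has_space(fs->agno))
                      /* Move the stream; other inodes follow lazily on next alloc. */
                      fs->agno = example_pick_unused_ag();
              return fs->agno;
      }
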
  10. 08 May 2007, 3 commits
    • [XFS] Add lockdep support for XFS · f7c66ce3
      Committed by Lachlan McIlroy
      SGI-PV: 963965
      SGI-Modid: xfs-linux-melb:xfs-kern:28485a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. · ba87ea69
      Committed by Lachlan McIlroy
      The problem that has been addressed is that of synchronising updates of
      the file size with writes that extend a file. Without the fix the update
      of a file's size, as a result of a write beyond eof, is independent of
      when the cached data is flushed to disk. Often the file size update would
      be written to the filesystem log before the data is flushed to disk. When
      a system crashes between these two events and the filesystem log is
      replayed on mount, the file's size will be set, but since the contents
      never made it to disk the file is full of holes. If some of the cached
      data was flushed to disk then it may just be a section of the file at the
      end that has holes.
      
      There are existing fixes to help alleviate this problem, particularly in
      the case where a file has been truncated, that force cached data to be
      flushed to disk when the file is closed. If the system crashes while the
      file(s) are still open then this flushing will never occur.
      
      The fix that we have implemented is to introduce a second file size,
      called the in-memory file size, that represents the current file size as
      viewed by the user. The existing file size, called the on-disk file size,
      is the one that gets written to the filesystem log, and we only update it
      when it is safe to do so. When we write to a file beyond eof we only
      update the in-memory file size in the write operation. Later, when the
      I/O operation that flushes the cached data to disk completes, an I/O
      completion routine will update the on-disk file size. The on-disk file
      size will be updated to the maximum offset of the I/O, or to the value of
      the in-memory file size if the I/O includes eof. A sketch of this
      completion-time update follows this entry.
      
      SGI-PV: 958522
      SGI-Modid: xfs-linux-melb:xfs-kern:28322a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
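
      A minimal sketch of the completion-time update described above, using
      stand-in field names (i_size for the in-memory size, i_disksize for the
      on-disk size); the on-disk size only advances once the covering data I/O
      has completed.

      /* Trimmed stand-in: the two sizes kept per inode. */
      struct example_sized_inode {
              long long       i_size;         /* in-memory size, what users see */
              long long       i_disksize;     /* on-disk size, what gets logged */
      };

      /*
       * Called from I/O completion once [offset, offset + len) is on disk.
       * Advance the on-disk size to the end of the I/O, but never beyond the
       * in-memory size the user has actually been shown.
       */
      static void example_setfilesize(struct example_sized_inode *ip,
                                      long long offset, long long len)
      {
              long long new_size = offset + len;

              if (new_size > ip->i_size)
                      new_size = ip->i_size;          /* I/O covers eof */
              if (new_size > ip->i_disksize)
                      ip->i_disksize = new_size;      /* safe to log the larger size */
      }
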
    • [XFS] propogate return codes from flush routines · d3cf2094
      Committed by Lachlan McIlroy
      This patch handles error return values in fs_flush_pages and
      fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
      can propagate the errors and handle them at higher layers. I also modified
      xfs_itruncate_start so that it can propagate the error further. A sketch
      of this propagation follows this entry.
      
      SGI-PV: 961990
      SGI-Modid: xfs-linux-melb:xfs-kern:28231a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Stewart Smith <stewart@flamingspork.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
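
      A minimal sketch of the error propagation described above, with
      hypothetical wrappers (example_flushinval_pages, example_itruncate_start)
      standing in for the real routines.

      struct example_inode;                   /* stand-in for xfs_inode */

      /* Hypothetical flush wrapper that now returns the page cache error code. */
      int example_flushinval_pages(struct example_inode *ip,
                                   long long first, long long last);

      static int example_itruncate_start(struct example_inode *ip,
                                         long long new_size)
      {
              int error;

              /* Propagate the flush error instead of silently dropping it. */
              error = example_flushinval_pages(ip, new_size, -1);
              if (error)
                      return error;
              /* ... continue tearing down the mapping beyond new_size ... */
              return 0;
      }
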