1. 30 Jul, 2012 1 commit
  2. 21 May, 2012 1 commit
  3. 15 May, 2012 2 commits
  4. 27 Mar, 2012 1 commit
    • xfs: don't cache inodes read through bulkstat · 5132ba8f
      Dave Chinner committed
      When we read inodes via bulkstat, we generally only read them once
      and then throw them away - they never get used again. If we retain
      them in cache, then it simply causes the working set of inodes and
      other cached items to be reclaimed just so the inode cache can grow.
      
      Avoid this problem by marking inodes read by bulkstat not to be
      cached and check this flag in .drop_inode to determine whether the
      inode should be added to the VFS LRU or not. If the inode lookup
      hits an already cached inode, then don't set the flag. If the inode
      lookup hits an inode marked with no cache flag, remove the flag and
      allow it to be cached once the current reference goes away.
      
      Inodes marked as not cached will get cleaned up by the background
      inode reclaim or via memory pressure, so they will still generate
      some short term cache pressure. They will, however, be reclaimed
      much sooner and in preference to cache hot inodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      5132ba8f
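The flag handling described in this commit can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `DEMO_IDONTCACHE`, `struct demo_inode` and the `demo_*` functions are hypothetical names, not the kernel's, and the rule that a bulkstat hit leaves the flags untouched is one reading of the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real flag name and value are not given here. */
#define DEMO_IDONTCACHE (1u << 0)

struct demo_inode {
    unsigned int flags;   /* per-inode state flags */
};

/* Cache miss: mark the inode only when the lookup came from bulkstat. */
static void demo_lookup_miss(struct demo_inode *ip, bool from_bulkstat)
{
    assert(ip != NULL);
    ip->flags = from_bulkstat ? DEMO_IDONTCACHE : 0u;
}

/*
 * Cache hit: never set the flag on an already cached inode. Assumption:
 * only a regular (non-bulkstat) lookup clears an existing flag.
 */
static void demo_lookup_hit(struct demo_inode *ip, bool from_bulkstat)
{
    assert(ip != NULL);
    if (!from_bulkstat)
        ip->flags &= ~DEMO_IDONTCACHE;
}

/* Model of the .drop_inode decision: true means "do not add to the LRU". */
static bool demo_drop_inode(const struct demo_inode *ip)
{
    assert(ip != NULL);
    return (ip->flags & DEMO_IDONTCACHE) != 0;
}
```

In this model an inode first seen by bulkstat is dropped when its last reference goes away, while one that a regular lookup touches afterwards becomes cacheable again.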
  5. 16 Mar, 2012 1 commit
    • xfs: fix inode lookup race · f30d500f
      Dave Chinner committed
      When we get concurrent lookups of the same inode that is not in the
      per-AG inode cache, there is a race condition that triggers warnings
      in unlock_new_inode() indicating that we are initialising an inode
      that isn't in the correct state for a new inode.
      
      When we do an inode lookup via a file handle or a bulkstat, we don't
      serialise lookups at a higher level through the dentry cache (i.e.
      pathless lookup), and so we can get concurrent lookups of the same
      inode.
      
      The race condition is between the insertion of the inode into the
      cache in the case of a cache miss and a concurrent lookup:
      
      Thread 1			Thread 2
      xfs_iget()
        xfs_iget_cache_miss()
          xfs_iread()
          lock radix tree
          radix_tree_insert()
      				rcu_read_lock
      				radix_tree_lookup
      				lock inode flags
      				XFS_INEW not set
      				igrab()
      				unlock inode flags
      				rcu_read_unlock
      				use uninitialised inode
      				.....
          lock inode flags
          set XFS_INEW
          unlock inode flags
          unlock radix tree
        xfs_setup_inode()
          inode flags = I_NEW
          unlock_new_inode()
            WARNING as inode flags != I_NEW
      
      This can lead to inode corruption, inode list corruption, etc, and
      is generally a bad thing to occur.
      
      Fix this by setting XFS_INEW before inserting the inode into the
      radix tree. This will ensure any concurrent lookup will find the new
      inode with XFS_INEW set and that forces the lookup to wait until the
      XFS_INEW flag is removed before allowing the lookup to succeed.
      
      cc: <stable@vger.kernel.org> # for 3.0.x, 3.2.x
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      f30d500f
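The ordering that this fix relies on (set the "new" flag first, publish the inode second) can be illustrated with a single-threaded userspace model. It is a sketch, not the XFS code: `DEMO_INEW`, the fixed-size `demo_cache` array standing in for the radix tree, and all `demo_*` names are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INEW  (1u << 0)   /* hypothetical stand-in for XFS_INEW */
#define DEMO_SLOTS 8

struct demo_inode {
    unsigned long ino;
    unsigned int  flags;
};

/* Toy cache indexed by inode number; stands in for the per-AG radix tree. */
static struct demo_inode *demo_cache[DEMO_SLOTS];

enum demo_result { DEMO_MISS, DEMO_WAIT, DEMO_READY };

/* Insert on cache miss: flag the inode as new BEFORE it becomes visible. */
static void demo_cache_insert(struct demo_inode *ip)
{
    assert(ip != NULL);
    ip->flags |= DEMO_INEW;                 /* 1. mark as under construction */
    demo_cache[ip->ino % DEMO_SLOTS] = ip;  /* 2. only now publish it */
}

/* Called once initialisation is complete; waiters may then proceed. */
static void demo_finish_setup(struct demo_inode *ip)
{
    assert(ip != NULL);
    ip->flags &= ~DEMO_INEW;
}

/* Lookup: a hit on an inode that is still new must wait, not use it. */
static enum demo_result demo_cache_lookup(unsigned long ino)
{
    struct demo_inode *ip = demo_cache[ino % DEMO_SLOTS];

    if (ip == NULL || ip->ino != ino)
        return DEMO_MISS;
    return (ip->flags & DEMO_INEW) ? DEMO_WAIT : DEMO_READY;
}
```

Because the flag is set before the store into the cache, there is no window in which a lookup can find the inode without also seeing that it is still being initialised.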
  6. 14 Mar, 2012 1 commit
  7. 26 Feb, 2012 1 commit
    • xfs: only take the ILOCK in xfs_reclaim_inode() · ad637a10
      Alex Elder committed
      At the end of xfs_reclaim_inode(), the inode is locked in order to
      wait for a possible concurrent lookup to complete before the
      inode is freed.  This synchronization step was taking both the ILOCK
      and the IOLOCK, but the latter was causing lockdep to produce
      reports of the possibility of deadlock.
      
      It turns out that there's no need to acquire the IOLOCK at this
      point anyway.  It may have been required in some earlier version of
      the code, but there should be no need to take the IOLOCK in
      xfs_iget(), so there's no longer any need to get it here for
      synchronization.  Add an assertion in xfs_iget() as a reminder
      of this assumption.
      
      Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested
      no longer including the IOLOCK.  I just put together the patch.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      ad637a10
  8. 23 Feb, 2012 1 commit
  9. 18 Jan, 2012 4 commits
    • xfs: remove the i_new_size field in struct xfs_inode · 2813d682
      Christoph Hellwig committed
      Now that we use the VFS i_size field throughout XFS there is no need for the
      i_new_size field any more given that the VFS i_size field gets updated
      in ->write_end before unlocking the page, and thus is always uptodate when
      writeback could see a page.  Removing i_new_size also has the advantage that
      we will never have to trim back di_size during a failed buffered write,
      given that it never gets updated past i_size.
      
      Note that currently the generic direct I/O code only updates i_size after
      calling our end_io handler, which requires a small workaround to make
      sure di_size actually makes it to disk.  I hope to fix this properly in
      the generic code.
      
      A downside is that we lose the support for parallel non-overlapping O_DIRECT
      appending writes that recently was added.  I don't think keeping the complex
      and fragile i_new_size infrastructure for this is a good tradeoff - if we
      really care about parallel appending writers we should investigate turning
      the iolock into a range lock, which would also allow for parallel
      non-overlapping buffered writers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      2813d682
    • xfs: remove the i_size field in struct xfs_inode · ce7ae151
      Christoph Hellwig committed
      There is no fundamental need to keep an in-memory inode size copy in the XFS
      inode.  We already have the on-disk value in the dinode, and the separate
      in-memory copy that we need for regular files only lives in the VFS inode.
      
      Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
      VFS inode i_size field for regular files.  Switch code that was directly
      accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
      we are limited to regular files direct access of the VFS inode i_size field.
      
      This also allows dropping some fairly complicated code in the write path
      which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
      that is getting updated inside ->write_end.
      
      Note that we do not bother resetting the VFS i_size when truncating a file
      that gets freed to zero as there is no point in doing so because the VFS inode
      is no longer in use at this point.  Just relax the assert in xfs_ifree to
      only check the on-disk size instead.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      ce7ae151
    • xfs: replace i_flock with a sleeping bitlock · 474fce06
      Christoph Hellwig committed
      We almost never block on i_flock, the exception is synchronous inode
      flushing.  Instead of bloating the inode with a 16/24-byte completion
      that we abuse as a semaphore just implement it as a bitlock that uses
      a bit waitqueue for the rare sleeping path.  This primarily is a
      tradeoff between a much smaller inode and a faster non-blocking
      path vs faster wakeups, and we are much better off with the former.
      
      A small downside is that we will lose lockdep checking for i_flock, but
      given that it's always taken inside the ilock that should be acceptable.
      
      Note that for example the inode writeback locking is implemented in a
      very similar way.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      474fce06
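The non-blocking side of a bitlock like the one described in this commit can be sketched with C11 atomics. This is a userspace illustration under assumptions: `DEMO_IFLOCK_BIT` and the `demo_*` names are hypothetical, and the rare sleeping path (the bit waitqueue) is deliberately omitted, so only trylock and unlock are shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit inside the inode flags word that acts as the lock. */
#define DEMO_IFLOCK_BIT (1u << 0)

struct demo_inode {
    atomic_uint flags;   /* the lock costs one bit here, not a separate field */
};

/* Try to take the lock; succeeds only if the bit was previously clear. */
static bool demo_flock_trylock(struct demo_inode *ip)
{
    unsigned int old = atomic_fetch_or(&ip->flags, DEMO_IFLOCK_BIT);

    return (old & DEMO_IFLOCK_BIT) == 0;
}

/* Release the lock; a real implementation would also wake any sleepers. */
static void demo_flock_unlock(struct demo_inode *ip)
{
    unsigned int old = atomic_fetch_and(&ip->flags, ~DEMO_IFLOCK_BIT);

    assert(old & DEMO_IFLOCK_BIT);   /* must have been held */
    (void)old;
}
```

The trade-off named in the message shows up directly: the lock state shares a word with other flags, and the uncontended path is a single atomic operation.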
    • xfs: remove the if_ext_max field in struct xfs_ifork · 8096b1eb
      Christoph Hellwig committed
      We spent a lot of effort to maintain this field, but it always equals the
      fork size divided by the constant size of an extent.  The prime use of it is
      to assert that the two stay in sync.  Just divide the fork size by the extent
      size in the few places that we actually use it and remove the overhead
      of maintaining it.  Also introduce a few helpers to consolidate the places
      where we actually care about the value.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      8096b1eb
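The idea of computing the value on demand instead of caching it can be sketched as a pair of small helpers. The names `demo_fork_max_extents` and `demo_check_extent_count` are hypothetical, and the 16-byte extent record size is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constant size of one extent record, in bytes. */
#define DEMO_EXTENT_REC_SIZE 16u

/* Derive the maximum extent count from the fork size when it is needed. */
static size_t demo_fork_max_extents(size_t fork_size)
{
    return fork_size / DEMO_EXTENT_REC_SIZE;
}

/* The kind of sanity check the cached field used to serve. */
static void demo_check_extent_count(size_t nextents, size_t fork_size)
{
    assert(nextents <= demo_fork_max_extents(fork_size));
}
```

With helpers like these there is no derived field that can fall out of sync with the fork size, which is the maintenance cost the commit removes.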
  10. 04 Jan, 2012 1 commit
    • vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05
      Al Viro committed
      Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
      it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
      the cost of taking it into inode_init_always() will be negligible for pipes
      and sockets and negative for everything else.  Not to mention the removal of
      boilerplate code from ->destroy_inode() instances...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6b520e05
  11. 12 Oct, 2011 1 commit
  12. 13 Jul, 2011 1 commit
  13. 24 Jun, 2011 1 commit
    • xfs: reset inode per-lifetime state when recycling it · 778e24bb
      Dave Chinner committed
      XFS inodes have several per-lifetime state fields that determine the
      behaviour of the inode. These state fields are not all reset when an
      inode is reused from the reclaimable state.
      
      This can lead to unexpected behaviour of the new inode such as
      speculative preallocation not being truncated away in the expected
      manner for local files until the inode is subsequently truncated,
      freed or cycles out of the cache. It can also lead to an inode being
      considered to be a filestream inode or having been truncated when
      that is not the case.
      
      Rework the reinitialisation of the inode when it is recycled to
      ensure that it is pristine before it is reused. While there, also
      fix the resetting of state flags in the recycling error paths so the
      inode does not become unreclaimable.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      778e24bb
  14. 07 Jan, 2011 1 commit
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin committed
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  15. 16 Dec, 2010 1 commit
  16. 17 Dec, 2010 1 commit
    • xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d
      Dave Chinner committed
      With delayed logging greatly increasing the sustained parallelism of inode
      operations, the inode cache locking is showing significant read vs write
      contention when inode reclaim runs at the same time as lookups. There is
      also a lot more write lock acquisitions than there are read locks (4:1 ratio)
      so the read locking is not really buying us much in the way of parallelism.
      
      To avoid the read vs write contention, change the cache to use RCU locking on
      the read side. To avoid needing to RCU free every single inode, use the built
      in slab RCU freeing mechanism. This requires us to be able to detect lookups of
      freed inodes, so ensure that every freed inode has an inode number of zero and
      the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit
      lookup path, but also add a check for a zero inode number as well.
      
      We can then convert all the read locking lookups to use RCU read side locking
      and hence remove all read side locking.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      1a3e8f3d
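The "detect lookups of freed inodes" rule from this commit can be sketched as a validation step on every cache hit. This is a userspace model, not the XFS lookup code: `DEMO_IRECLAIM` and the `demo_*` names are hypothetical, and the RCU read-side section and per-inode flag lock that would surround the check are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IRECLAIM (1u << 0)   /* hypothetical stand-in for XFS_IRECLAIM */

struct demo_inode {
    unsigned long ino;     /* zero means "this object has been freed" */
    unsigned int  flags;
};

/*
 * Free path: with slab RCU freeing the memory may still be reachable by
 * a concurrent lookup, so leave recognisable markers behind.
 */
static void demo_mark_freed(struct demo_inode *ip)
{
    assert(ip != NULL);
    ip->flags |= DEMO_IRECLAIM;
    ip->ino = 0;
}

/* Cache-hit validation: reject anything that looks freed or in reclaim. */
static bool demo_hit_is_valid(const struct demo_inode *ip)
{
    assert(ip != NULL);
    if (ip->ino == 0)
        return false;
    if (ip->flags & DEMO_IRECLAIM)
        return false;
    return true;
}
```

A lookup that fails this check would treat the result as a miss and retry, rather than use an object whose lifetime has already ended.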
  17. 16 Dec, 2010 1 commit
    • xfs: rcu free inodes · d95b7aaf
      Dave Chinner committed
      Introduce RCU freeing of XFS inodes so that we can convert lookup
      traversals to use rcu_read_lock() protection. This patch only
      introduces the RCU freeing to minimise the potential conflicts with
      mainline if this is merged into mainline via a VFS patchset. It
      abuses the i_dentry list for the RCU callback structure because the
      VFS patches make this a union so it is safe to use like this and
      simplifies any merge issues.
      
      This patch uses basic RCU freeing rather than SLAB_DESTROY_BY_RCU.
      The later lookup patches need the same "found free inode" protection
      regardless of the RCU freeing method used, so once again the RCU
      freeing method can be dealt with appropriately at merge time without
      affecting any other code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d95b7aaf
  18. 23 Dec, 2010 1 commit
    • xfs: provide a inode iolock lockdep class · dcfcf205
      Dave Chinner committed
      The XFS iolock needs to be re-initialised to a new lock class before
      it enters reclaim to prevent lockdep false positives. Unfortunately,
      this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
      state can be recycled and not re-initialised before being reused.
      
      We need to re-initialise the lock state when transferring out of
      XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
      class as if the inode was just allocated. Hence we need a specific
      lockdep class variable for the iolock so that both initialisations
      use the same class.
      
      While there, add a specific class for inodes in the reclaim state so
      that it is easy to tell from lockdep reports what state the inode
      was in that generated the report.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      dcfcf205
  19. 19 Oct, 2010 1 commit
    • xfs: fix bogus m_maxagi check in xfs_iget · d276734d
      Christoph Hellwig committed
      These days inode64 should only control which AGs we allocate new
      inodes from, while we still try to support reading all existing
      inodes.  To make this actually work the check on top of xfs_iget
      needs to be relaxed to allow inodes in all allocation groups instead
      of just those that we allow allocating inodes from.  Note that we
      can't simply remove the check - it prevents us from accessing
      invalid data when fed invalid inode numbers from NFS or bulkstat.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      d276734d
  20. 27 Jul, 2010 7 commits
  21. 24 Jun, 2010 1 commit
  22. 03 Jun, 2010 1 commit
  23. 29 May, 2010 1 commit
    • xfs: fix access to upper inodes without inode64 · fb3b504a
      Christoph Hellwig committed
      If a filesystem is mounted without the inode64 mount option we
      should still be able to access inodes not fitting into 32 bits, just
      not create new ones.  For this to work we need to make sure the
      inode cache radix tree is initialized for all allocation groups, not
      just those we plan to allocate inodes from.  This patch makes sure
      we initialize the inode cache radix tree for all allocation groups,
      and also cleans xfs_initialize_perag up a bit to separate the
      inode32 logic from the general perag structure setup.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      fb3b504a
  24. 02 Mar, 2010 1 commit
  25. 16 Jan, 2010 2 commits
  26. 18 Dec, 2009 1 commit
  27. 17 Dec, 2009 1 commit
  28. 15 Dec, 2009 1 commit
    • xfs: event tracing support · 0b1b213f
      Christoph Hellwig committed
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is described in more detail.  To read the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
      allowing us to e.g. count how often certain workloads hit various
      spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      0b1b213f
  29. 12 Dec, 2009 1 commit