1. 24 6月, 2011 1 次提交
    • D
      xfs: reset inode per-lifetime state when recycling it · 778e24bb
      Dave Chinner 提交于
      XFS inodes has several per-lifetime state fields that determine the
      behaviour of the inode. These state fields are not all reset when an
      inode is reused from the reclaimable state.
      
      This can lead to unexpected behaviour of the new inode such as
      speculative preallocation not being truncated away in the expected
      manner for local files until the inode is subsequently truncated,
      freed or cycles out of the cache. It can also lead to an inode being
      considered to be a filestream inode or having been truncated when
      that is not the case.
      
      Rework the reinitialisation of the inode when it is recycled to
      ensure that it is pristine before it is reused. While there, also
      fix the resetting of state flags in the recycling error paths so the
      inode does not become unreclaimable.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      778e24bb
  2. 07 1月, 2011 1 次提交
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  3. 16 12月, 2010 1 次提交
  4. 17 12月, 2010 1 次提交
    • D
      xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d
      Dave Chinner 提交于
      With delayed logging greatly increasing the sustained parallelism of inode
      operations, the inode cache locking is showing significant read vs write
      contention when inode reclaim runs at the same time as lookups. There is
      also a lot more write lock acquistions than there are read locks (4:1 ratio)
      so the read locking is not really buying us much in the way of parallelism.
      
      To avoid the read vs write contention, change the cache to use RCU locking on
      the read side. To avoid needing to RCU free every single inode, use the built
      in slab RCU freeing mechanism. This requires us to be able to detect lookups of
      freed inodes, so enѕure that ever freed inode has an inode number of zero and
      the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit
      lookup path, but also add a check for a zero inode number as well.
      
      We canthen convert all the read locking lockups to use RCU read side locking
      and hence remove all read side locking.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      1a3e8f3d
  5. 16 12月, 2010 1 次提交
    • D
      xfs: rcu free inodes · d95b7aaf
      Dave Chinner 提交于
      Introduce RCU freeing of XFS inodes so that we can convert lookup
      traversals to use rcu_read_lock() protection. This patch only
      introduces the RCU freeing to minimise the potential conflicts with
      mainline if this is merged into mainline via a VFS patchset. It
      abuses the i_dentry list for the RCU callback structure because the
      VFS patches make this a union so it is safe to use like this and
      simplifies and merge issues.
      
      This patch uses basic RCU freeing rather than SLAB_DESTROY_BY_RCU.
      The later lookup patches need the same "found free inode" protection
      regardless of the RCU freeing method used, so once again the RCU
      freeing method can be dealt with apprpriately at merge time without
      affecting any other code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d95b7aaf
  6. 23 12月, 2010 1 次提交
    • D
      xfs: provide a inode iolock lockdep class · dcfcf205
      Dave Chinner 提交于
      The XFS iolock needs to be re-initialised to a new lock class before
      it enters reclaim to prevent lockdep false positives. Unfortunately,
      this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
      state can be recycled and not re-initialised before being reused.
      
      We need to re-initialise the lock state when transfering out of
      XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
      class as if the inode was just allocated. Hence we need a specific
      lockdep class variable for the iolock so that both initialisations
      use the same class.
      
      While there, add a specific class for inodes in the reclaim state so
      that it is easy to tell from lockdep reports what state the inode
      was in that generated the report.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      dcfcf205
  7. 19 10月, 2010 1 次提交
    • C
      xfs: fix bogus m_maxagi check in xfs_iget · d276734d
      Christoph Hellwig 提交于
      These days inode64 should only control which AGs we allocate new
      inodes from, while we still try to support reading all existing
      inodes.  To make this actually work the check ontop of xfs_iget
      needs to be relaxed to allow inodes in all allocation groups instead
      of just those that we allow allocating inodes from.  Note that we
      can't simply remove the check - it prevents us from accessing
      invalid data when fed invalid inode numbers from NFS or bulkstat.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      d276734d
  8. 27 7月, 2010 7 次提交
  9. 24 6月, 2010 1 次提交
  10. 03 6月, 2010 1 次提交
  11. 29 5月, 2010 1 次提交
    • C
      xfs: fix access to upper inodes without inode64 · fb3b504a
      Christoph Hellwig 提交于
      If a filesystem is mounted without the inode64 mount option we
      should still be able to access inodes not fitting into 32 bits, just
      not created new ones.  For this to work we need to make sure the
      inode cache radix tree is initialized for all allocation groups, not
      just those we plan to allocate inodes from.  This patch makes sure
      we initialize the inode cache radix tree for all allocation groups,
      and also cleans xfs_initialize_perag up a bit to separate the
      inode32 logical from the general perag structure setup.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      fb3b504a
  12. 02 3月, 2010 1 次提交
  13. 16 1月, 2010 2 次提交
  14. 18 12月, 2009 1 次提交
  15. 17 12月, 2009 1 次提交
  16. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  17. 12 12月, 2009 2 次提交
    • C
      xfs: remove incorrect sparse annotation for xfs_iget_cache_miss · 0c3dc2b0
      Christoph Hellwig 提交于
      xfs_iget_cache_miss does not get called with the pag_ici_lock held, so
      the __releases annotation is incorrect.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0c3dc2b0
    • C
      xfs: reset the i_iolock lock class in the reclaim path · 033da48f
      Christoph Hellwig 提交于
      The iolock is used for protecting reads, writes and block truncates
      against each other.  We have two classes of callers, the first one is
      induced by a file operation and requires a reference to the inode be
      held and not dropped after the operation is done:
      
       - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read,
         xfs_splice_write and xfs_setattr are all implementations of VFS
         methods that require a live inode
       - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the
         same is true
       - xfs_truncate_file is only called on quota inodes just returned from
         xfs_iget
       - xfs_sync_inode_data does the lock just after an igrab()
       - xfs_filestream_associate and xfs_filestream_new_ag take the iolock
         on the parent inode of an inode which by VFS rules must be referenced
      
      And we have various calls to truncate blocks past EOF or the whole
      file when dropping the last reference to an inode.  Unfortunately
      lockdep complains when we do memory allocations that can recurse into
      the filesystem in the first class because the second class happens to
      take the same lock.  To avoid this re-init the iolock in the beginning
      of xfs_fs_clear_inode to get a new lock class.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      033da48f
  18. 02 9月, 2009 2 次提交
    • C
      xfs: simplify xfs_trans_iget · aa72a5cf
      Christoph Hellwig 提交于
      xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the
      transaction after it is read.  Except when the inode already is in the
      inode cache, in which case it returns the existing locked inode with
      increment lock recursion counts.
      
      Now, no one in the tree every decrements these lock recursion counts,
      so any user of this gets a potential double unlock when both the original
      owner of the inode and the xfs_trans_iget caller unlock it.  When looking
      back in a git bisect in the historic XFS tree there was only one place
      that decremented these counts, xfs_trans_iput.  Introduced in commit
      ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993,
      and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by
      Steve Lord in 2003.  And as long as it didn't slip through git bisects
      cracks never actually used in that time frame.
      
      A quick audit of the callers of xfs_trans_iget shows that no caller
      really relies on this behaviour fortunately - xfs_ialloc allows this
      inode from disk so it must not be there before, and all the RT allocator
      routines only every add each RT bitmap inode once.
      
      In addition to removing lots of code and reducing the size of the inode
      item this patch also avoids the double inode cache lookup in each
      create/mkdir/mknod transaction.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NFelix Blyakher <felixb@sgi.com>
      aa72a5cf
    • C
      xfs: merge fsync and O_SYNC handling · 13e6d5cd
      Christoph Hellwig 提交于
      The guarantees for O_SYNC are exactly the same as the ones we need to
      make for an fsync call (and given that Linux O_SYNC is O_DSYNC the
      equivalent is fdadatasync, but we treat both the same in XFS), except
      with a range data writeout.  Jan Kara has started unifying these two
      path for filesystems using the generic helpers, and I've started to
      look at XFS.
      
      The actual transaction commited by xfs_fsync and xfs_write_sync_logforce
      has a different transaction number, but actually is exactly the same.
      We'll only use the fsync transaction going forward.  One major difference
      is that xfs_write_sync_logforce never issues a cache flush unless we
      commit a transaction causing that as a side-effect, which is an obvious
      bug in the O_SYNC handling.  Second all the locking and i_update_size
      vs i_update_core changes from 978b7237
      never made it to xfs_write_sync_logforce, so we add them back.
      
      To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait
      call is moved up to xfs_file_fsync, so that we don't wait on the whole
      file after we already waited for our portion in xfs_write.
      
      We'll also use a plain call to filemap_write_and_wait_range instead
      of the previous sync_page_rang which did it in two steps including
      an half-hearted inode write out that doesn't help us.
      
      Once we're done with this also remove the now useless i_update_size
      tracking.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      Signed-off-by: NFelix Blyakher <felixb@sgi.com>
      13e6d5cd
  19. 18 8月, 2009 1 次提交
    • C
      xfs: fix locking in xfs_iget_cache_hit · a022fe09
      Christoph Hellwig 提交于
      The locking in xfs_iget_cache_hit currently has numerous problems:
      
       - we clear the reclaim tag without i_flags_lock which protects
         modifications to it
       - we call inode_init_always which can sleep with pag_ici_lock
         held (this is oss.sgi.com BZ #819)
       - we acquire and drop i_flags_lock a lot and thus provide no
         consistency between the various flags we set/clear under it
      
      This patch fixes all that with a major revamp of the locking in
      the function.  The new version acquires i_flags_lock early and
      only drops it once we need to call into inode_init_always or before
      calling xfs_ilock.
      
      This patch fixes a bug seen in the wild where we race modifying the
      reclaim tag.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      Signed-off-by: NFelix Blyakher <felixb@sgi.com>
      a022fe09
  20. 17 8月, 2009 1 次提交
    • C
      xfs: fix locking in xfs_iget_cache_hit · bc990f5c
      Christoph Hellwig 提交于
      The locking in xfs_iget_cache_hit currently has numerous problems:
      
       - we clear the reclaim tag without i_flags_lock which protects
         modifications to it
       - we call inode_init_always which can sleep with pag_ici_lock
         held (this is oss.sgi.com BZ #819)
       - we acquire and drop i_flags_lock a lot and thus provide no
         consistency between the various flags we set/clear under it
      
      This patch fixes all that with a major revamp of the locking in
      the function.  The new version acquires i_flags_lock early and
      only drops it once we need to call into inode_init_always or before
      calling xfs_ilock.
      
      This patch fixes a bug seen in the wild where we race modifying the
      reclaim tag.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      Signed-off-by: NFelix Blyakher <felixb@sgi.com>
      bc990f5c
  21. 08 8月, 2009 2 次提交
  22. 24 6月, 2009 1 次提交
  23. 10 6月, 2009 1 次提交
  24. 08 6月, 2009 1 次提交
    • C
      xfs: kill xfs_qmops · 7d095257
      Christoph Hellwig 提交于
      Kill the quota ops function vector and replace it with direct calls or
      stubs in the CONFIG_XFS_QUOTA=n case.
      
      Make sure we check XFS_IS_QUOTA_RUNNING in the right spots.  We can remove
      the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set
      otherwise.
      
      This brings us back closer to the way this code worked in IRIX and earlier
      Linux versions, but we keep a lot of the more useful factoring of common
      code.
      
      Eventually we should also kill xfs_qm_bhv.c, but that's left for a later
      patch.
      
      Reduces the size of the source code by about 250 lines and the size of
      XFS module by about 1.5 kilobytes with quotas enabled:
      
         text	   data	    bss	    dec	    hex	filename
       615957	   2960	   3848	 622765	  980ad	fs/xfs/xfs.o
       617231	   3152	   3848	 624231	  98667	fs/xfs/xfs.o.old
      
      Fallout:
      
       - xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects
         the inode locked and xfs_qm_dqattach which does the locking around it,
         thus removing XFS_QMOPT_ILOCKED.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      7d095257
  25. 07 4月, 2009 1 次提交
    • D
      xfs: fix double free of inode · 705db3fd
      Dave Chinner 提交于
      If we fail to initialise the VFS inode in inode_init_always(),
      it will call ->delete_inode internally resulting in the inode being
      freed. Hence we need to delay the call to inode_init_always()
      until after the XFS inode is sufficient set up to handle a
      call to ->delete_inode, and then if that fails do not touch
      the inode again at all as it has been freed.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      705db3fd
  26. 07 3月, 2009 1 次提交
  27. 04 3月, 2009 1 次提交
  28. 04 12月, 2008 3 次提交