1. 24 Dec, 2011 2 commits
    • C
      xfs: log all dirty inodes in xfs_fs_sync_fs · be4f1ac8
      Committed by Christoph Hellwig
      Since Linux 2.6.36 the writeback code has introduced various measures for
      live lock prevention during sync().  Unfortunately some of these are
      actively harmful for the XFS model, where the inode gets marked dirty for
      metadata from the data I/O handler.
      
      The older_than_this checks are now more strictly enforced since
      
          writeback: avoid livelocking WB_SYNC_ALL writeback
      
      which only calls into __writeback_inodes_sb and thus samples the
      current cut off time only once.  But on slow enough devices the previous
      asynchronous sync pass might not have fully completed yet, and thus XFS
      might mark metadata dirty only after that sampling of the cut off time for
      the blocking pass already happened.  I have not reproduced this
      myself on a real system, but by introducing artificial delay into the
      XFS I/O completion workqueues it can be reproduced easily.
      
      Fix this by iterating over all XFS inodes in ->sync_fs and log all that
      are dirty.  This might log inodes that only got redirtied after the
      previous pass, but given how cheap delayed logging of inodes is it
      isn't a major concern for performance.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
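The effect of sampling the cut off time only once can be sketched with a small userspace model. This is not the kernel code: `struct inode_model`, `sync_with_cutoff`, and `sync_all_dirty` are hypothetical names, and timestamps are plain integers rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode: when it was dirtied, and whether it is dirty. */
struct inode_model {
    long dirtied_when;
    bool dirty;
};

/* Pass with a cut off time sampled once: inodes dirtied later are skipped. */
static size_t sync_with_cutoff(struct inode_model *inodes, size_t n, long cutoff)
{
    size_t logged = 0;

    for (size_t i = 0; i < n; i++) {
        if (inodes[i].dirty && inodes[i].dirtied_when <= cutoff) {
            inodes[i].dirty = false;
            logged++;
        }
    }
    return logged;
}

/* ->sync_fs style pass: log every dirty inode, whenever it was dirtied. */
static size_t sync_all_dirty(struct inode_model *inodes, size_t n)
{
    size_t logged = 0;

    for (size_t i = 0; i < n; i++) {
        if (inodes[i].dirty) {
            inodes[i].dirty = false;
            logged++;
        }
    }
    return logged;
}

/* Self-check: an inode dirtied at t=12 survives a pass with cutoff t=10. */
static void demo_sync_model(void)
{
    struct inode_model inodes[] = { {5, true}, {12, true}, {0, false} };

    assert(sync_with_cutoff(inodes, 3, 10) == 1);
    assert(inodes[1].dirty);            /* missed by the cutoff pass */
    assert(sync_all_dirty(inodes, 3) == 1);
    assert(!inodes[1].dirty);           /* caught by the full iteration */
}
```

In this model the second pass may also log inodes that were redirtied after the first pass, which mirrors the trade-off the commit message accepts.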
    • C
      xfs: log the inode in ->write_inode calls for kupdate · 0b8fd303
      Committed by Christoph Hellwig
      If the writeback code writes back an inode because it has expired we currently
      use the non-blocking ->write_inode path.  This means any inode that is pinned
      is skipped.  With delayed logging and a workload that has very little log
      traffic otherwise it is very likely that an inode that gets constantly
      written to is always pinned, and thus we keep refusing to write it.  The VM
      writeback code at that point redirties it and doesn't try to write it again
      for another 30 seconds.  This means under certain scenarios time-based
      metadata writeback never happens.
      
      Fix this by calling into xfs_log_inode for kupdate in addition to data
      integrity syncs, and thus transfer the inode to the log ASAP.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
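The decision described above can be written down as a tiny model. It is only a sketch of the policy, not the XFS ->write_inode implementation: the enums and `write_inode_model` are hypothetical, and the `log_for_kupdate` parameter simply switches between the behaviour before and after the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for why writeback called ->write_inode. */
enum model_reason { MODEL_BACKGROUND, MODEL_KUPDATE, MODEL_DATA_INTEGRITY };

/* Hypothetical outcomes of the call. */
enum model_result { MODEL_FLUSHED, MODEL_SKIPPED_PINNED, MODEL_LOGGED };

static enum model_result write_inode_model(bool pinned, enum model_reason why,
                                           bool log_for_kupdate)
{
    /* Data integrity syncs always log the inode; with the fix, kupdate does too. */
    if (why == MODEL_DATA_INTEGRITY ||
        (log_for_kupdate && why == MODEL_KUPDATE))
        return MODEL_LOGGED;

    /* Non-blocking path: a pinned inode cannot be flushed and is skipped. */
    return pinned ? MODEL_SKIPPED_PINNED : MODEL_FLUSHED;
}

/* Self-check: a pinned, expired inode is skipped before the fix and logged after. */
static void demo_write_inode_model(void)
{
    assert(write_inode_model(true, MODEL_KUPDATE, false) == MODEL_SKIPPED_PINNED);
    assert(write_inode_model(true, MODEL_KUPDATE, true) == MODEL_LOGGED);
}
```

An inode that is pinned on every attempt is skipped indefinitely in the first case, which is the starvation the commit message describes.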
  2. 07 Dec, 2011 2 commits
    • C
      xfs: fix the logspace waiting algorithm · 9f9c19ec
      Committed by Christoph Hellwig
      Apply the scheme used in log_regrant_write_log_space to wake up any other
      threads waiting for log space before the newly added one to
      log_regrant_write_log_space as well, and factor the code into readable
      helpers.  For each of the queues we add two helpers:
      
       - one to try to wake up all waiting threads.  This helper will also be
         usable by xfs_log_move_tail once we remove the current opportunistic
         wakeups in it.
       - one to sleep on t_wait until enough log space is available, loosely
         modelled after Linux waitqueues.
       
      And use them to reimplement the guts of log_regrant_write_log_space and
      log_regrant_write_log_space.  These two functions now use one and the same
      algorithm for waiting on log space instead of subtly different ones before,
      with an option to completely unify them in the near future.
      
      Also move the filesystem shutdown handling to the common caller given
      that we had to touch it anyway.
      
      Based on hard debugging and an earlier patch from
      Chandra Seetharaman <sekharan@us.ibm.com>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Tested-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
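The idea of waking earlier waiters before admitting a new request can be sketched in userspace. This is a simplified model under stated assumptions, not the XFS helpers: `struct waiter`, `wake_waiters`, and `can_grant_now` are hypothetical names, free space is a plain counter, and stopping at the first waiter that does not fit is an assumption made here to keep the queue fair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter: one queued thread and the log space it needs. */
struct waiter {
    int need_bytes;
    bool woken;
};

/*
 * Try to wake every waiter in queue order.  Stop at the first one whose
 * request does not fit, so later (smaller) requests cannot jump ahead.
 * Returns true if the whole queue was woken.
 */
static bool wake_waiters(struct waiter *q, size_t n, int *free_bytes)
{
    for (size_t i = 0; i < n; i++) {
        if (q[i].woken)
            continue;
        if (*free_bytes < q[i].need_bytes)
            return false;
        *free_bytes -= q[i].need_bytes;
        q[i].woken = true;
    }
    return true;
}

/*
 * A new request proceeds only if everyone queued before it was woken
 * and enough space is left; otherwise it would have to sleep.
 */
static bool can_grant_now(struct waiter *q, size_t n, int *free_bytes, int need)
{
    if (!wake_waiters(q, n, free_bytes))
        return false;
    if (*free_bytes < need)
        return false;
    *free_bytes -= need;
    return true;
}

/* Self-check: a small new request must not overtake a large queued one. */
static void demo_logspace_model(void)
{
    struct waiter q[] = { {100, false}, {300, false}, {50, false} };
    int free_bytes = 200;

    assert(!can_grant_now(q, 3, &free_bytes, 10));
    assert(q[0].woken && !q[1].woken && !q[2].woken);
    assert(free_bytes == 100);
}
```

Using the same routine for both queues is what lets the two paths share one waiting algorithm in this model.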
    • C
      xfs: fix nfs export of 64-bit inode numbers on 32-bit kernels · c29f7d45
      Committed by Christoph Hellwig
      The i_ino field in the VFS inode is of type unsigned long and thus can't
      hold the full 64-bit inode number on 32-bit kernels.  We have the full
      inode number in the XFS inode, so use that one for nfs exports.  Note
      that I've also switched the 32-bit file handle types to it, just to make
      the code more consistent and copy & paste errors less likely to happen.
      Reported-by: Guoquan Yang <ygq51@hotmail.com>
      Reported-by: Hank Peng <pengxihan@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
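The truncation can be illustrated with a small userspace model. The structures below are hypothetical stand-ins, not the kernel definitions: `uint32_t` plays the role of `unsigned long` on a 32-bit kernel, and the two `fh_ino_from_*` helpers stand for building a file handle from either copy of the inode number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a VFS inode on a 32-bit kernel. */
struct vfs_inode_model {
    uint32_t i_ino;                 /* unsigned long is 32 bits wide here */
};

/* Hypothetical model of an XFS inode that keeps the full number. */
struct xfs_inode_model {
    uint64_t i_ino;                 /* full 64-bit inode number */
    struct vfs_inode_model vfs;
};

static void model_init(struct xfs_inode_model *ip, uint64_t ino)
{
    ip->i_ino = ino;
    ip->vfs.i_ino = (uint32_t)ino;  /* the upper 32 bits are lost */
}

/* Inode number taken from the VFS inode: wrong for large inode numbers. */
static uint64_t fh_ino_from_vfs(const struct xfs_inode_model *ip)
{
    return ip->vfs.i_ino;
}

/* Inode number taken from the XFS inode: always the full value. */
static uint64_t fh_ino_from_xfs(const struct xfs_inode_model *ip)
{
    return ip->i_ino;
}

/* Self-check: an inode number above 2^32 only survives in the XFS inode. */
static void demo_ino_model(void)
{
    struct xfs_inode_model ip;

    model_init(&ip, ((uint64_t)1 << 32) + 5);
    assert(fh_ino_from_vfs(&ip) == 5);
    assert(fh_ino_from_xfs(&ip) == ((uint64_t)1 << 32) + 5);
}
```

Two distinct inodes whose numbers differ only in the upper 32 bits would map to the same truncated value in this model, which is why the file handle needs the full number.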
  3. 03 Dec, 2011 1 commit
    • D
      xfs: fix allocation length overflow in xfs_bmapi_write() · a99ebf43
      Committed by Dave Chinner
      When testing the new xfstests --large-fs option that does very large
      file preallocations, this assert was tripped deep in
      xfs_alloc_vextent():
      
      XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239
      
      The allocation was trying to allocate a zero length extent because
      the lower 32 bits of the allocation length were zero. The remaining
      length of the allocation to be done was an exact multiple of 2^32 -
      the first case I saw was at 496TB remaining to be allocated.
      
      This turns out to be an overflow when converting the allocation
      length (a 64 bit quantity) into the extent length to allocate (a 32
      bit quantity), and it requires the length to be allocated to be an exact
      multiple of 2^32 blocks to trip the assert.
      
      Fix it by limiting the extent length to allocate to MAXEXTLEN.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
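The overflow is easy to reproduce outside the kernel. The sketch below is a userspace illustration, not the xfs_bmapi_write() code: the two helper functions are hypothetical, and the 496TB case assumes 4 KiB filesystem blocks, which makes the remaining length 31 * 2^32 blocks. MAXEXTLEN is the 21-bit extent length limit.

```c
#include <assert.h>
#include <stdint.h>

/* Largest extent length XFS can record: a 21-bit block count. */
#define MAXEXTLEN ((uint32_t)0x001fffff)

/* Buggy conversion: silently keeps only the low 32 bits of the length. */
static uint32_t extent_len_truncated(uint64_t remaining_blocks)
{
    return (uint32_t)remaining_blocks;
}

/* Fixed conversion: clamp in 64-bit arithmetic before narrowing. */
static uint32_t extent_len_clamped(uint64_t remaining_blocks)
{
    assert(remaining_blocks > 0);
    if (remaining_blocks > MAXEXTLEN)
        return MAXEXTLEN;
    return (uint32_t)remaining_blocks;
}

/* Self-check: 31 * 2^32 blocks truncates to zero but clamps to MAXEXTLEN. */
static void demo_extent_len(void)
{
    uint64_t remaining = (uint64_t)31 << 32;

    assert(extent_len_truncated(remaining) == 0);
    assert(extent_len_clamped(remaining) == MAXEXTLEN);
}
```

Because the clamp happens before the narrowing cast, the result is never zero for a non-zero request, so minlen can no longer exceed maxlen for this reason.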
  4. 30 Nov, 2011 2 commits
    • C
      xfs: fix attr2 vs large data fork assert · 4c393a60
      Committed by Christoph Hellwig
      With Dmitry's fsstress updates I've seen very reproducible crashes in
      xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
      the attributes would not fit inline into the inode after removing an
      attribute.  It turns out that we were operating on an inode with lots
      of delalloc extents, and thus an if_bytes value for the data fork that
      is larger than the biggest possible on-disk storage for it, which utterly
      confuses the code near the end of xfs_attr_shortform_bytesfit.
      
      Fix this by always allowing the current attribute fork, like we already
      do for the attr1 format, given that delalloc conversion will take care
      of moving either the data or attribute area out of line if it doesn't
      fit at that point - or making the point moot by merging extents at this
      point.
      
      Also document the function better, and clean up some loose bits.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • C
      xfs: force buffer writeback before blocking on the ilock in inode reclaim · 4dd2cb4a
      Committed by Christoph Hellwig
      If we are doing synchronous inode reclaim we block the VM from making
      progress in memory reclaim.  So if we encounter a flush locked inode,
      promote it in the delwri list and wake up xfsbufd to write it out now.
      Without this we can get hangs of up to 30 seconds during workloads hitting
      synchronous inode reclaim.
      
      The scheme is copied from what we do for dquot reclaims.
      Reported-by: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  5. 29 Nov, 2011 1 commit
  6. 19 Nov, 2011 1 commit
    • A
      MAINTAINERS: update XFS maintainer entry · c8891329
      Committed by Alex Elder
      I will no longer be maintaining XFS for SGI.  Ben Myers
      (bpm@sgi.com) has agreed to be the primary maintainer
      for XFS in my place.  I will continue to be able to push
      commits to the SGI XFS tree if required.  As such I will
      continue to be a designated XFS maintainer, but plan to
      serve in more of a backup role.
      Signed-off-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  7. 16 Nov, 2011 1 commit
    • M
      xfs: use doalloc flag in xfs_qm_dqattach_one() · db3e74b5
      Committed by Mitsuo Hayasaka
      The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
      whether a new area to handle quota information will be allocated
      if needed. Originally, it was passed to xfs_qm_dqget(), but has
      been removed by the following commit (probably by mistake):
      
      	commit 8e9b6e7f
      	Author: Christoph Hellwig <hch@lst.de>
      	Date:   Sun Feb 8 21:51:42 2009 +0100
      
      	xfs: remove the unused XFS_QMOPT_DQLOCK flag
      
      As a result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
      never allocates the new area even if it is needed.
      
      This patch gives the doalloc arg to xfs_qm_dqget() in
      xfs_qm_dqattach_one() to fix this problem.
      Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  8. 09 Nov, 2011 3 commits
  9. 08 Nov, 2011 27 commits