1. 15 5月, 2012 14 次提交
    • C
      xfs: do not write the buffer from xfs_qm_dqflush · fe7257fd
      Christoph Hellwig 提交于
      Instead of writing the buffer directly from inside xfs_qm_dqflush return it
      to the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_qm_dqflush that all non-blocking callers
      already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
      check that all callers already perform.
      
      [ Dave Chinner: fixed build error cause by missing '{'. ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fe7257fd
    • C
      xfs: do not write the buffer from xfs_iflush · 4c46819a
      Christoph Hellwig 提交于
      Instead of writing the buffer directly from inside xfs_iflush return it to
      the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_iflush that all non-blocking callers already
      implement and the now unused flags parameter.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4c46819a
    • C
      xfs: don't flush inodes from background inode reclaim · 8a48088f
      Christoph Hellwig 提交于
      We already flush dirty inodes throug the AIL regularly, there is no reason
      to have second thread compete with it and disturb the I/O pattern.  We still
      do write inodes when doing a synchronous reclaim from the shrinker or during
      unmount for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      8a48088f
    • C
      xfs: implement freezing by emptying the AIL · 211e4d43
      Christoph Hellwig 提交于
      Now that we write back all metadata either synchronously or through
      the AIL we can simply implement metadata freezing in terms of
      emptying the AIL.
      
      The implementation for this is fairly simply and straight-forward:
      A new routine is added that asks the xfsaild to push the AIL to the
      end and waits for it to complete and send a wakeup. The routine will
      then loop if the AIL is not actually empty, and continue to do so
      until the AIL is compeltely empty.
      
      We keep an inode reclaim pass in the freeze process to avoid having
      memory pressure have to reclaim inodes that require dirtying the
      filesystem to be reclaimed after the freeze has completed. This
      means we can also treat unmount in the exact same way as freeze.
      
      As an upside we can now remove the radix tree based inode writeback
      and xfs_unmountfs_writesb.
      
      [ Dave Chinner:
      	- Cleaned up commit message.
      	- Added inode reclaim passes back into freeze.
      	- Cleaned up wakeup mechanism to avoid the use of a new
      	  sleep counter variable. ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      211e4d43
    • C
      xfs: allow assigning the tail lsn with the AIL lock held · 1c304625
      Christoph Hellwig 提交于
      Provide a variant of xlog_assign_tail_lsn that has the AIL lock already
      held.  By doing so we do an additional atomic_read + atomic_set under
      the lock, which comes down to two instructions.
      
      Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the
      new version to reduce the number of lock roundtrips, and prepare for
      a new addition that would require a third lock roundtrip in
      xfs_trans_ail_delete_bulk.  This addition is also the reason for
      slightly rearranging the conditionals and relying on xfs_log_space_wake
      for checking that the filesystem has been shut down internally.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1c304625
    • C
      xfs: remove log item from AIL in xfs_iflush after a shutdown · 32ce90a4
      Christoph Hellwig 提交于
      If a filesystem has been forced shutdown we are never going to write inodes
      to disk, which means the inode items will stay in the AIL until we free
      the inode. Currently that is not a problem, but a pending change requires us
      to empty the AIL before shutting down the filesystem. In that case leaving
      the inode in the AIL is lethal. Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      32ce90a4
    • C
      xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown · dea96095
      Christoph Hellwig 提交于
      If a filesystem has been forced shutdown we are never going to write dquots
      to disk, which means the dquot items will stay in the AIL forever.
      Currently that is not a problem, but a pending chance requires us to
      empty the AIL before shutting down the filesystem, in which case this
      behaviour is lethal.  Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      dea96095
    • S
      xfs: using GFP_NOFS for blkdev_issue_flush · 7582df51
      Shaohua Li 提交于
      Issuing a block device flush request in transaction context using GFP_KERNEL
      directly can cause deadlocks due to memory reclaim recursion. Use GFP_NOFS to
      avoid recursion from reclaim context.
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7582df51
    • D
      xfs: punch all delalloc blocks beyond EOF on write failure. · 01c84d2d
      Dave Chinner 提交于
      I've been seeing regular ASSERT failures in xfstests when running
      fsstress based tests over the past month. xfs_getbmap() has been
      failing this test:
      
      XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) ||
      (map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c,
      line: 5650
      
      where it is encountering a delayed allocation extent after writing
      all the dirty data to disk and then walking the extent map
      atomically by holding the XFS_IOLOCK_SHARED to prevent new delayed
      allocation extents from being created.
      
      Test 083 on a 512 byte block size filesystem was used to reproduce
      the problem, because it only had a 5s run timeand would usually fail
      every 3-4 runs. This test is exercising ENOSPC behaviour by running
      fsstress on a nearly full filesystem. The following trace extract
      shows the final few events on the inode that tripped the assert:
      
       xfs_ilock:             flags ILOCK_EXCL caller xfs_setfilesize
       xfs_setfilesize:       isize 0x180000 disize 0x12d400 offset 0x17e200 count 7680
      
      file size updated to 0x180000 by IO completion
      
       xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
       xfs_iext_insert:       state  idx 3 offset 3072 block 4503599627239432 count 1 flag 0 caller xfs_bmap_add_extent_hole_delay
       xfs_get_blocks_alloc:  size 0x180000 offset 0x180000 count 512 type  startoff 0xc00 startblock -1 blockcount 0x1
       xfs_ilock:             flags ILOCK_EXCL caller __xfs_get_blocks
      
      delalloc write, adding a single block at offset 0x180000
      
       xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180200 count 512
      
      ENOSPC trying to allocate a dellalloc block at offset 0x180200
      
       xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
       xfs_get_blocks_alloc:  size 0x180000 offset 0x180200 count 512 type  startoff 0xc00 startblock -1 blockcount 0x2
      
      And succeeding on retry after flushing dirty inodes.
      
       xfs_ilock:             flags ILOCK_EXCL caller __xfs_get_blocks
       xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180400 count 512
      
      ENOSPC trying to allocate a dellalloc block at offset 0x180400
      
       xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
       xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180400 count 512
      
      And failing the retry, giving a real ENOSPC error.
      
       xfs_ilock:             flags ILOCK_EXCL caller xfs_vm_write_failed
                                                      ^^^^^^^^^^^^^^^^^^^
      The smoking gun - the write being failed and cleaning up delalloc
      blocks beyond EOF allocated by the failed write.
      
       xfs_getattr:
       xfs_ilock:             flags IOLOCK_SHARED caller xfs_getbmap
       xfs_ilock:             flags ILOCK_SHARED caller xfs_ilock_map_shared
      
      And that's where we died almost immediately afterwards.
      xfs_bmapi_read() found delalloc extent beyond current file in memory
      file size. Some debug I added to xfs_getbmap() showed the state just
      before the assert failure:
      
       ino 0x80e48: off 0xc00, fsb 0xffffffffffffffff, len 0x1, size 0x180000
       start_fsb 0x106, end_fsb 0x638
       ino flags 0x2 nex 0xd bmvcnt 0x555, len 0x3c58a6f23c0bf1, start 0xc00
       ext 0: off 0x1fc, fsb 0x24782, len 0x254
       ext 1: off 0x450, fsb 0x40851, len 0x30
       ext 2: off 0x480, fsb 0xd99, len 0x1b8
       ext 3: off 0x92f, fsb 0x4099a, len 0x3b
       ext 4: off 0x96d, fsb 0x41844, len 0x98
       ext 5: off 0xbf1, fsb 0x408ab, len 0xf
      
      which shows that we found a single delalloc block beyond EOF (first
      line of output) when we were returning the map for a length
      somewhere around 10^16 bytes long (second line), and the on-disk
      extents showed they didn't go past EOF (last lines).
      
      Further debug added to xfs_vm_write_failed() showed this happened
      when punching out delalloc blocks beyond the end of the file after
      the failed write:
      
      [  132.606693] ino 0x80e48: vwf to 0x181000, sze 0x180000
      [  132.609573] start_fsb 0xc01, end_fsb 0xc08
      
      It punched the range 0xc01 -> 0xc08, but the range we really need to
      punch is 0xc00 -> 0xc07 (8 blocks from 0xc00) as this testing was
      run on a 512 byte block size filesystem (8 blocks per page).
      the punch from is 0xc00. So end_fsb is correct, but start_fsb is
      wrong as we punch from start_fsb for (end_fsb - start_fsb) blocks.
      Hence we are not punching the delalloc block beyond EOF in the case.
      
      The fix is simple - it's a silly off-by-one mistake in calculating
      the range. It's especially silly because the macro used to calculate
      the start_fsb already takes into account the case where the inode
      size is an exact multiple of the filesystem block size...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      01c84d2d
    • D
      xfs: use shared ilock mode for direct IO writes by default · 507630b2
      Dave Chinner 提交于
      For the direct IO write path, we only really need the ilock to be taken in
      exclusive mode during IO submission if we need to do extent allocation
      instead of all the time.
      
      Change the block mapping code to take the ilock in shared mode for the
      initial block mapping, and only retake it exclusively when we actually
      have to perform extent allocations.  We were already dropping the ilock
      for the transaction allocation, so this doesn't introduce new race windows.
      
      Based on an earlier patch from Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      507630b2
    • C
      xfs: push the ilock into xfs_zero_eof · 193aec10
      Christoph Hellwig 提交于
      Instead of calling xfs_zero_eof with the ilock held only take it internally
      for the minimall required critical section around xfs_bmapi_read.  This
      also requires changing the calling convention for xfs_zero_last_block
      slightly.  The actual zeroing operation is still serialized by the iolock,
      which must be taken exclusively over the call to xfs_zero_eof.
      
      We could in fact use a shared lock for the xfs_bmapi_read calls as long as
      the extent list has been read in, but given that we already hold the iolock
      exclusively there is little reason to micro optimize this further.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      193aec10
    • C
      xfs: reduce ilock hold times in xfs_setattr_size · f38996f5
      Christoph Hellwig 提交于
      We do not need the ilock for most checks done in the beginning of
      xfs_setattr_size.  Replace the long critical section before starting the
      transaction with a smaller one around xfs_zero_eof and an optional one
      inside xfs_qm_dqattach that isn't entered unless using quotas.  While
      this isn't a big optimization for xfs_setattr_size itself it will allow
      pushing the ilock into xfs_zero_eof itself later.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      f38996f5
    • C
      xfs: reduce ilock hold times in xfs_file_aio_write_checks · 467f7899
      Christoph Hellwig 提交于
      We do not need the ilock for generic_write_checks and the i_size_read,
      which are protected by i_mutex and/or iolock, so reduce the ilock
      critical section to just the call to xfs_zero_eof.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      467f7899
    • C
      xfs: avoid taking the ilock unnessecarily in xfs_qm_dqattach · b4d05e30
      Christoph Hellwig 提交于
      Check if we actually need to attach a dquot before taking the ilock in
      xfs_qm_dqattach.  This avoid superflous lock roundtrips for the common cases
      of quota support compiled in but not activated on a filesystem and an
      inode that already has the dquots attached.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b4d05e30
  2. 18 4月, 2012 1 次提交
    • D
      xfs: Ensure inode reclaim can run during quotacheck · 8a00ebe4
      Dave Chinner 提交于
      Because the mount process can run a quotacheck and consume lots of
      inodes, we need to be able to run periodic inode reclaim during the
      mount process. This will prevent running the system out of memory
      during quota checks.
      
      This essentially reverts 2bcf6e97, but that is safe to do now that
      the quota sync code that was causing problems during long quotacheck
      executions is now gone.
      
      The reclaim work is currently protected from running during the
      unmount process by a check against MS_ACTIVE. Unfortunately, this
      also means that the reclaim work cannot run during mount.  The
      unmount process should stop the reclaim cleanly before freeing
      anything that the reclaim work depends on, so there is no need to
      have this guard in place.
      
      Also, the inode reclaim work is demand driven, so there is no need
      to start it immediately during mount. It will be started the moment
      an inode is queued for reclaim, so qutoacheck will trigger it just
      fine.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      8a00ebe4
  3. 17 4月, 2012 1 次提交
  4. 07 4月, 2012 1 次提交
    • L
      Make the "word-at-a-time" helper functions more commonly usable · f68e556e
      Linus Torvalds 提交于
      I have a new optimized x86 "strncpy_from_user()" that will use these
      same helper functions for all the same reasons the name lookup code uses
      them.  This is preparation for that.
      
      This moves them into an architecture-specific header file.  It's
      architecture-specific for two reasons:
      
       - some of the functions are likely to want architecture-specific
         implementations.  Even if the current code happens to be "generic" in
         the sense that it should work on any little-endian machine, it's
         likely that the "multiply by a big constant and shift" implementation
         is less than optimal for an architecture that has a guaranteed fast
         bit count instruction, for example.
      
       - I expect that if architectures like sparc want to start playing
         around with this, we'll need to abstract out a few more details (in
         particular the actual unaligned accesses).  So we're likely to have
         more architecture-specific stuff if non-x86 architectures start using
         this.
      
         (and if it turns out that non-x86 architectures don't start using
         this, then having it in an architecture-specific header is still the
         right thing to do, of course)
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f68e556e
  5. 06 4月, 2012 8 次提交
  6. 04 4月, 2012 2 次提交
  7. 02 4月, 2012 2 次提交
  8. 01 4月, 2012 11 次提交
新手
引导
客服 返回
顶部