1. 18 Oct, 2012 10 commits
    • xfs: remove xfs_iget.c · 33479e05
      Committed by Dave Chinner
      The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
      along with the other inode cache functions. This removes all functionality from
      xfs_iget.c, so the file can simply be removed.
      
      This move results in various functions now only having the scope of a single
      file (e.g. xfs_inode_free()), so clean up all the definitions and exported
      prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: rename xfs_sync.[ch] to xfs_icache.[ch] · 6d8b79cf
      Committed by Dave Chinner
      xfs_sync.c now only contains inode reclaim functions and inode cache
      iteration functions. It is not related to sync operations anymore.
      Rename to xfs_icache.c to reflect its contents and prepare for
      consolidation with the other inode cache file that exists
      (xfs_iget.c).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: move xfs_quiesce_attr() into xfs_super.c · c7eea6f7
      Committed by Dave Chinner
      Both callers of xfs_quiesce_attr() are in xfs_super.c, and there's
      nothing really sync-specific about this functionality so it doesn't
      really matter where it lives. Move it to be next to its callers, so
      all the remount/sync_fs code is in the one place.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: xfs_sync_fsdata is redundant · 34061f5c
      Committed by Dave Chinner
      Why do we need to write the superblock to disk once we've written
      all the data?  We don't actually - the reasons for doing this are
      lost in the mists of time, and go back to the way Irix used to drive
      VFS flushing.
      
      On linux, this code is only called from two contexts: remount and
      .sync_fs. In the remount case, the call is followed by a metadata
      sync, which unpins and writes the superblock.  In the sync_fs case,
      we only need to force the log to disk to ensure that the superblock
      is correctly on disk, so we don't actually need to write it. Hence
      the functionality is either redundant or superfluous and thus can be
      removed.
      
      Seeing as xfs_quiesce_data is essentially now just a log force,
      remove it as well and fold the code back into the two callers.
      Neither of them need the log covering check, either, as that is
      redundant for the remount case, and unnecessary for the .sync_fs
      case.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: syncd workqueue is no more · 5889608d
      Committed by Dave Chinner
      With the syncd functions moved to the log and/or removed, the syncd
      workqueue is the only remaining bit left. It is used by the log
      covering/ail pushing work, as well as by the inode reclaim work.
      
      Given how cheap workqueues are these days, give the log and inode
      reclaim work their own work queues and kill the syncd work queue.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: xfs_sync_data is redundant. · 9aa05000
      Committed by Dave Chinner
      We don't do any data writeback from XFS any more - the VFS is
      completely responsible for that, including for freeze. We can
      replace the remaining caller with a VFS level function that
      achieves the same thing, but without conflicting with current
      writeback work.
      
      This means we can remove the flush_work and xfs_flush_inodes() - the
      VFS functionality completely replaces the internal flush queue for
      doing this writeback work in a separate context to avoid stack
      overruns.
      
      This does have one complication - it cannot be called with page
      locks held.  Hence move the flushing of delalloc space when ENOSPC
      occurs back up into xfs_file_aio_buffered_write when we don't hold
      any locks that will stall writeback.
      
      Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to
      trigger delalloc conversion fast enough to prevent spurious ENOSPC
      when there are hundreds of writers, thousands of small files and GBs
      of free RAM.  Hence we need to use sync_sb_inodes() to block callers
      while we wait for writeback like the previous xfs_flush_inodes
      implementation did.
      
      That means we have to hold the s_umount lock here, but because this
      call can nest inside i_mutex (the parent directory in the create
      case, held by the VFS), we have to use down_read_trylock() to avoid
      potential deadlocks. In practice, this trylock will succeed on
      almost every attempt as unmount/remount type operations are
      exceedingly rare.
      
      Note: we always need to pass a count of zero to
      generic_file_buffered_write() as the previously written byte count.
      We only did this by accident before this patch by virtue of ret
      always being zero when there are no errors. Make this explicit
      rather than needing to specifically zero ret in the ENOSPC retry
      case.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: sync work is now only periodic log work · f661f1e0
      Committed by Dave Chinner
      The only thing the periodic sync work does now is flush the AIL and
      idle the log. These are really functions of the log code, so move
      the work to xfs_log.c and rename it appropriately.
      
      The only wart that this leaves behind is the xfssyncd_centisecs
      sysctl, otherwise the xfssyncd is dead. Clean up any comments that
      related to xfssyncd to reflect its passing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: don't run the sync work if the filesystem is read-only · 7f7bebef
      Committed by Dave Chinner
      If the filesystem is mounted or remounted read-only, stop the sync
      worker that tries to flush or cover the log if the filesystem is
      dirty. It's read-only, so it isn't dirty. Restart it on a remount,rw
      as necessary. This avoids the need for RO checks in the work.
      
      Similarly, stop the sync work when the filesystem is frozen, and
      start it again when the filesystem is thawed. This avoids the need
      for special freeze checks in the work.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: rationalise xfs_mount_wq users · 7e18530b
      Committed by Dave Chinner
      Instead of starting and stopping background work on the xfs_mount_wq
      all at the same time, separate them to where they really are needed
      to start and stop.
      
      The xfs_sync_worker only needs to be started after all the mount
      processing has completed successfully, while it needs to be stopped
      before the log is unmounted.
      
      The xfs_reclaim_worker is started on demand, and can be
      stopped before the unmount process does its own inode reclaim pass.
      
      The xfs_flush_inodes work is run on demand, and so we really only
      need to ensure that it has stopped running before we start
      processing an unmount, freeze or remount,ro.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: xfs_syncd_stop must die · 33c7a2bc
      Committed by Dave Chinner
      xfs_syncd_start and xfs_syncd_stop tie a bunch of unrelated
      functionality together that actually have different start and stop
      requirements. Kill these functions and open code the start/stop
      methods for each of the background functions.
      
      Subsequent patches will move the start/stop functions around to the
      correct places to avoid races and shutdown issues.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  2. 21 Aug, 2012 1 commit
    • workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Committed by Tejun Heo
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com>
  3. 31 Jul, 2012 1 commit
    • xfs: Convert to new freezing code · d9457dc0
      Committed by Jan Kara
      Generic code now blocks all writers from standard write paths. So we add
      blocking of all writers coming from ioctl (we get a protection of ioctl against
      racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
      non-racy freeze protection. We also keep freeze protection on transaction
      start to block internal filesystem writes such as removal of preallocated
      blocks.
      
      CC: Ben Myers <bpm@sgi.com>
      CC: Alex Elder <elder@kernel.org>
      CC: xfs@oss.sgi.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 30 Jul, 2012 1 commit
    • xfs: wait for the write the superblock on unmount · 9a57fa8e
      Committed by Mark Tinguely
      v2: Add the xfs_buf_lock to xfs_quiesce_attr().
          Add explanation why xfs_buf_lock() is used to wait for write.
      
      xfs_wait_buftarg() does not wait for the completion of the write of the
      uncached superblock. This write can race with the shutdown of the log
      and cause a panic if the write does not win the race.
      
      During the log write, xfsaild_push() will lock the buffer and set the
      XBF_ASYNC flag. Because XBF_ASYNC is set, complete() is not performed
      on the buffer's iowait entry, so we cannot call xfs_buf_iowait() to wait
      for the write to complete. The buffer's lock is held until the write is
      complete, so we can block on a xfs_buf_lock() request to be notified
      that the write is complete.
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  5. 22 Jul, 2012 1 commit
  6. 22 Jun, 2012 2 commits
  7. 16 May, 2012 1 commit
    • xfs: protect xfs_sync_worker with s_umount semaphore · 1307bbd2
      Committed by Ben Myers
      xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
      work during mount and unmount.  This flag can be cleared by unmount
      after the xfs_sync_worker checks it but before the work is completed.
      This has caused crashes in the completion handler for the dummy
      transaction committed by xfs_sync_worker:
      
      PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
       #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
       #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
       #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
       #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
       #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
       #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
       #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
       #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
          [exception RIP: xlog_get_lowest_lsn+0x30]
          RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
          RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
          RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
          RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
          R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
          R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
       #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
      
      Protect xfs_sync_worker by using the s_umount semaphore at the read
      level to provide exclusion with unmount while work is progressing.
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  8. 15 May, 2012 7 commits
    • xfs: clean up xfs_bit.h includes · ad1e95c5
      Committed by Dave Chinner
      With the removal of xfs_rw.h and other changes over time, xfs_bit.h
      is being included in many files that don't actually need it. Clean
      up the includes as necessary.
      
      Also move the only-used-once xfs_ialloc_find_free() static inline
      function out of a header file that is widely included to reduce
      the number of needless dependencies on xfs_bit.h.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: pass shutdown method into xfs_trans_ail_delete_bulk · 04913fdd
      Committed by Dave Chinner
      xfs_trans_ail_delete_bulk() can be called from different contexts so
      if the item is not in the AIL we need different shutdown for each
      context.  Pass in the shutdown method needed so the correct action
      can be taken.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: on-stack delayed write buffer lists · 43ff2122
      Committed by Christoph Hellwig
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importantly we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straightforward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwri_submit or
      xfs_buf_delwri_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long as we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
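The queue-then-submit pattern from the commit above (callers collect delayed-write buffers on a list they own, buffers already queued elsewhere are skipped, and the list is submitted in one go) can be illustrated with a small userspace sketch. The types and functions here (`toy_buf`, `toy_list`, `toy_delwri_queue()`, `toy_delwri_submit()`) are hypothetical simplifications; the `delwri_queued` field only models the role of the XBF_DELWRI_Q flag and no real I/O happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buf {
    bool delwri_queued;    /* models the XBF_DELWRI_Q flag */
    int writes;            /* how many times the buffer was written */
    struct toy_buf *next;  /* link on the caller's list */
};

/* The list head lives wherever the caller puts it, typically on its stack. */
struct toy_list {
    struct toy_buf *head;
};

/* Queue bp on the caller's list; skip it if it is already on a delwri list. */
static bool toy_delwri_queue(struct toy_buf *bp, struct toy_list *list)
{
    assert(bp && list);
    if (bp->delwri_queued)
        return false;
    bp->delwri_queued = true;
    bp->next = list->head;
    list->head = bp;
    return true;
}

/* Write out every buffer on the list and leave the list empty.
 * Returns the number of buffers written. */
static int toy_delwri_submit(struct toy_list *list)
{
    int written = 0;

    while (list->head) {
        struct toy_buf *bp = list->head;

        list->head = bp->next;
        bp->next = NULL;
        bp->delwri_queued = false;
        bp->writes++;      /* stand-in for issuing the write */
        written++;
    }
    return written;
}
```

Because the list belongs to the caller, no global per-target queue or wakeup of a separate flushing thread is needed in this model.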
    • xfs: do not write the buffer from xfs_iflush · 4c46819a
      Committed by Christoph Hellwig
      Instead of writing the buffer directly from inside xfs_iflush return it to
      the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_iflush that all non-blocking callers already
      implement and the now unused flags parameter.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: don't flush inodes from background inode reclaim · 8a48088f
      Committed by Christoph Hellwig
      We already flush dirty inodes through the AIL regularly, there is no reason
      to have a second thread compete with it and disturb the I/O pattern.  We still
      do write inodes when doing a synchronous reclaim from the shrinker or during
      unmount for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: implement freezing by emptying the AIL · 211e4d43
      Committed by Christoph Hellwig
      Now that we write back all metadata either synchronously or through
      the AIL we can simply implement metadata freezing in terms of
      emptying the AIL.
      
      The implementation for this is fairly simple and straightforward:
      A new routine is added that asks the xfsaild to push the AIL to the
      end and waits for it to complete and send a wakeup. The routine will
      then loop if the AIL is not actually empty, and continue to do so
      until the AIL is completely empty.
      
      We keep an inode reclaim pass in the freeze process to avoid having
      memory pressure have to reclaim inodes that require dirtying the
      filesystem to be reclaimed after the freeze has completed. This
      means we can also treat unmount in the exact same way as freeze.
      
      As an upside we can now remove the radix tree based inode writeback
      and xfs_unmountfs_writesb.
      
      [ Dave Chinner:
      	- Cleaned up commit message.
      	- Added inode reclaim passes back into freeze.
      	- Cleaned up wakeup mechanism to avoid the use of a new
      	  sleep counter variable. ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
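The "push to the end, wait, and loop until empty" routine described in the commit above can be modelled in a few lines. This is a deliberately simplified, single-threaded sketch: `toy_ail`, `toy_ail_push_all()` and `toy_ail_push_all_sync()` are hypothetical names, and the `late_items` field is an assumption used only to simulate items being inserted while a push is in flight.

```c
#include <assert.h>

struct toy_ail {
    int items;       /* items currently in the AIL */
    int late_items;  /* items inserted while the first push is in flight */
};

/* One push to the current end of the AIL: everything present at the
 * start is written back, but concurrent insertions may leave new items. */
static void toy_ail_push_all(struct toy_ail *ail)
{
    ail->items = 0;
    ail->items += ail->late_items;  /* simulated concurrent insertions */
    ail->late_items = 0;
}

/* Keep pushing until the AIL is observed empty.
 * Returns the number of passes that were needed. */
static int toy_ail_push_all_sync(struct toy_ail *ail)
{
    int passes = 0;

    assert(ail->items >= 0 && ail->late_items >= 0);
    while (ail->items > 0) {
        toy_ail_push_all(ail);
        passes++;
    }
    return passes;
}
```

A single push is not enough whenever items arrive during it, which is why the loop re-checks for emptiness instead of trusting one completed pass.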
    • xfs: remove log item from AIL in xfs_iflush after a shutdown · 32ce90a4
      Committed by Christoph Hellwig
      If a filesystem has been forced shutdown we are never going to write inodes
      to disk, which means the inode items will stay in the AIL until we free
      the inode. Currently that is not a problem, but a pending change requires us
      to empty the AIL before shutting down the filesystem. In that case leaving
      the inode in the AIL is lethal. Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  9. 18 Apr, 2012 1 commit
    • xfs: Ensure inode reclaim can run during quotacheck · 8a00ebe4
      Committed by Dave Chinner
      Because the mount process can run a quotacheck and consume lots of
      inodes, we need to be able to run periodic inode reclaim during the
      mount process. This will prevent running the system out of memory
      during quota checks.
      
      This essentially reverts 2bcf6e97, but that is safe to do now that
      the quota sync code that was causing problems during long quotacheck
      executions is now gone.
      
      The reclaim work is currently protected from running during the
      unmount process by a check against MS_ACTIVE. Unfortunately, this
      also means that the reclaim work cannot run during mount.  The
      unmount process should stop the reclaim cleanly before freeing
      anything that the reclaim work depends on, so there is no need to
      have this guard in place.
      
      Also, the inode reclaim work is demand driven, so there is no need
      to start it immediately during mount. It will be started the moment
      an inode is queued for reclaim, so quotacheck will trigger it just
      fine.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  10. 14 Mar, 2012 1 commit
  11. 26 Feb, 2012 1 commit
    • xfs: only take the ILOCK in xfs_reclaim_inode() · ad637a10
      Committed by Alex Elder
      At the end of xfs_reclaim_inode(), the inode is locked in order to
      wait for a possible concurrent lookup to complete before the
      inode is freed.  This synchronization step was taking both the ILOCK
      and the IOLOCK, but the latter was causing lockdep to produce
      reports of the possibility of deadlock.
      
      It turns out that there's no need to acquire the IOLOCK at this
      point anyway.  It may have been required in some earlier version of
      the code, but there should be no need to take the IOLOCK in
      xfs_iget(), so there's no (longer) any need to get it here for
      synchronization.  Add an assertion in xfs_iget() as a reminder
      of this assumption.
      
      Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested
      no longer including the IOLOCK.  I just put together the patch.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  12. 18 Jan, 2012 1 commit
    • xfs: replace i_flock with a sleeping bitlock · 474fce06
      Committed by Christoph Hellwig
      We almost never block on i_flock, the exception is synchronous inode
      flushing.  Instead of bloating the inode with a 16/24-byte completion
      that we abuse as a semaphore just implement it as a bitlock that uses
      a bit waitqueue for the rare sleeping path.  This primarily is a
      tradeoff between a much smaller inode and a faster non-blocking
      path vs faster wakeups, and we are much better off with the former.
      
      A small downside is that we will lose lockdep checking for i_flock, but
      given that it's always taken inside the ilock that should be acceptable.
      
      Note that for example the inode writeback locking is implemented in a
      very similar way.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
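The bitlock idea from the commit above (one bit in an existing flags word, a cheap non-blocking fast path, and a rarely used waiting path) can be sketched with C11 atomics. This is an illustration under assumptions, not the kernel implementation: `toy_inode`, `TOY_IFLOCK` and the three functions are hypothetical, and `sched_yield()` merely stands in for sleeping on a bit waitqueue.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_IFLOCK 0x1u  /* hypothetical flag bit standing in for the flush lock */

struct toy_inode {
    atomic_uint flags;   /* the lock costs one bit, not a whole completion */
};

/* Fast path: a single atomic OR. The caller owns the lock
 * if the bit was clear before the operation. */
static bool toy_iflock_nowait(struct toy_inode *ip)
{
    return !(atomic_fetch_or(&ip->flags, TOY_IFLOCK) & TOY_IFLOCK);
}

/* Slow path: keep retrying until the bit is free. The kernel would
 * sleep on a bit waitqueue here; this sketch just yields the CPU. */
static void toy_iflock(struct toy_inode *ip)
{
    while (!toy_iflock_nowait(ip))
        sched_yield();
}

/* Release the lock by clearing the bit. */
static void toy_ifunlock(struct toy_inode *ip)
{
    unsigned int old = atomic_fetch_and(&ip->flags, ~TOY_IFLOCK);

    assert(old & TOY_IFLOCK);  /* unlocking an unlocked inode is a bug */
    (void)old;
}
```

The tradeoff named in the commit shows up directly: the structure shrinks to a bit in a word that already exists, while waiters pay more when they do have to block.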
  13. 24 Dec, 2011 1 commit
    • xfs: log all dirty inodes in xfs_fs_sync_fs · be4f1ac8
      Committed by Christoph Hellwig
      Since Linux 2.6.36 the writeback code has introduced various measures for
      live lock prevention during sync().  Unfortunately some of these are
      actively harmful for the XFS model, where the inode gets marked dirty for
      metadata from the data I/O handler.
      
      The older_than_this checks that are now more strictly enforced since
      
          writeback: avoid livelocking WB_SYNC_ALL writeback
      
      by only calling into __writeback_inodes_sb and thus only sampling the
      current cut off time once.  But on a slow enough device the previous
      asynchronous sync pass might not have fully completed yet, and thus XFS
      might mark metadata dirty only after that sampling of the cut off time for
      the blocking pass already happened.  I have not reproduced this
      myself on a real system, but by introducing artificial delay into the
      XFS I/O completion workqueues it can be reproduced easily.
      
      Fix this by iterating over all XFS inodes in ->sync_fs and log all that
      are dirty.  This might log inodes that only got redirtied after the
      previous pass, but given how cheap delayed logging of inodes is it
      isn't a major concern for performance.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  14. 13 Dec, 2011 1 commit
  15. 30 Nov, 2011 1 commit
  16. 12 Oct, 2011 3 commits
  17. 13 Aug, 2011 1 commit
    • xfs: remove subdirectories · c59d87c4
      Committed by Christoph Hellwig
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      amount of file renames the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  18. 26 Jul, 2011 1 commit
  19. 21 Jul, 2011 2 commits
  20. 08 Jul, 2011 1 commit
    • xfs: improve sync behaviour in the face of aggressive dirtying · 33b8f7c2
      Committed by Christoph Hellwig
      The following script from Wu Fengguang shows very bad behaviour in XFS
      when aggressively dirtying data during a sync on XFS, with sync times
      up to almost 10 times as long as ext4.
      
      A large part of the issue is that XFS writes data out itself two times
      in the ->sync_fs method, overriding the livelock protection in the core
      writeback code, and another issue is the lock-less xfs_ioend_wait call,
      which doesn't prevent new ioends from being queued up while waiting for
      the count to reach zero.
      
      This patch removes the XFS-internal sync calls and relies on the VFS
      to do its work just like all other filesystems do.  Note that the
      i_iocount wait which is rather suboptimal is simply removed here.
      We already do it in ->write_inode, which keeps the current suboptimal
      behaviour.  We'll eventually need to remove that as well, but that's
      material for a separate commit.
      
      ------------------------------ snip ------------------------------
      #!/bin/sh
      
      umount /dev/sda7
      mkfs.xfs -f /dev/sda7
      # mkfs.ext4 /dev/sda7
      # mkfs.btrfs /dev/sda7
      mount /dev/sda7 /fs
      
      echo $((50<<20)) > /proc/sys/vm/dirty_bytes
      
      pid=
      for i in `seq 10`
      do
      	dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
      	pid="$pid $!"
      done
      
      sleep 1
      
      tic=$(date +'%s')
      sync
      tac=$(date +'%s')
      
      echo
      echo sync time: $((tac-tic))
      egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo
      
      pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
      ------------------------------ snip ------------------------------
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  21. 25 May, 2011 1 commit