1. 27 Oct, 2017 1 commit
    • H
      xfs: check kthread_should_stop() after the setting of task state · 0bd89676
      Hou Tao authored
      A umount hang is possible when a race occurs between the umount
      process and the xfsaild kthread. The following sequences outline
      the race:
      
          xfsaild: kthread_should_stop()
      	     => returns false, so xfsaild continues
      
          umount: set_bit(KTHREAD_SHOULD_STOP, &kthread->flags)
      	    => by kthread_stop()
          umount: wake_up_process()
      	    => because xfsaild is still running, so 0 is returned
      
          xfsaild: __set_current_state(TASK_INTERRUPTIBLE)
          xfsaild: schedule()
      	    => now, xfsaild will wait indefinitely
      
          umount: wait_for_completion()
      	    => and umount will hang
      
      To fix that, we need to check kthread_should_stop() after we set
      the task state, so the xfsaild will either see the stop bit and
      exit or the task state is reset to runnable by wake_up_process()
      such that it isn't scheduled out indefinitely and detects the stop
      bit at the next iteration.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0bd89676
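
The ordering problem described in commit 0bd89676 can be illustrated with a small single-threaded userspace model. This is only a sketch: `model_task`, `model_kthread_stop()` and the other `model_*` names are hypothetical stand-ins, not kernel APIs, and the interleaving is replayed step by step rather than run on real threads.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a kernel thread; not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct model_task {
	enum task_state state;
	bool should_stop;	/* models the KTHREAD_SHOULD_STOP bit */
};

/* Models kthread_stop(): set the stop bit, then wake the task. */
static void model_kthread_stop(struct model_task *t)
{
	t->should_stop = true;
	t->state = TASK_RUNNING;	/* a wakeup makes the task runnable */
}

/* Models schedule(): the task blocks only if it is not runnable. */
static bool model_schedule_blocks(const struct model_task *t)
{
	return t->state != TASK_RUNNING;
}

/*
 * Old ordering: check the stop bit, then set the task state. A stop
 * request that lands between the two steps is missed.
 * Returns true if the task ends up blocked with nobody left to wake it.
 */
static bool old_order_hangs(void)
{
	struct model_task t = { TASK_RUNNING, false };

	if (t.should_stop)		/* xfsaild: sees false, continues */
		return false;
	model_kthread_stop(&t);		/* umount: races in here */
	t.state = TASK_INTERRUPTIBLE;	/* xfsaild: overwrites the wakeup */
	return model_schedule_blocks(&t);
}

/*
 * Fixed ordering: set the task state first, then check the stop bit.
 * stop_before_check selects on which side of the check the stop lands.
 */
static bool new_order_hangs(bool stop_before_check)
{
	struct model_task t = { TASK_RUNNING, false };

	t.state = TASK_INTERRUPTIBLE;
	if (stop_before_check)
		model_kthread_stop(&t);
	if (t.should_stop) {
		t.state = TASK_RUNNING;
		return false;		/* thread leaves its loop */
	}
	if (!stop_before_check)
		model_kthread_stop(&t);	/* wakeup resets state to runnable */
	return model_schedule_blocks(&t);
}
```

In this model `old_order_hangs()` reports a hang, while `new_order_hangs()` does not for either placement of the stop request, which matches the "either see the stop bit or be woken" argument in the commit message.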
  2. 23 Aug, 2017 2 commits
    • B
      xfs: add log item pinning error injection tag · 7f4d01f3
      Brian Foster authored
      Add an error injection tag to force log items in the AIL to the
      pinned state. This option can be used by test infrastructure to
      induce head behind tail conditions. Specifically, this is intended
      to be used by xfstests to reproduce log recovery problems after
      failed/corrupted log writes overwrite the last good tail LSN in the
      log.
      
      When enabled, AIL push attempts see log items in the AIL in the
      pinned state. This stalls metadata writeback and thus prevents the
      current tail of the log from moving forward. When disabled,
      subsequent AIL pushes observe the log items in their appropriate
      state and filesystem operation continues as normal.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      7f4d01f3
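
A minimal sketch of the behaviour described in commit 7f4d01f3, assuming a simple boolean switch in place of the real error injection machinery. `errtag_force_pinned`, `model_item` and `model_push()` are hypothetical names used for illustration only.

```c
#include <stdbool.h>

enum push_result { ITEM_SUCCESS, ITEM_PINNED };

struct model_item {
	bool pinned;	/* real pin state of the log item */
};

/* Hypothetical stand-in for the error injection switch. */
static bool errtag_force_pinned;

/*
 * Models an AIL push attempt on one item. While the tag is enabled,
 * every item is reported as pinned, so nothing is written back and the
 * log tail cannot move forward.
 */
static enum push_result model_push(const struct model_item *item)
{
	if (errtag_force_pinned)
		return ITEM_PINNED;
	return item->pinned ? ITEM_PINNED : ITEM_SUCCESS;
}
```

Once the switch is cleared, the same item is pushed according to its real state again, which corresponds to "filesystem operation continues as normal" in the message.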
    • C
      xfs: Properly retry failed inode items in case of error during buffer writeback · d3a304b6
      Carlos Maiolino authored
      When a buffer has failed during writeback, the inode items in it
      are kept flush locked and are never resubmitted because of the flush lock.
      So if any buffer fails to be written, the items in the AIL are never written
      to disk and never unlocked.
      
      This causes the unmount operation to hang because of these flush-locked items
      in the AIL, and it also means the items in the AIL are never written back,
      even when the I/O device returns to normal.
      
      I've been testing this patch with a DM-thin device, creating a
      filesystem larger than the real device.
      
      When writing enough data to fill the DM-thin device, XFS receives ENOSPC
      errors from the device and keeps spinning in xfsaild (when the 'retry
      forever' configuration is set).
      
      At this point, the filesystem cannot be unmounted because of the flush-locked
      items in the AIL. Worse, the items in the AIL are never retried at all
      (since xfs_inode_item_push() skips items that are flush locked),
      even if the underlying DM-thin device is expanded to the proper size.
      
      This patch fixes both cases by retrying any item that has previously failed,
      using the infrastructure provided by the previous patch.
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d3a304b6
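
The difference between skipping and retrying a failed item, as described in commit d3a304b6, might be modelled roughly as follows. The structure, its flags and both functions are hypothetical simplifications. The real push path works on buffers and buffer lists, which are reduced here to a counter.

```c
#include <stdbool.h>

enum push_result { ITEM_SUCCESS, ITEM_FLUSHING };

struct model_inode_item {
	bool flush_locked;	/* held from the first flush until I/O completes */
	bool failed;		/* set when the backing buffer write failed */
	int  resubmits;		/* how often the buffer was queued again */
};

/* Old behaviour: a flush-locked item is always skipped. */
static enum push_result push_without_retry(struct model_inode_item *it)
{
	if (it->flush_locked)
		return ITEM_FLUSHING;
	it->flush_locked = true;
	return ITEM_SUCCESS;
}

/* New behaviour: an item marked failed is resubmitted before the skip check. */
static enum push_result push_with_retry(struct model_inode_item *it)
{
	if (it->failed) {
		it->failed = false;
		it->resubmits++;	/* models requeueing the failed buffer */
		return ITEM_SUCCESS;
	}
	return push_without_retry(it);
}
```

With an item that is both flush locked and failed, the first function keeps returning `ITEM_FLUSHING` forever, while the second one resubmits it once the push runs again.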
  3. 26 Apr, 2017 1 commit
  4. 08 Feb, 2016 1 commit
    • M
      xfs: Make xfsaild freezeable again · 18f1df4e
      Michal Hocko authored
      Hendrik has reported suspend failures due to xfsaild preventing the freezer
      from settling down.
      Jan 17 19:59:56 linux-6380 kernel: PM: Syncing filesystems ... done.
      Jan 17 19:59:56 linux-6380 kernel: PM: Preparing system for sleep (mem)
      Jan 17 19:59:56 linux-6380 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
      Jan 17 19:59:56 linux-6380 kernel: Freezing remaining freezable tasks ...
      Jan 17 19:59:56 linux-6380 kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
      Jan 17 19:59:56 linux-6380 kernel: xfsaild/dm-5    S 00000000     0  1293      2 0x00000080
      Jan 17 19:59:56 linux-6380 kernel:  f0ef5f00 00000046 00000200 00000000 ffff9022 c02d3800 00000000 00000032
      Jan 17 19:59:56 linux-6380 kernel:  ee0b2400 00000032 f71e0d00 f36fabc0 f0ef2d00 f0ef6000 f0ef2d00 f12f90c0
      Jan 17 19:59:56 linux-6380 kernel:  f0ef5f0c c0844e44 00000000 f0ef5f6c f811e0be 00000000 00000000 f0ef2d00
      Jan 17 19:59:56 linux-6380 kernel: Call Trace:
      Jan 17 19:59:56 linux-6380 kernel:  [<c0844e44>] schedule+0x34/0x90
      Jan 17 19:59:56 linux-6380 kernel:  [<f811e0be>] xfsaild+0x5de/0x600 [xfs]
      Jan 17 19:59:56 linux-6380 kernel:  [<c0286cbb>] kthread+0x9b/0xb0
      Jan 17 19:59:56 linux-6380 kernel:  [<c0848a79>] ret_from_kernel_thread+0x21/0x38
      
      The issue has been there for quite some time but it has been made
      visible only by 24ba16bb ("xfs: clear PF_NOFREEZE for xfsaild
      kthread") because the suspend started seeing xfsaild.
      
      The above commit has missed that the !xfs_ail_min branch might call
      schedule with TASK_INTERRUPTIBLE without calling try_to_freeze so the pm
      suspend would wake up the kernel thread over and over again without any
      progress. What we want here is to use freezable_schedule instead to hide
      the thread from the suspend.
      
      While we are here also change schedule_timeout to the freezable variant to
      prevent spurious wakeups by suspend.
      
      [dchinner: re-add set_freezable call so the freezer will account properly
       for this kthread. ]
      Reported-by: Hendrik Woltersdorf <hendrikw@arcor.de>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
      18f1df4e
  5. 19 Jan, 2016 1 commit
  6. 02 Nov, 2015 1 commit
  7. 12 Oct, 2015 1 commit
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell authored
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ff6d6af2
  8. 22 Jun, 2015 1 commit
  9. 28 Nov, 2014 2 commits
  10. 25 Jun, 2014 1 commit
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner authored
      Convert all the errors in the core XFS code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2451337d
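
As an illustration of the convention adopted in commit 2451337d, the sketch below shows a core routine returning a negative errno that the interface layer passes through unchanged. Both function names are hypothetical and `ENOSPC` is only an example error.

```c
#include <errno.h>

/* Hypothetical core routine: returns 0 on success or a negative errno. */
static int model_core_alloc(int have_space)
{
	if (!have_space)
		return -ENOSPC;
	return 0;
}

/*
 * Hypothetical interface-layer caller. With negative errors used
 * throughout, the value is handed back as is, with no sign flip at
 * the boundary.
 */
static int model_vfs_entry(int have_space)
{
	int error = model_core_alloc(have_space);

	if (error)
		return error;
	return 0;
}
```

Under the earlier positive-error scheme the boundary function would have had to negate the value, which is the kind of conversion the commit removes.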
  11. 14 Apr, 2014 1 commit
  12. 07 Nov, 2013 1 commit
  13. 24 Oct, 2013 1 commit
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner authored
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      239880ef
  14. 31 Aug, 2013 1 commit
  15. 24 Aug, 2013 1 commit
  16. 18 Dec, 2012 1 commit
  17. 30 Jul, 2012 1 commit
    • B
      xfs: re-enable xfsaild idle mode and fix associated races · 8375f922
      Brian Foster authored
      xfsaild idle mode logic currently leads to a couple of hangs:
      
      1.) If xfsaild is rescheduled in during an incremental scan
          (i.e., tout != 0) and the target has been updated since
          the previous run, we can hit the new target and go into
          idle mode with a still populated ail.
      2.) A wake up is only issued when the target is pushed forward.
          The wake up can race with xfsaild if it is currently in the
          process of entering idle mode, causing future wake up
          events to be lost.
      
      These hangs have been reproduced and verified as fixed by
      running xfstests 273 in a loop on a slightly modified upstream
      kernel. The kernel is modified to re-enable idle mode as
      previously implemented (when count == 0) and with a revert of
      commit 670ce93f, which includes performance improvements that
      make this harder to reproduce.
      
      The solution, the algorithm for which has been outlined by
      Dave Chinner, is to modify xfsaild to enter idle mode only when
      the ail is empty and the push target has not been moved forward
      since the last push.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      8375f922
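
The idle condition stated in commit 8375f922 (AIL empty and push target unchanged since the last push) can be written down as a small predicate. The structure and field names below are assumptions made for this sketch, not the kernel's data layout.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_ail {
	int      nr_items;	/* items currently in the AIL */
	uint64_t target;	/* current push target LSN */
	uint64_t target_prev;	/* target seen by the previous push */
};

/*
 * The thread may go idle only if there is nothing left to push and
 * nobody moved the target forward since the last push ran.
 */
static bool model_can_idle(const struct model_ail *ail)
{
	return ail->nr_items == 0 && ail->target == ail->target_prev;
}
```

Checking both conditions together is what closes the first hang in the list above: a populated AIL or a moved target each keeps the thread out of idle mode.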
  18. 15 May, 2012 6 commits
    • D
      xfs: move xfsagino_t to xfs_types.h · 60a34607
      Dave Chinner authored
      Untangle the header file includes a bit by moving the definition of
      xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
      xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
      xfs_ag.h.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      60a34607
    • D
      xfs: pass shutdown method into xfs_trans_ail_delete_bulk · 04913fdd
      Dave Chinner authored
      xfs_trans_ail_delete_bulk() can be called from different contexts so
      if the item is not in the AIL we need different shutdown for each
      context.  Pass in the shutdown method needed so the correct action
      can be taken.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      04913fdd
    • C
      a8569171
    • C
      xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig authored
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importantly we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straightforward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwri_submit or
      xfs_buf_delwri_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long as we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      43ff2122
    • C
      xfs: implement freezing by emptying the AIL · 211e4d43
      Christoph Hellwig authored
      Now that we write back all metadata either synchronously or through
      the AIL we can simply implement metadata freezing in terms of
      emptying the AIL.
      
      The implementation for this is fairly simple and straightforward:
      A new routine is added that asks the xfsaild to push the AIL to the
      end and waits for it to complete and send a wakeup. The routine will
      then loop if the AIL is not actually empty, and continue to do so
      until the AIL is completely empty.
      
      We keep an inode reclaim pass in the freeze process to avoid having
      memory pressure have to reclaim inodes that require dirtying the
      filesystem to be reclaimed after the freeze has completed. This
      means we can also treat unmount in the exact same way as freeze.
      
      As an upside we can now remove the radix tree based inode writeback
      and xfs_unmountfs_writesb.
      
      [ Dave Chinner:
      	- Cleaned up commit message.
      	- Added inode reclaim passes back into freeze.
      	- Cleaned up wakeup mechanism to avoid the use of a new
      	  sleep counter variable. ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      211e4d43
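
The "push to the end, wait, and loop until empty" routine from commit 211e4d43 has roughly the following shape. This is a toy model: `model_push_pass()` stands in for waking xfsaild and waiting for its wakeup, and the `batch` parameter is an invented knob that limits how much one pass writes back.

```c
struct model_ail {
	int nr_items;	/* items currently in the AIL */
};

/* One push pass; writes back at most batch items (hypothetical limit). */
static void model_push_pass(struct model_ail *ail, int batch)
{
	int done = ail->nr_items < batch ? ail->nr_items : batch;

	ail->nr_items -= done;
}

/*
 * Keep pushing until the AIL is empty. Returns the number of passes
 * needed, or -1 if batch is not positive and no progress is possible.
 */
static int model_push_all_sync(struct model_ail *ail, int batch)
{
	int passes = 0;

	if (batch <= 0)
		return -1;
	while (ail->nr_items > 0) {
		model_push_pass(ail, batch);
		passes++;
	}
	return passes;
}
```

The loop condition is the important part: a single completed pass is not taken as proof that the AIL is empty, so the caller re-checks and pushes again.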
    • C
      xfs: allow assigning the tail lsn with the AIL lock held · 1c304625
      Christoph Hellwig authored
      Provide a variant of xlog_assign_tail_lsn that has the AIL lock already
      held.  By doing so we do an additional atomic_read + atomic_set under
      the lock, which comes down to two instructions.
      
      Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the
      new version to reduce the number of lock roundtrips, and prepare for
      a new addition that would require a third lock roundtrip in
      xfs_trans_ail_delete_bulk.  This addition is also the reason for
      slightly rearranging the conditionals and relying on xfs_log_space_wake
      for checking that the filesystem has been shut down internally.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      1c304625
  19. 23 Feb, 2012 3 commits
  20. 19 Oct, 2011 2 commits
  21. 12 Oct, 2011 4 commits
  22. 10 Aug, 2011 1 commit
  23. 21 Jul, 2011 3 commits
    • D
      xfs: convert AIL cursors to use struct list_head · af3e4022
      Dave Chinner authored
      The list of active AIL cursors uses a roll-your-own linked list with
      special casing for the AIL push cursor. Simplify this code by
      replacing the list with standard struct list_head lists, and use a
      separate list_head to track the active cursors. This allows us to
      treat the AIL push cursor as a generic cursor rather than as a
      special case, further simplifying the code.
      
      Further, fix the duplicate push cursor initialisation that the
      special case handling was hiding, and clean up all the comments
      around the active cursor list handling.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      af3e4022
    • D
      xfs: remove confusing ail cursor wrapper · 16b59029
      Dave Chinner authored
      xfs_trans_ail_cursor_set() doesn't set the cursor to the current log
      item, it sets it to the next item. There is already a function for
      doing this - xfs_trans_ail_cursor_next() - and the _set function is
      simply a two line wrapper.  Remove it and open code the setting of
      the cursor in the two locations that call it to remove the
      confusion.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      16b59029
    • D
      xfs: use a cursor for bulk AIL insertion · 1d8c95a3
      Dave Chinner authored
      Delayed logging can insert tens of thousands of log items into the
      AIL at the same LSN. When log commit records are committed,
      we can get insertions occurring at an LSN that is not at the
      end of the AIL. If there are thousands of items in the AIL on the
      tail LSN, each insertion has to walk the AIL to find the correct
      place to insert the new item into the AIL. This can consume large
      amounts of CPU time and block other operations from occurring while
      the traversals are in progress.
      
      To avoid this repeated walk, use an AIL cursor to record
      where we should be inserting the new items into the AIL without
      having to repeat the walk. The cursor infrastructure already
      provides this functionality for push walks, so is a simple extension
      of existing code. While this will not avoid the initial walk, it
      will avoid repeating it tens of thousands of times during a single
      checkpoint commit.
      
      This version includes logic improvements from Christoph Hellwig.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      1d8c95a3
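
To show why a cursor helps in the situation commit 1d8c95a3 describes, here is a simplified model of a list sorted by LSN. It is a sketch under several assumptions: the list is singly linked and searched from the front (the names `model_ail`, `model_find_pos()` and `model_insert()` are invented), and the cursor is just a pointer to the last inserted item. A `steps` counter records how many nodes each search visits.

```c
#include <stddef.h>
#include <stdint.h>

struct model_item {
	uint64_t lsn;
	struct model_item *next;
};

struct model_ail {
	struct model_item head;	/* sentinel; head.next is the first item */
	long steps;		/* list nodes visited while searching */
};

/* Walk from the sentinel to the last node whose LSN is <= lsn. */
static struct model_item *model_find_pos(struct model_ail *ail, uint64_t lsn)
{
	struct model_item *pos = &ail->head;

	while (pos->next && pos->next->lsn <= lsn) {
		pos = pos->next;
		ail->steps++;
	}
	return pos;
}

/*
 * Insert item after *cursor if the cursor is set, otherwise search for
 * the position first. On return the cursor points at the new item, so
 * the next insertion at the same LSN can skip the walk. The caller must
 * only reuse a cursor for items carrying the same LSN.
 */
static void model_insert(struct model_ail *ail, struct model_item **cursor,
			 struct model_item *item)
{
	struct model_item *pos = *cursor ? *cursor
					 : model_find_pos(ail, item->lsn);

	item->next = pos->next;
	pos->next = item;
	*cursor = item;
}
```

Inserting a batch of same-LSN items with one shared cursor costs a single walk. Resetting the cursor to `NULL` before every insertion reproduces the repeated walk that the commit message complains about.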
  24. 10 May, 2011 2 commits
    • D
      xfs: fix race condition in AIL push trigger · 7ac95657
      Dave Chinner authored
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One cause is a
      race condition in determining whether there is a push in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checked
      whether the target changed and cleared the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety's sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit e4d3c4a4)
      7ac95657
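
The fixed ordering from commit 7ac95657 (clear the PUSHING bit first, then compare the target) can be sketched with C11 atomics. This is a single-threaded replay of the two sides, not a concurrency test. The `model_*` names are hypothetical, `atomic_exchange()` stands in for `test_and_set_bit()`, and the kernel's memory barriers are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct model_ail {
	atomic_bool pushing;	/* models XFS_AIL_PUSHING_BIT */
	uint64_t    target;	/* current push target */
	int         queued;	/* times the push work was queued */
};

static void model_ail_init(struct model_ail *ail, uint64_t target,
			   bool pushing)
{
	atomic_init(&ail->pushing, pushing);
	ail->target = target;
	ail->queued = 0;
}

/* Requester: update the target, queue work unless a push is running. */
static void model_request_push(struct model_ail *ail, uint64_t new_target)
{
	ail->target = new_target;
	if (!atomic_exchange(&ail->pushing, true))
		ail->queued++;
}

/*
 * End of the push work, fixed ordering: clear the bit first, then
 * compare the target with the one this run worked on (seen_target).
 * If it moved, apply the same test-and-set check as the requester.
 */
static void model_push_work_end(struct model_ail *ail, uint64_t seen_target)
{
	atomic_store(&ail->pushing, false);
	if (ail->target != seen_target &&
	    !atomic_exchange(&ail->pushing, true))
		ail->queued++;
}
```

Whichever side runs second in this model, exactly one of them queues the follow-up push, so a target update that arrives while a push is finishing is not lost.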
    • D
      xfs: make AIL target updates and compares 32bit safe. · fe0da767
      Dave Chinner authored
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have a
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit fd5670f2)
      fe0da767
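
Commit fe0da767 is about 64-bit LSNs being read or written in two 32-bit halves. The sketch below models an LSN as a cycle number in the upper 32 bits and a block number in the lower 32 bits and compares the halves in that order. `model_lsn_t` and the helper names are assumptions for illustration, and the return values are simply negative, zero or positive.

```c
#include <stdint.h>

typedef int64_t model_lsn_t;

/* Build an LSN from its two 32-bit halves. */
static model_lsn_t model_lsn_make(uint32_t cycle, uint32_t block)
{
	return (model_lsn_t)(((uint64_t)cycle << 32) | block);
}

static uint32_t model_lsn_cycle(model_lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static uint32_t model_lsn_block(model_lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Compare by cycle first, then by block; <0, 0 or >0 like strcmp(). */
static int model_lsn_cmp(model_lsn_t a, model_lsn_t b)
{
	if (model_lsn_cycle(a) != model_lsn_cycle(b))
		return model_lsn_cycle(a) < model_lsn_cycle(b) ? -1 : 1;
	if (model_lsn_block(a) != model_lsn_block(b))
		return model_lsn_block(a) < model_lsn_block(b) ? -1 : 1;
	return 0;
}
```

A torn copy that combines the new cycle with the old block, for example cycle 2 with block 900 while the target moves from (1, 900) to (2, 10), compares as later than both real values. That is the kind of corrupted result a locked copy avoids on 32-bit systems.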