1. 08 7月, 2011 5 次提交
  2. 07 7月, 2011 1 次提交
    • D
      xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner 提交于
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with the writing the underlying buffer back to disk.
      The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing a AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1316d4da
  3. 24 6月, 2011 3 次提交
    • D
      xfs: prevent bogus assert when trying to remove non-existent attribute · 4a338212
      Dave Chinner 提交于
      If the attribute fork on an inode is in btree format and has
      multiple levels (i.e node format rather than leaf format), then a
      lookup failure will trigger an assert failure in xfs_da_path_shift
      if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to
      indicate to the directory btree code that not finding an entry is
      not a fatal error. In the case of doing a lookup for a directory
      name removal, this is valid as a user cannot insert an arbitrary
      name to remove from the directory btree.
      
      However, in the case of the attribute tree, a user has direct
      control over the attribute name and can ask for any random name to
      be removed without any validation. In this case, fsstress is asking
      for a non-existent user.selinux attribute to be removed, and that is
      causing xfs_da_path_shift() to fall off the bottom of the tree where
      it asserts that a lookup failure is allowed. Because the flag is not
      set, we die a horrible death on a debug enable kernel.
      
      Prevent this assert from firing on attribute removes by adding the
      op_flag XFS_DA_OP_OKNOENT to atribute removal operations.
      
      Discovered when testing on a SELinux enabled system by fsstress in
      test 070 by trying to remove a non-existent user.selinux attribute.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      4a338212
    • D
      xfs: clear XFS_IDIRTY_RELEASE on truncate down · df4368a1
      Dave Chinner 提交于
      When an inode is truncated down, speculative preallocation is
      removed from the inode. This should also reset the state bits for
      controlling whether preallocation is subsequently removed when the
      file is next closed. The flag is not being cleared, so repeated
      operations on a file that first involve a truncate (e.g. multiple
      repeated dd invocations on a file) give different file layouts for
      the second and subsequent invocations.
      
      Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the
      XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure
      that speculative delalloc is removed on files that have been
      truncated down.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      df4368a1
    • D
      xfs: reset inode per-lifetime state when recycling it · 778e24bb
      Dave Chinner 提交于
      XFS inodes has several per-lifetime state fields that determine the
      behaviour of the inode. These state fields are not all reset when an
      inode is reused from the reclaimable state.
      
      This can lead to unexpected behaviour of the new inode such as
      speculative preallocation not being truncated away in the expected
      manner for local files until the inode is subsequently truncated,
      freed or cycles out of the cache. It can also lead to an inode being
      considered to be a filestream inode or having been truncated when
      that is not the case.
      
      Rework the reinitialisation of the inode when it is recycled to
      ensure that it is pristine before it is reused. While there, also
      fix the resetting of state flags in the recycling error paths so the
      inode does not become unreclaimable.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      778e24bb
  4. 16 6月, 2011 1 次提交
    • C
      xfs: make log devices with write back caches work · a27a263b
      Christoph Hellwig 提交于
      There's no reason not to support cache flushing on external log devices.
      The only thing this really requires is flushing the data device first
      both in fsync and log commits.  A side effect is that we also have to
      remove the barrier write test during mount, which has been superflous
      since the new FLUSH+FUA code anyway.  Also use the chance to flush the
      RT subvolume write cache before the fsync commit, which is required
      for correct semantics.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      a27a263b
  5. 15 6月, 2011 1 次提交
  6. 27 5月, 2011 1 次提交
    • C
      fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig 提交于
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aa385729
  7. 25 5月, 2011 12 次提交
  8. 20 5月, 2011 6 次提交
    • D
      xfs: obey minleft values during extent allocation correctly · bf59170a
      Dave Chinner 提交于
      When allocating an extent that is long enough to consume the
      remaining free space in an AG, we need to ensure that the allocation
      leaves enough space in the AG for any subsequent bmap btree blocks
      that are needed to track the new extent. These have to be allocated
      in the same AG as we only reserve enough blocks in an allocation
      transaction for modification of the freespace trees in a single AG.
      
      xfs_alloc_fix_minleft() has been considering blocks on the AGFL as
      free blocks available for extent and bmbt block allocation, which is
      not correct - blocks on the AGFL are there exclusively for the use
      of the free space btrees. As a result, when minleft is less than the
      number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim
      the given extent to leave minleft blocks available for bmbt
      allocation, and hence we can fail allocation during bmbt record
      insertion.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      bf59170a
    • D
      xfs: reset buffer pointers before freeing them · 44396476
      Dave Chinner 提交于
      When we free a vmapped buffer, we need to ensure the vmap address
      and length we free is the same as when it was allocated. In various
      places in the log code we change the memory the buffer is pointing
      to before issuing IO, but we never reset the buffer to point back to
      it's original memory (or no memory, if that is the case for the
      buffer).
      
      As a result, when we free the buffer it points to memory that is
      owned by something else and attempts to unmap and free it. Because
      the range does not match any known mapped range, it can trigger
      BUG_ON() traps in the vmap code, and potentially corrupt the vmap
      area tracking.
      
      Fix this by always resetting these buffers to their original state
      before freeing them.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      44396476
    • D
      xfs: avoid getting stuck during async inode flushes · ee58abdf
      Dave Chinner 提交于
      When the underlying inode buffer is locked and xfs_sync_inode_attr()
      is doing a non-blocking flush, xfs_iflush() can return EAGAIN.  When
      this happens, clear the error rather than returning it to
      xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk
      delaying for a short while and trying again. This can result in
      background walks getting stuck on the one AG until inode buffer is
      unlocked by some other means.
      
      This behaviour was noticed when analysing event traces followed by
      code inspection and verification of the fix via further traces.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ee58abdf
    • D
      xfs: fix xfs_itruncate_start tracing · e5737515
      Dave Chinner 提交于
      Variables are ordered incorrectly in trace call.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      e5737515
    • D
      xfs: fix duplicate workqueue initialisation · 1beb65ad
      Dave Chinner 提交于
      The workqueue initialisation function is called twice when
      initialising the XFS subsystem. Remove the second initialisation
      call.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1beb65ad
    • J
      xfs: kill off xfs_printk() · e69522a8
      Joe Perches 提交于
      xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid
      using xfs_printk() altogether.  This is the only remaining use of
      xfs_printk(), so changing it this way means xfs_printk() can simply
      be eliminated.can simply be eliminated.can simply be eliminated.can
      simply be eliminated.can simply be eliminated.can simply be
      eliminated.can simply be eliminated.can simply be eliminated.can
      simply be eliminated.
      
      Also add format checking to the non-debug inline function xfs_debug.
      Miscellaneous function prototype argument alignment.
      
      (Updated to delete the definition of xfs_printk(), which is
      no longer used or needed.)
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      e69522a8
  9. 10 5月, 2011 10 次提交
    • J
      treewide: fix a few typos in comments · 70f23fd6
      Justin P. Mattock 提交于
      - kenrel -> kernel
      - whetehr -> whether
      - ttt -> tt
      - sss -> ss
      Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      70f23fd6
    • D
      xfs: fix race condition in AIL push trigger · 7ac95657
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One is caused by a
      race condition in determining whether there is a psh in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checked
      whether the target changed and cleared the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      
      (cherry picked from commit e4d3c4a4)
      7ac95657
    • D
      xfs: make AIL target updates and compares 32bit safe. · fe0da767
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      
      (cherry picked from commit fd5670f2)
      fe0da767
    • D
      xfs: always push the AIL to the target · 50e86686
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have a LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with a LSN that matches the target
      exactly are pushed during the loop.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      
      (cherry picked from commit cb64026b)
      50e86686
    • D
      xfs: exit AIL push work correctly when AIL is empty · 9e7004e7
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      
      (cherry picked from commit ea35a200)
      9e7004e7
    • D
      xfs: ensure reclaim cursor is reset correctly at end of AG · 228d62dd
      Dave Chinner 提交于
      On a 32 bit highmem PowerPC machine, the XFS inode cache was growing
      without bound and exhausting low memory causing the OOM killer to be
      triggered. After some effort, the problem was reproduced on a 32 bit
      x86 highmem machine.
      
      The problem is that the per-ag inode reclaim index cursor was not
      getting reset to the start of the AG if the radix tree tag lookup
      found no more reclaimable inodes. Hence every further reclaim
      attempt started at the same index beyond where any reclaimable
      inodes lay, and no further background reclaim ever occurred from the
      AG.
      
      Without background inode reclaim the VM driven cache shrinker
      simply cannot keep up with cache growth, and OOM is the result.
      
      While the change that exposed the problem was the conversion of the
      inode reclaim to use work queues for background reclaim, it was not
      the cause of the bug. The bug was introduced when the cursor code
      was added, just waiting for some weird configuration to strike....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-By: NChristian Kujau <lists@nerdbynature.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      
      (cherry picked from commit b2232219)
      228d62dd
    • D
      xfs: fix race condition in AIL push trigger · e4d3c4a4
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One is caused by a
      race condition in determining whether there is a psh in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checked
      whether the target changed and cleared the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      e4d3c4a4
    • D
      xfs: make AIL target updates and compares 32bit safe. · fd5670f2
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      fd5670f2
    • D
      xfs: always push the AIL to the target · cb64026b
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have a LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with a LSN that matches the target
      exactly are pushed during the loop.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      cb64026b
    • D
      xfs: exit AIL push work correctly when AIL is empty · ea35a200
      Dave Chinner 提交于
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      ea35a200
新手
引导
客服 返回
顶部