1. 25 May, 2011 7 commits
  2. 20 May, 2011 6 commits
    • D
      xfs: obey minleft values during extent allocation correctly · bf59170a
      Dave Chinner committed
      When allocating an extent that is long enough to consume the
      remaining free space in an AG, we need to ensure that the allocation
      leaves enough space in the AG for any subsequent bmap btree blocks
      that are needed to track the new extent. These have to be allocated
      in the same AG as we only reserve enough blocks in an allocation
      transaction for modification of the freespace trees in a single AG.
      
      xfs_alloc_fix_minleft() has been considering blocks on the AGFL as
      free blocks available for extent and bmbt block allocation, which is
      not correct - blocks on the AGFL are there exclusively for the use
      of the free space btrees. As a result, when minleft is less than the
      number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim
      the given extent to leave minleft blocks available for bmbt
      allocation, and hence we can fail allocation during bmbt record
      insertion.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      bf59170a
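The trimming rule described in the commit above can be sketched in user-space C. This is a minimal illustration, not the kernel's xfs_alloc_fix_minleft(): the name `trim_for_minleft` and its parameters are hypothetical, and `ag_free_blocks` is assumed to already exclude any blocks held on the AGFL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the longest extent length (<= want_len)
 * that still leaves `minleft` free blocks in the AG afterwards.
 * AGFL blocks are deliberately not part of ag_free_blocks, because they
 * are reserved for the free space btrees. Returns 0 when no allocation
 * can honour minleft.
 */
static uint32_t trim_for_minleft(uint32_t ag_free_blocks,
                                 uint32_t want_len,
                                 uint32_t minleft)
{
    assert(want_len > 0);
    if (ag_free_blocks <= minleft)
        return 0;
    uint32_t max_len = ag_free_blocks - minleft;
    return want_len < max_len ? want_len : max_len;
}
```

With 100 free blocks and a minleft of 4, a request for all 100 blocks would be trimmed to 96 in this sketch, leaving room for later bmbt block allocation.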
    • D
      xfs: reset buffer pointers before freeing them · 44396476
      Dave Chinner committed
      When we free a vmapped buffer, we need to ensure the vmap address
      and length we free is the same as when it was allocated. In various
      places in the log code we change the memory the buffer is pointing
      to before issuing IO, but we never reset the buffer to point back to
      its original memory (or no memory, if that is the case for the
      buffer).
      
      As a result, when we free the buffer it points to memory that is
      owned by something else and attempts to unmap and free it. Because
      the range does not match any known mapped range, it can trigger
      BUG_ON() traps in the vmap code, and potentially corrupt the vmap
      area tracking.
      
      Fix this by always resetting these buffers to their original state
      before freeing them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      44396476
    • D
      xfs: avoid getting stuck during async inode flushes · ee58abdf
      Dave Chinner committed
      When the underlying inode buffer is locked and xfs_sync_inode_attr()
      is doing a non-blocking flush, xfs_iflush() can return EAGAIN.  When
      this happens, clear the error rather than returning it to
      xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk
      delaying for a short while and trying again. This can result in
      background walks getting stuck on the one AG until the inode buffer is
      unlocked by some other means.
      
      This behaviour was noticed when analysing event traces followed by
      code inspection and verification of the fix via further traces.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      ee58abdf
    • D
      xfs: fix xfs_itruncate_start tracing · e5737515
      Dave Chinner committed
      Variables are ordered incorrectly in the trace call.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      e5737515
    • D
      xfs: fix duplicate workqueue initialisation · 1beb65ad
      Dave Chinner committed
      The workqueue initialisation function is called twice when
      initialising the XFS subsystem. Remove the second initialisation
      call.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      1beb65ad
    • J
      xfs: kill off xfs_printk() · e69522a8
      Joe Perches committed
      xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid
      using xfs_printk() altogether.  This is the only remaining use of
      xfs_printk(), so changing it this way means xfs_printk() can simply
      be eliminated.
      
      Also add format checking to the non-debug inline function xfs_debug.
      Miscellaneous function prototype argument alignment.
      
      (Updated to delete the definition of xfs_printk(), which is
      no longer used or needed.)
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      e69522a8
  3. 10 May, 2011 5 commits
    • D
      xfs: fix race condition in AIL push trigger · e4d3c4a4
      Dave Chinner committed
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One cause is a
      race condition in determining whether there is a push in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checked
      whether the target changed and cleared the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety's sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      e4d3c4a4
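The fixed ordering described in the commit above (clear the PUSHING bit first, then recheck the target) can be sketched with C11 atomics. This is a single-threaded illustration under stated assumptions, not the kernel code: `struct ail`, `ail_push`, `ail_push_work_done` and the `queued` counter are hypothetical names, and an `atomic_flag` stands in for XFS_AIL_PUSHING_BIT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct ail {
    _Atomic uint64_t target;  /* push target (an LSN) */
    atomic_flag pushing;      /* stand-in for XFS_AIL_PUSHING_BIT */
    int queued;               /* counts queued push work, for illustration only */
};

/* Requester side: publish the new target, queue work only if none is running. */
static void ail_push(struct ail *a, uint64_t new_target)
{
    assert(a);
    atomic_store(&a->target, new_target);
    if (!atomic_flag_test_and_set(&a->pushing))
        a->queued++;
}

/* End of push work: clear the bit FIRST, then check whether the target moved. */
static void ail_push_work_done(struct ail *a, uint64_t target_seen)
{
    assert(a);
    atomic_flag_clear(&a->pushing);
    if (atomic_load(&a->target) != target_seen &&
        !atomic_flag_test_and_set(&a->pushing))
        a->queued++;  /* requeue; only one side can win the test-and-set */
}
```

Because both paths go through the same test-and-set, a target update that lands just before the bit is cleared is picked up by the recheck, and one that lands just after is queued by the requester.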
    • D
      xfs: make AIL target updates and compares 32bit safe. · fd5670f2
      Dave Chinner committed
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have a
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      fd5670f2
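As a rough illustration of comparing LSNs with an explicit helper rather than a raw 64 bit comparison, the sketch below compares two values half by half, in the spirit of XFS_LSN_CMP(). It assumes an LSN packs a cycle number in the upper 32 bits and a block number in the lower 32 bits; `lsn_t`, `lsn_make` and `lsn_cmp` are hypothetical user-space names. The locked copy done by xfs_trans_ail_copy_lsn() is not shown.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;  /* hypothetical stand-in for xfs_lsn_t */

static uint32_t lsn_cycle(lsn_t lsn) { return (uint32_t)((uint64_t)lsn >> 32); }
static uint32_t lsn_block(lsn_t lsn) { return (uint32_t)((uint64_t)lsn & 0xffffffffu); }

/* Build an LSN from its two 32 bit halves. */
static lsn_t lsn_make(uint32_t cycle, uint32_t block)
{
    return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Returns <0, 0 or >0: compare cycles first, then blocks. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
    assert(sizeof(lsn_t) == 8);
    if (lsn_cycle(a) != lsn_cycle(b))
        return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
    if (lsn_block(a) != lsn_block(b))
        return lsn_block(a) < lsn_block(b) ? -1 : 1;
    return 0;
}
```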
    • D
      xfs: always push the AIL to the target · cb64026b
      Dave Chinner committed
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have a LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with a LSN that matches the target
      exactly are pushed during the loop.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      cb64026b
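The off-by-one described in the commit above comes down to the comparison used in the push loop. A minimal sketch, assuming the AIL is modelled as an array of LSNs sorted in ascending order (the name `pushable_items` and the array model are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many items at the tail of a sorted AIL would be pushed for
 * a given target. Using '<=' means items whose LSN equals the target
 * are pushed too; with '<' they would be skipped and the push work
 * could finish while items at exactly the target LSN remain.
 */
static size_t pushable_items(const uint64_t *lsns, size_t n, uint64_t target)
{
    assert(lsns != NULL || n == 0);
    size_t count = 0;
    while (count < n && lsns[count] <= target)
        count++;
    return count;
}
```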
    • D
      xfs: exit AIL push work correctly when AIL is empty · ea35a200
      Dave Chinner committed
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      ea35a200
    • D
      xfs: ensure reclaim cursor is reset correctly at end of AG · b2232219
      Dave Chinner committed
      On a 32 bit highmem PowerPC machine, the XFS inode cache was growing
      without bound and exhausting low memory causing the OOM killer to be
      triggered. After some effort, the problem was reproduced on a 32 bit
      x86 highmem machine.
      
      The problem is that the per-ag inode reclaim index cursor was not
      getting reset to the start of the AG if the radix tree tag lookup
      found no more reclaimable inodes. Hence every further reclaim
      attempt started at the same index beyond where any reclaimable
      inodes lay, and no further background reclaim ever occurred from the
      AG.
      
      Without background inode reclaim the VM driven cache shrinker
      simply cannot keep up with cache growth, and OOM is the result.
      
      While the change that exposed the problem was the conversion of the
      inode reclaim to use work queues for background reclaim, it was not
      the cause of the bug. The bug was introduced when the cursor code
      was added, just waiting for some weird configuration to strike....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-By: Christian Kujau <lists@nerdbynature.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      b2232219
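The cursor behaviour described in the commit above can be sketched as follows. This is a simplified model, assuming the per-AG tagged radix tree lookup is replaced by a linear scan over a boolean array; `struct ag_reclaim` and `next_reclaimable` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ag_reclaim {
    const bool *reclaimable;  /* one flag per inode slot in the AG */
    size_t n;                 /* number of slots */
    size_t cursor;            /* where the next scan starts */
};

/*
 * Return the index of the next reclaimable inode at or after the
 * cursor, or -1 if there is none. When the scan finds nothing, the
 * cursor is reset to the start of the AG so the next attempt does not
 * begin beyond every reclaimable inode.
 */
static long next_reclaimable(struct ag_reclaim *ag)
{
    assert(ag->cursor <= ag->n);
    for (size_t i = ag->cursor; i < ag->n; i++) {
        if (ag->reclaimable[i]) {
            ag->cursor = i + 1;
            return (long)i;
        }
    }
    ag->cursor = 0;  /* the reset that was missing */
    return -1;
}
```

Without the reset, the cursor in this model would stay past the last reclaimable slot and every later scan would return -1.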
  4. 29 April, 2011 7 commits
  5. 21 April, 2011 1 commit
  6. 12 April, 2011 1 commit
  7. 08 April, 2011 12 commits
    • C
      xfs: use proper interfaces for on-stack plugging · a1b7ea5d
      Christoph Hellwig committed
      Add proper blk_start_plug/blk_finish_plug pairs for the two places where
      we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and
      xfs_buf_iowait, given that context switches already flush the per-process
      plugging lists.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      a1b7ea5d
    • C
      xfs: fix xfs_debug warnings · 957935dc
      Christoph Hellwig committed
      For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
      effect in xfs_debug:
      
      fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
      fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
      
      The reason for that is that the various new xfs message functions have a
      return value which is never used, and in case of the non-debug build
      xfs_debug the macro evaluates to a plain 0 which produces the above
      warnings.  This can be fixed by turning xfs_debug into an inline function
      instead of a macro, but in addition to that I've also changed all the
      message helpers to return void as we never use their return values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      957935dc
    • C
      xfs: fix variable set but not used warnings · ecb697c1
      Christoph Hellwig committed
      GCC 4.6 now warns about variables that are set but not used.  Fix the
      trivially fixable warnings of this sort.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      ecb697c1
    • D
      xfs: convert log tail checking to a warning · da8a1a4a
      Dave Chinner committed
      On the Power platform, the log tail debug checks fire excessively
      causing the system to panic early in testing. The debug checks are
      known to be racy, though on x86_64 there is no evidence that they
      trigger at all.
      
      We want to keep the checks active on debug systems to alert us to
      problems with log space accounting, but we need to reduce the impact
      of a racy check on testing on the Power platform.
      
      As a result, convert the ASSERT conditions to warnings, and
      allow them to fire only once per filesystem mount. This will prevent
      false positives from interfering with testing, whilst still
      providing us with the indication that there may be a problem with log
      space accounting should that occur.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      da8a1a4a
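The "warn once per mount" behaviour described in the commit above can be sketched with a per-mount flag. This is an illustrative user-space model, not the kernel change: `struct mount_state` and `warn_bad_log_tail` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct mount_state {
    bool tail_warned;  /* set after the first warning for this mount */
};

/*
 * Instead of asserting (and panicking) on a failed log tail check,
 * emit a warning the first time and stay quiet afterwards.
 * Returns true if a warning was printed.
 */
static bool warn_bad_log_tail(struct mount_state *mp, bool tail_ok)
{
    assert(mp != NULL);
    if (tail_ok || mp->tail_warned)
        return false;
    mp->tail_warned = true;
    fprintf(stderr, "warning: log tail check failed (reported once per mount)\n");
    return true;
}
```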
    • D
      xfs: catch bad block numbers freeing extents. · be65b18a
      Dave Chinner committed
      A fuzzed filesystem crashed a kernel when freeing an extent with a
      block number beyond the end of the filesystem. Convert all the debug
      asserts in xfs_free_extent() to active checks so that we catch bad
      extents and return that the filesystem is corrupted rather than
      crashing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      be65b18a
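Turning a debug assert into an active check, as the commit above describes, might look like the sketch below. It is a simplified model that validates an extent against the total filesystem size only; the function name, the `ERR_FS_CORRUPTED` value and the parameters are hypothetical and do not reproduce the checks in xfs_free_extent().

```c
#include <assert.h>
#include <stdint.h>

enum { ERR_FS_CORRUPTED = 1 };  /* hypothetical error code for this sketch */

/*
 * Validate an extent before freeing it. Returns 0 if the extent lies
 * entirely inside the filesystem, or ERR_FS_CORRUPTED otherwise, so
 * the caller can report corruption instead of crashing.
 */
static int check_free_extent(uint64_t start, uint64_t len, uint64_t fs_blocks)
{
    assert(fs_blocks > 0);
    if (len == 0 || start >= fs_blocks)
        return ERR_FS_CORRUPTED;
    if (len > fs_blocks - start)  /* written to avoid start + len overflow */
        return ERR_FS_CORRUPTED;
    return 0;
}
```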
    • D
      xfs: push the AIL from memory reclaim and periodic sync · fd074841
      Dave Chinner committed
      When we are short on memory, we want to expedite the cleaning of
      dirty objects.  Hence when we run short on memory, we need to kick
      the AIL flushing into action to clean as many dirty objects as
      quickly as possible.  To implement this, sample the lsn of the log
      item at the head of the AIL and use that as the push target for the
      AIL flush.
      
      Further, we keep items in the AIL that are dirty that are not
      tracked any other way, so we can get objects sitting in the AIL that
      don't get written back until the AIL is pushed. Hence to get the
      filesystem to the idle state, we might need to push the AIL to flush
      out any remaining dirty objects sitting in the AIL. This requires
      the same push mechanism as the reclaim push.
      
      This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
      match the new xfs_ail_max_lsn() function introduced in this patch.
      Similarly for xfs_trans_ail_push -> xfs_ail_push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      fd074841
    • D
      xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50
      Dave Chinner committed
      This patch rearranges the location of functions in xfs_trans_ail.c
      so that they no longer need forward declarations, in preparation
      for adding new functions.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      cd4a3c50
    • D
      xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd
      Dave Chinner committed
      Similar to the xfssyncd, the per-filesystem xfsaild threads can be
      converted to a global workqueue and run periodically by delayed
      works. This makes sense for the AIL pushing because it uses
      variable timeouts depending on the work that needs to be done.
      
      By removing the xfsaild, we simplify the AIL pushing code and
      remove the need to spread the code to implement the threading
      and pushing across multiple files.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      0bf6a5bd
    • D
      xfs: introduce background inode reclaim work · a7b339f1
      Dave Chinner committed
      Background inode reclaim needs to run more frequently than the XFS
      syncd work is run as 30s is too long between optimal reclaim runs.
      Add a new periodic work item to the xfs syncd workqueue to run a
      fast, non-blocking inode reclaim scan.
      
      Background inode reclaim is kicked by the act of marking inodes for
      reclaim.  When an AG is first marked as having reclaimable inodes,
      the background reclaim work is kicked. It will continue to run
      periodically until it detects that there are no more reclaimable
      inodes. It will be kicked again when the first inode is queued for
      reclaim.
      
      To ensure shrinker based inode reclaim throttles to the inode
      cleaning and reclaim rate but still reclaims inodes efficiently, make
      it kick the background inode reclaim so that when we are low on
      memory we are trying to reclaim inodes as efficiently as possible.
      This kick should not be necessary, but it will protect against
      failures to kick the background reclaim when inodes are first dirtied.
      
      To provide the rate throttling, make the shrinker pass do
      synchronous inode reclaim so that it blocks on inodes under IO. This
      means that the shrinker will reclaim inodes rather than just
      skipping over them, but it does not adversely affect the rate of
      reclaim because most dirty inodes are already under IO due to the
      background reclaim work the shrinker kicked.
      
      These two modifications solve one of the two OOM killer invocations
      Chris Mason reported recently when running a stress testing script.
      The particular workload trigger for the OOM killer invocation is
      where there are more threads than CPUs all unlinking files in an
      extremely memory constrained environment. Unlike other solutions,
      this one does not have an impact on performance when
      memory is not constrained or the number of concurrent threads
      operating is <= the number of CPUs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      a7b339f1
    • D
      xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55
      Dave Chinner committed
      One of the problems with the current inode flush at ENOSPC is that we
      queue a flush per ENOSPC event, regardless of how many are already
      queued. This can result in hundreds of queued flushes, most of
      which simply burn CPU scanning and do no real work. This simply slows
      down allocation at ENOSPC.
      
      We really only need one active flush at a time, and we can easily
      implement that via the new xfs_syncd_wq. All we need to do is queue
      a flush if one is not already active, then block waiting for the
      currently active flush to complete. The result is that we only ever
      have a single ENOSPC inode flush active at a time and this greatly
      reduces the overhead of ENOSPC processing.
      
      On my 2p test machine, this results in tests exercising ENOSPC
      conditions running significantly faster - 042 halves execution time,
      083 drops from 60s to 5s, etc - while not introducing test
      regressions.
      
      This allows us to remove the old xfssyncd threads and infrastructure
      as they are no longer used.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      89e4cb55
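The "queue a flush only if one is not already active" idea from the commit above can be modelled with a pending flag. This is a single-threaded sketch under stated assumptions: `struct flusher`, `queue_flush` and `run_flush` are hypothetical names, and the blocking wait for the active flush is left out.

```c
#include <assert.h>
#include <stdbool.h>

struct flusher {
    bool pending;  /* a flush is queued or running */
    int runs;      /* how many flushes actually ran */
};

/* Queue a flush unless one is already pending. Returns true if queued. */
static bool queue_flush(struct flusher *f)
{
    assert(f);
    if (f->pending)
        return false;
    f->pending = true;
    return true;
}

/* Run the pending flush, if any, and allow a new one to be queued. */
static void run_flush(struct flusher *f)
{
    assert(f);
    if (!f->pending)
        return;
    f->pending = false;
    f->runs++;
}
```

In this model a burst of ENOSPC events between two runs collapses into a single flush instead of one flush per event.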
    • D
      xfs: introduce a xfssyncd workqueue · c6d09b66
      Dave Chinner committed
      All of the work xfssyncd does is background functionality. There is
      no need for a thread per filesystem to do this work - it can all be
      managed by a global workqueue now that workqueues manage concurrency
      effectively.
      
      Introduce a new global xfssyncd workqueue, and convert the periodic
      work to use this new functionality. To do this, use a delayed work
      construct to schedule the next running of the periodic sync work
      for the filesystem. When the sync work is complete, queue a new
      delayed work for the next running of the sync work.
      
      For laptop mode, we wait on completion for the sync works, so ensure
      that the sync work queuing interface can flush and wait for work to
      complete to enable the work queue infrastructure to replace the
      current sequence number and wakeup that is used.
      
      Because the sync work does non-trivial amounts of work, mark the
      new work queue as CPU intensive.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      c6d09b66
    • D
      xfs: fix extent format buffer allocation size · e828776a
      Dave Chinner committed
      When formatting an inode item, we have to allocate a separate buffer
      to hold extents when there are delayed allocation extents on the
      inode and it is in extent format. The allocation size is derived
      from the in-core data fork representation, which accounts for
      delayed allocation extents, while the on-disk representation does
      not contain any delalloc extents.
      
      As a result of this mismatch, the allocated buffer can be far larger
      than needed to hold the real extent list which, due to the fact the
      inode is in extent format, is limited to the size of the literal
      area of the inode. However, we can have thousands of delalloc
      extents, resulting in an allocation size orders of magnitude larger
      than is needed to hold all the real extents.
      
      Fix this by limiting the size of the buffer being allocated to the
      size of the literal area of the inodes in the filesystem (i.e. the
      maximum size an inode fork can grow to).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      e828776a
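The size clamp described in the commit above can be sketched as a simple minimum. This is an illustration only: `extent_buf_size` and its parameters are hypothetical, and the record size and literal area size used in any call are made-up numbers rather than real XFS constants.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size the temporary extent buffer. The in-core extent count includes
 * delayed allocation extents, so the naive size can be far too large;
 * the on-disk extent list can never exceed the inode's literal area,
 * so clamp to that.
 */
static size_t extent_buf_size(size_t incore_extents,
                              size_t rec_size,
                              size_t literal_area)
{
    assert(rec_size > 0);
    size_t want = incore_extents * rec_size;
    return want < literal_area ? want : literal_area;
}
```

For example, 5000 in-core extents of 16 bytes each would ask for 80000 bytes, but with a 256 byte literal area the sketch allocates only 256.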
  8. 31 March, 2011 1 commit