1. 08 Apr 2011, 11 commits
    • xfs: fix xfs_debug warnings · 957935dc
      Christoph Hellwig authored
      For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
      effect in xfs_debug:
      
      fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
      fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
      
      The reason is that the various new xfs message functions have a
      return value which is never used, and in the non-debug build the
      xfs_debug macro evaluates to a plain 0, which produces the above
      warning.  This can be fixed by turning xfs_debug into an inline
      function instead of a macro; in addition, I've also changed all the
      message helpers to return void as we never use their return values.
      A sketch of the change follows this entry.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      957935dc
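      A minimal sketch of the change, with simplified signatures (the real
      helpers live in the xfs message code and take more arguments):

        /* before: in CONFIG_XFS_DEBUG=n builds the macro evaluates to a
         * bare 0, so "xfs_debug(mp, ...);" is a statement with no effect */
        #define xfs_debug(mp, fmt, ...)         (0)

        /* after: an empty inline function returning void compiles away
         * and produces no warning */
        static inline void
        xfs_debug(struct xfs_mount *mp, const char *fmt, ...)
        {
        }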
    • xfs: fix variable set but not used warnings · ecb697c1
      Christoph Hellwig authored
      GCC 4.6 now warns about variables that are set but never used.  Fix
      the trivially fixable warnings of this sort.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      ecb697c1
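      An illustrative example (not taken from the patch) of the warning
      class that GCC 4.6's -Wunused-but-set-variable flags, and the usual
      fix:

        static int do_something(void);          /* stand-in for real work */

        /* before: "warning: variable 'error' set but not used" */
        static void demo_before(void)
        {
                int error;

                error = do_something();         /* assigned, never read */
        }

        /* after: use the value, or delete the variable entirely */
        static int demo_after(void)
        {
                return do_something();
        }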
    • xfs: convert log tail checking to a warning · da8a1a4a
      Dave Chinner authored
      On the Power platform, the log tail debug checks fire excessively
      causing the system to panic early in testing. The debug checks are
      known to be racy, though on x86_64 there is no evidence that they
      trigger at all.
      
      We want to keep the checks active on debug systems to alert us to
      problems with log space accounting, but we need to reduce the impact
      of a racy check on testing on the Power platform.
      
      As a result, convert the ASSERT conditions to warnings, and
      allow them to fire only once per filesystem mount. This will prevent
      false positives from interfering with testing, whilst still
      indicating that there may be a problem with log space accounting
      should one occur. A sketch of the warn-once pattern follows this
      entry.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      da8a1a4a
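      A hedged sketch of the warn-once pattern; the check itself and the
      once-per-mount flag are hypothetical stand-ins for the real log
      fields:

        /* before: ASSERT(!tail_check_failed(log)) panics debug kernels */
        if (tail_check_failed(log)) {           /* hypothetical racy check */
                if (!log->l_tail_warned) {      /* hypothetical once-flag */
                        xfs_warn(log->l_mp,
        "log tail check failed - possible log space accounting problem");
                        log->l_tail_warned = true;
                }
        }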
    • xfs: catch bad block numbers freeing extents. · be65b18a
      Dave Chinner authored
      A fuzzed filesystem crashed a kernel when freeing an extent with a
      block number beyond the end of the filesystem. Convert all the debug
      asserts in xfs_free_extent() to active checks so that we catch bad
      extents and report that the filesystem is corrupted rather than
      crashing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      be65b18a
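      A sketch of the assert-to-active-check conversion the commit
      describes (the real xfs_free_extent() validates more than this):

        /* before: only enforced on CONFIG_XFS_DEBUG builds */
        ASSERT(agno < mp->m_sb.sb_agcount);

        /* after: always validate, report corruption instead of crashing */
        if (agno >= mp->m_sb.sb_agcount)
                return XFS_ERROR(EFSCORRUPTED); /* 2011-era errno style */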
    • xfs: push the AIL from memory reclaim and periodic sync · fd074841
      Dave Chinner authored
      When we are short on memory, we want to expedite the cleaning of
      dirty objects.  Hence we need to kick the AIL flushing into action
      to clean as many dirty objects as quickly as possible.  To implement
      this, sample the lsn of the log item at the head of the AIL and use
      that as the push target for the AIL flush.
      
      Further, the AIL holds dirty items that are not tracked in any other
      way, so objects can sit in the AIL without being written back until
      the AIL is pushed. Hence to get the filesystem to the idle state, we
      might need to push the AIL to flush out any remaining dirty objects
      sitting in it. This requires the same push mechanism as the reclaim
      push.
      
      This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
      match the new xfs_ail_max_lsn() function introduced in this patch.
      Similarly for xfs_trans_ail_push -> xfs_ail_push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      fd074841
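      A minimal sketch of the reclaim push, modelled on the
      xfs_ail_max_lsn()/xfs_ail_push() names this patch introduces:

        /*
         * Push out everything currently in the AIL: sample the lsn of the
         * last item and use it as the push target for the AIL flush.
         */
        void
        xfs_ail_push_all(struct xfs_ail *ailp)
        {
                xfs_lsn_t       threshold_lsn = xfs_ail_max_lsn(ailp);

                if (threshold_lsn)
                        xfs_ail_push(ailp, threshold_lsn);
        }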
    • xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50
      Dave Chinner authored
      This patch rearranges the functions in xfs_trans_ail.c so that
      forward declarations of them are no longer needed, in preparation
      for adding new functions that would otherwise also require forward
      declarations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      cd4a3c50
    • xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd
      Dave Chinner authored
      Similar to the xfssyncd, the per-filesystem xfsaild threads can be
      converted to a global workqueue and run periodically as delayed
      work items. This makes sense for AIL pushing because it uses
      variable timeouts depending on the work that needs to be done (see
      the sketch after this entry).
      
      By removing the xfsaild, we simplify the AIL pushing code and
      remove the need to spread the code to implement the threading
      and pushing across multiple files.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      0bf6a5bd
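      A hedged sketch of the delayed-work pattern behind the conversion;
      the ail_work field and xfs_ail_do_push() helper are illustrative
      names, not the patch's own:

        static struct workqueue_struct *xfs_ail_wq;     /* global queue */

        static void
        xfs_ail_work(struct work_struct *work)
        {
                struct xfs_ail  *ailp = container_of(to_delayed_work(work),
                                                struct xfs_ail, ail_work);
                long            tout;

                /* push the AIL, pick the next timeout from the work left */
                tout = xfs_ail_do_push(ailp) ? 10 : 50; /* milliseconds */
                queue_delayed_work(xfs_ail_wq, &ailp->ail_work,
                                   msecs_to_jiffies(tout));
        }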
    • xfs: introduce background inode reclaim work · a7b339f1
      Dave Chinner authored
      Background inode reclaim needs to run more frequently than the XFS
      syncd work, as 30s is too long between optimal reclaim runs.
      Add a new periodic work item to the xfs syncd workqueue to run a
      fast, non-blocking inode reclaim scan.
      
      Background inode reclaim is kicked by the act of marking inodes for
      reclaim.  When an AG is first marked as having reclaimable inodes,
      the background reclaim work is kicked. It will continue to run
      periodically until it detects that there are no more reclaimable
      inodes. It will be kicked again when the first inode is queued for
      reclaim.
      
      To ensure shrinker-based inode reclaim throttles to the inode
      cleaning and reclaim rate but still reclaims inodes efficiently,
      make it kick the background inode reclaim so that when we are low
      on memory we are trying to reclaim inodes as efficiently as
      possible. This kick should not be necessary, but it will protect
      against failures to kick the background reclaim when inodes are
      first dirtied.
      
      To provide the rate throttling, make the shrinker pass do
      synchronous inode reclaim so that it blocks on inodes under IO. This
      means that the shrinker will reclaim inodes rather than just
      skipping over them, but it does not adversely affect the rate of
      reclaim because most dirty inodes are already under IO due to the
      background reclaim work the shrinker kicked.
      
      These two modifications solve one of the two OOM killer invocations
      Chris Mason reported recently when running a stress-testing script.
      The particular workload that triggers the OOM killer is one where
      there are more threads than CPUs, all unlinking files in an
      extremely memory constrained environment. Unlike other solutions,
      this one does not impact performance when memory is not constrained
      or when the number of concurrent threads operating is less than or
      equal to the number of CPUs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      a7b339f1
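      A sketch of the "kick on first reclaimable inode" queueing,
      modelled on the patch (exact period and field names may differ):

        /*
         * Queue a background inode reclaim pass if there are reclaimable
         * inodes, running at a fraction of the syncd period.
         */
        static void
        xfs_reclaim_work_queue(struct xfs_mount *mp)
        {
                rcu_read_lock();
                if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG))
                        queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
                                msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
                rcu_read_unlock();
        }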
    • xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55
      Dave Chinner authored
      One of the problems with the current inode flush at ENOSPC is that
      we queue a flush per ENOSPC event, regardless of how many are
      already queued. This can result in hundreds of queued flushes, most
      of which simply burn CPU scanning and do no real work. This simply
      slows down allocation at ENOSPC.
      
      We really only need one active flush at a time, and we can easily
      implement that via the new xfs_syncd_wq. All we need to do is queue
      a flush if one is not already active, then block waiting for the
      currently active flush to complete. The result is that we only ever
      have a single ENOSPC inode flush active at a time and this greatly
      reduces the overhead of ENOSPC processing.
      
      On my 2p test machine, this results in tests exercising ENOSPC
      conditions running significantly faster - 042 halves execution time,
      083 drops from 60s to 5s, etc - while not introducing test
      regressions.
      
      This allows us to remove the old xfssyncd threads and infrastructure
      as they are no longer used.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      89e4cb55
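      A hedged sketch of the single-active-flush idea; flush_work_sync()
      was the era's wait primitive (later kernels fold this into
      flush_work()):

        /*
         * Queue a flush only if one is not already pending - queue_work()
         * is a no-op for already-queued work - then wait for the active
         * flush to complete, so at most one flush runs at a time.
         */
        void
        xfs_flush_inodes(struct xfs_inode *ip)
        {
                struct xfs_mount        *mp = ip->i_mount;

                queue_work(xfs_syncd_wq, &mp->m_flush_work);
                flush_work_sync(&mp->m_flush_work);
        }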
    • xfs: introduce a xfssyncd workqueue · c6d09b66
      Dave Chinner authored
      All of the work xfssyncd does is background functionality. There is
      no need for a thread per filesystem to do this work - it can all be
      managed by a global workqueue now that workqueues manage concurrency
      effectively.
      
      Introduce a new global xfssyncd workqueue, and convert the periodic
      work to use this new functionality. To do this, use a delayed work
      construct to schedule the next running of the periodic sync work
      for the filesystem. When the sync work is complete, queue a new
      delayed work for the next run.
      
      For laptop mode, we wait on completion of the sync work, so ensure
      that the sync work queuing interface can flush and wait for work to
      complete, enabling the workqueue infrastructure to replace the
      sequence number and wakeup mechanism currently used.
      
      Because the sync work does non-trivial amounts of work, mark the
      new work queue as CPU intensive.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      c6d09b66
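      A sketch of the two pieces described: a CPU-intensive global
      workqueue and a periodic sync work that re-arms itself (the sync
      pass body is elided):

        static struct workqueue_struct *xfs_syncd_wq;

        static void
        xfs_sync_worker(struct work_struct *work)
        {
                struct xfs_mount *mp = container_of(to_delayed_work(work),
                                        struct xfs_mount, m_sync_work);

                /* ... periodic sync of the filesystem elided ... */

                /* schedule the next run of the periodic sync work */
                queue_delayed_work(xfs_syncd_wq, &mp->m_sync_work,
                                msecs_to_jiffies(xfs_syncd_centisecs * 10));
        }

        static int __init
        xfs_syncd_wq_init(void)
        {
                /* the sync pass is not cheap: mark it CPU intensive */
                xfs_syncd_wq = alloc_workqueue("xfssyncd",
                                               WQ_CPU_INTENSIVE, 8);
                return xfs_syncd_wq ? 0 : -ENOMEM;
        }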
    • xfs: fix extent format buffer allocation size · e828776a
      Dave Chinner authored
      When formatting an inode item, we have to allocate a separate buffer
      to hold extents when there are delayed allocation extents on the
      inode and it is in extent format. The allocation size is derived
      from the in-core data fork representation, which accounts for
      delayed allocation extents, while the on-disk representation does
      not contain any delalloc extents.
      
      As a result of this mismatch, the allocated buffer can be far larger
      than needed to hold the real extent list which, due to the fact the
      inode is in extent format, is limited to the size of the literal
      area of the inode. However, we can have thousands of delalloc
      extents, resulting in an allocation size orders of magnitude larger
      than is needed to hold all the real extents.
      
      Fix this by limiting the size of the buffer being allocated to the
      size of the literal area of the inodes in the filesystem (i.e. the
      maximum size an inode fork can grow to).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      e828776a
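      A hedged sketch of the shape of the fix: bound the allocation by
      the fork's maximum (literal area) size instead of trusting the
      delalloc-inflated in-core size (surrounding code simplified):

        xfs_bmbt_rec_t  *ext_buffer;
        int             size = ip->i_df.if_bytes;      /* in-core size */

        /* delalloc extents never reach disk; cap at the literal area */
        if (size > XFS_IFORK_SIZE(ip, XFS_DATA_FORK))
                size = XFS_IFORK_SIZE(ip, XFS_DATA_FORK);
        ext_buffer = kmem_alloc(size, KM_SLEEP);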
  2. 31 Mar 2011, 1 commit
  3. 26 Mar 2011, 6 commits
    • xfs: stop using the page cache to back the buffer cache · 0e6e847f
      Dave Chinner authored
      Now that the buffer cache has its own LRU, we do not need to use
      the page cache to provide persistent caching and reclaim
      infrastructure. Convert the buffer cache to use alloc_pages()
      instead of the page cache. This removes all the overhead of page
      cache management from buffer setup and teardown, as well as the
      need to mark pages accessed as we find buffers in the buffer
      cache.
      
      By avoiding the page cache, we also remove the need to keep state in
      the page_private(page) field for persistent storage across buffer
      free/buffer rebuild, and so all that code can be removed. This also
      fixes the long-standing problem of not having enough bits in the
      page_private field to track all the state needed for a 512
      sector/64k page setup.
      
      It also removes the need for page locking during reads as the pages
      are unique to the buffer and nobody else will be attempting to
      access them.
      
      Finally, it removes the buftarg address space lock as a point of
      global contention on workloads that allocate and free buffers
      quickly such as when creating or removing large numbers of inodes in
      parallel. This removes the 16TB limit on filesystem size on 32 bit
      machines, as the 32 bit page index is no longer used for lookups
      of metadata buffers - the buffer cache is now solely indexed by disk
      address, which is stored in a 64 bit field in the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      0e6e847f
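      A hedged sketch of backing a buffer with alloc_page() directly
      instead of page-cache pages (error unwinding simplified):

        /* allocate private, uncached pages for the buffer */
        for (i = 0; i < bp->b_page_count; i++) {
                struct page     *page = alloc_page(GFP_NOFS);

                if (!page)
                        goto out_free_pages;    /* unwind what we have */
                bp->b_pages[i] = page;
        }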
    • xfs: register the inode cache shrinker before quotachecks · 704b2907
      Dave Chinner authored
      During mount, we can do a quotacheck that involves a bulkstat pass
      on all inodes. If there are more inodes in the filesystem than can
      be held in memory, we require the inode cache shrinker to run to
      ensure that we don't run out of memory.
      
      Unfortunately, the inode cache shrinker is not registered until we
      get to the end of the superblock setup process, which is after a
      quotacheck is run if it is needed. Hence we need to register the
      inode cache shrinker earlier in the mount process so that we don't
      OOM during mount. This requires that we also initialise the syncd
      work before we register the shrinker, so we need to juggle that
      around as well.
      
      While there, make sure that we have set up the block sizes in the
      VFS superblock correctly before the quotacheck is run so that any
      inodes that are cached as a result of the quotacheck have their
      block size fields set up correctly.
      
      Cc: stable@kernel.org
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      704b2907
    • xfs: xfs_trans_read_buf() should return an error on failure · 7401aafd
      Dave Chinner authored
      When inside a transaction and we fail to read a buffer,
      xfs_trans_read_buf returns a null buffer pointer and no error.
      xfs_da_do_buf() checks the error return, but not the buffer, and as
      a result this read failure condition causes a panic when it attempts
      to dereference the non-existent buffer.
      
      Make xfs_trans_read_buf() return the same error for this situation
      regardless of whether it is in a transaction or not. This means
      every caller does not need to check both the error return and the
      buffer before proceeding to use the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      7401aafd
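      A hedged sketch of the behavioural change; the exact errno in the
      real patch may differ:

        /* before: in-transaction read failure returned success */
        if (bp == NULL) {
                *bpp = NULL;
                return 0;       /* caller had to notice the NULL buffer */
        }

        /* after: fail the same way as the non-transaction path */
        if (bp == NULL) {
                *bpp = NULL;
                return XFS_ERROR(EIO);  /* illustrative error choice */
        }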
    • xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Dave Chinner authored
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_ifree_cluster: xfs_ifree_cluster locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if it can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. This allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it and hence defuse the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      tries to next reclaim the inode as it is now marked stale.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      1bfd8d04
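      A hedged sketch of the SYNC_TRYLOCK split; the buffer-lock helper
      naming follows later kernels (the 2011 code used a conditional lock
      primitive):

        /*
         * Take the cluster buffer without blocking when the caller asks
         * for trylock semantics, so reclaim can back off, drop the ilock
         * and retry instead of deadlocking against xfs_ifree_cluster.
         */
        if (flags & SYNC_TRYLOCK) {
                if (!xfs_buf_trylock(bp))
                        return EAGAIN;  /* retry reclaim later */
        } else {
                xfs_buf_lock(bp);
        }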
    • vmap: flush vmap aliases when mapping fails · a19fb380
      Dave Chinner authored
      On 32 bit systems, vmalloc space is limited and XFS can chew through
      it quickly as the vmalloc space is lazily freed. This can result in
      failure to map buffers, even when there are apparently large amounts
      of vmalloc space available. Hence, if we fail to map a buffer, purge
      the aliases that have not yet been freed to hopefully free up enough
      vmalloc space to allow a retry to succeed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      a19fb380
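      A sketch of the retry-after-purge pattern, close to the change in
      the buffer mapping path (the vm_map_ram() signature shown is the
      2011-era one):

        bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
                                -1, PAGE_KERNEL);
        if (!bp->b_addr) {
                /* purge lazily-freed vmap aliases, then retry once */
                vm_unmap_aliases();
                bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
                                        -1, PAGE_KERNEL);
                if (!bp->b_addr)
                        return -ENOMEM;
        }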
    • xfs: preallocation transactions do not need to be synchronous · 82878897
      Dave Chinner authored
      Preallocation and hole punch transactions are currently synchronous
      and this is causing performance problems in some cases. The
      transactions don't need to be synchronous, as we don't need to
      guarantee the preallocation is persistent on disk until an
      fdatasync, fsync or sync operation occurs. Only if the file is
      opened O_SYNC or O_DSYNC should the transaction be issued
      synchronously.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      82878897
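      A hedged sketch of a conditionally synchronous commit; where the
      O_SYNC/O_DSYNC state is checked in the real patch differs:

        /* force the log only when the open file demands sync semantics */
        if (file->f_flags & (O_SYNC | O_DSYNC))
                xfs_trans_set_sync(tp);
        error = xfs_trans_commit(tp, 0);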
  4. 17 Mar 2011, 1 commit
  5. 14 Mar 2011, 1 commit
  6. 12 Mar 2011, 1 commit
  7. 10 Mar 2011, 2 commits
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      721a9602
    • block: remove per-queue plugging · 7eaceacc
      Jens Axboe authored
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      A usage sketch of the new API follows this entry.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      7eaceacc
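      For reference, the explicit on-stack plugging API these two commits
      convert the kernel to (a usage sketch, not code from the patch):

        struct blk_plug plug;

        blk_start_plug(&plug);  /* batch this task's bio submissions */
        /* ... submit a stream of bios, e.g. readahead or writeback ... */
        blk_finish_plug(&plug); /* flush the batch to the device queues */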
  8. 09 Mar 2011, 2 commits
  9. 07 Mar 2011, 10 commits
  10. 02 Mar 2011, 3 commits
  11. 23 Feb 2011, 2 commits