1. 31 Mar, 2011 1 commit
  2. 26 Mar, 2011 2 commits
    • D
      xfs: stop using the page cache to back the buffer cache · 0e6e847f
      Dave Chinner committed
      Now that the buffer cache has its own LRU, we do not need to use
      the page cache to provide persistent caching and reclaim
      infrastructure. Convert the buffer cache to use alloc_pages()
      instead of the page cache. This will remove all the overhead of page
      cache management from setup and teardown of the buffers, as well as
      needing to mark pages accessed as we find buffers in the buffer
      cache.
      
      By avoiding the page cache, we also remove the need to keep state in
      the page_private(page) field for persistent storage across buffer
      free/buffer rebuild and so all that code can be removed. This also
      fixes the long-standing problem of not having enough bits in the
      page_private field to track all the state needed for a 512
      sector/64k page setup.
      
      It also removes the need for page locking during reads as the pages
      are unique to the buffer and nobody else will be attempting to
      access them.
      
      Finally, it removes the buftarg address space lock as a point of
      global contention on workloads that allocate and free buffers
      quickly such as when creating or removing large numbers of inodes in
      parallel. This removes the 16TB limit on filesystem size on 32 bit
      machines as the page index (32 bit) is no longer used for lookups
      of metadata buffers - the buffer cache is now solely indexed by disk
      address which is stored in a 64 bit field in the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      0e6e847f
    • D
      vmap: flush vmap aliases when mapping fails · a19fb380
      Dave Chinner committed
      On 32 bit systems, vmalloc space is limited and XFS can chew through
      it quickly as the vmalloc space is lazily freed. This can result in
      failure to map buffers, even when there are apparently large amounts
      of vmalloc space available. Hence, if we fail to map a buffer, purge
      the aliases that have not yet been freed to hopefully free up enough
      vmalloc space to allow a retry to succeed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      a19fb380
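The purge-and-retry idea in the vmap commit above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: `toy_vspace`, `toy_map`, `toy_purge_aliases` and `toy_map_with_retry` are hypothetical names invented here, and the "address space" is just a pair of counters. In the kernel, the purge step would presumably correspond to a call such as `vm_unmap_aliases()`.

```c
#include <stddef.h>

/* Toy stand-in for vmalloc space (hypothetical, not a kernel type). */
struct toy_vspace {
    int free_slots;   /* slots that can be mapped right now */
    int lazy_slots;   /* slots released by users but not yet reclaimed */
    char backing;     /* dummy storage so a mapping has a non-NULL address */
};

/* Try to map once; fails when no slot is immediately free. */
static void *toy_map(struct toy_vspace *vs)
{
    if (vs->free_slots == 0)
        return NULL;
    vs->free_slots--;
    return &vs->backing;
}

/* Reclaim every lazily freed slot, as a purge of stale aliases would. */
static void toy_purge_aliases(struct toy_vspace *vs)
{
    vs->free_slots += vs->lazy_slots;
    vs->lazy_slots = 0;
}

/* Map, allowing up to max_retries purge-and-retry rounds after a failure. */
void *toy_map_with_retry(struct toy_vspace *vs, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        void *addr = toy_map(vs);
        if (addr != NULL)
            return addr;
        if (attempt >= max_retries)
            return NULL;
        toy_purge_aliases(vs);
    }
}
```

With no free slots but two lazily freed ones, the first attempt fails, the purge reclaims both slots, and the retry succeeds. With nothing to reclaim, the call gives up after the allowed retries.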
  3. 10 Mar, 2011 1 commit
  4. 07 Mar, 2011 1 commit
  5. 01 Feb, 2011 1 commit
    • T
      xfs: convert to alloc_workqueue() · 83e75904
      Tejun Heo committed
      Convert from create[_singlethread]_workqueue() to alloc_workqueue().
      
      * xfsdatad_workqueue and xfsconvertd_workqueue are identity converted.
        Using higher concurrency limit might be useful but given the
        complexity of workqueue usage in xfs, proceeding cautiously seems
        better.
      
      * xfs_mru_reap_wq is converted to non-ordered workqueue with max
        concurrency of 1 as the work items don't require any specific
        ordering and already have proper synchronization.  It seems it was
        singlethreaded to save worker threads, which is no longer a concern.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: xfs-masters@oss.sgi.com
      Cc: Christoph Hellwig <hch@infradead.org>
      83e75904
  6. 12 Jan, 2011 1 commit
    • C
      xfs: fix error handling for synchronous writes · bfc60177
      Christoph Hellwig committed
      If we get an IO error on a synchronous superblock write, we attach an
      error release function to it so that when the last reference goes away
      the release function is called and the buffer is invalidated and
      unlocked. The buffer is left locked until the release function is
      called so that other concurrent users of the buffer will be locked out
      until the buffer error is fully processed.
      
      Unfortunately, for the superblock buffer the filesystem itself holds a
      reference to the buffer which prevents the reference count from
      dropping to zero and the release function being called. As a result,
      once an IO error occurs on a sync write, the buffer will never be
      unlocked and all future attempts to lock the buffer will hang.
      
      To make matters worse, this problem is not unique to such buffers;
      if there is a concurrent _xfs_buf_find() running, the lookup will grab
      a reference to the buffer and then wait on the buffer lock, preventing
      the reference count from ever falling to zero and hence unlocking the
      buffer.
      
      As such, the whole b_relse function implementation is broken because it
      cannot rely on the buffer reference count falling to zero to unlock the
      errored buffer. The synchronous write error path is the only path that
      uses this callback - it is used to ensure that the synchronous waiter
      gets the buffer error before the error state is cleared from the buffer
      by the release function.
      
      Given that the only synchronous buffer writes now go through xfs_bwrite
      and the error path in question can only occur for a write of a dirty,
      logged buffer, we can move most of the b_relse processing to happen
      inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
      In addition to that we make sure the error is not cleared in
      xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
      Given that xfs_bwrite keeps the buffer locked until it has waited for
      it and checked the error, this allows us to reliably propagate the error
      to the caller and makes sure that the buffer is reliably unlocked.
      
      Given that xfs_buf_iodone_callbacks was the only instance of the
      b_relse callback we can remove it entirely.
      
      Based on earlier patches by Dave Chinner and Ajeet Yadav.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      bfc60177
  7. 02 Dec, 2010 1 commit
    • D
      xfs: add a lru to the XFS buffer cache · 430cbeb8
      Dave Chinner committed
      Introduce a per-buftarg LRU for memory reclaim to operate on. This
      is the last piece we need to put in place so that we can fully
      control the buffer lifecycle. This allows XFS to be responsible for
      maintaining the working set of buffers under memory pressure instead
      of relying on the VM reclaim not to take pages we need out from
      underneath us.
      
      The implementation introduces a b_lru_ref counter into the buffer.
      This is currently set to 1 whenever the buffer is referenced and so is used to
      determine if the buffer should be added to the LRU or not when freed.
      Effectively it allows lazy LRU initialisation of the buffer so we do not need
      to touch the LRU list and locks in xfs_buf_find().
      
      Instead, when the buffer is being released and we drop the last
      reference to it, we check the b_lru_ref count and if it is non-zero
      we re-add the buffer reference and add the buffer to the LRU. The
      b_lru_ref counter is decremented by the shrinker, and whenever the
      shrinker comes across a buffer with a zero b_lru_ref counter, it
      releases the LRU reference on the buffer. In the absence of a lookup
      race, this will result in the buffer being freed.
      
      This counting mechanism is used instead of a reference flag so that
      it is simple to re-introduce buffer-type specific reclaim reference
      counts to prioritise reclaim more effectively. We still have all
      those hooks in the XFS code, so this will provide the infrastructure
      to re-implement that functionality.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      430cbeb8
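The b_lru_ref counting scheme described in the LRU commit above can be modelled in a few lines of single-threaded C. This is a simplified sketch, not the XFS implementation: `toy_buf`, `toy_buf_get`, `toy_buf_rele` and `toy_shrink_one` are hypothetical names, there is no real list or locking, and the LRU membership is reduced to a flag.

```c
#include <stdbool.h>

/* Toy buffer (hypothetical): just the fields the counting scheme needs. */
struct toy_buf {
    int hold;       /* reference count */
    int lru_ref;    /* reclaim counter, set to 1 on every lookup */
    bool on_lru;    /* true while the LRU holds its own reference */
    bool freed;     /* set once the buffer has been torn down */
};

/* Lookup: take a reference and re-arm lru_ref; the LRU is not touched. */
void toy_buf_get(struct toy_buf *bp)
{
    bp->hold++;
    bp->lru_ref = 1;
}

/* Release: on the last reference, either park the buffer on the LRU
 * (re-adding a reference on its behalf) or free it. */
void toy_buf_rele(struct toy_buf *bp)
{
    if (--bp->hold > 0)
        return;
    if (bp->lru_ref > 0 && !bp->on_lru) {
        bp->on_lru = true;
        bp->hold = 1;           /* the LRU's reference */
        return;
    }
    bp->on_lru = false;
    bp->freed = true;
}

/* One shrinker visit: age the buffer, or drop the LRU reference
 * once lru_ref has already reached zero. */
void toy_shrink_one(struct toy_buf *bp)
{
    if (!bp->on_lru)
        return;
    if (bp->lru_ref > 0) {
        bp->lru_ref--;
        return;
    }
    bp->on_lru = false;
    toy_buf_rele(bp);
}
```

In this model a buffer that is looked up and released survives one shrinker pass and is freed on the second. If a lookup takes a reference in between, the shrinker's release only drops the count and the buffer stays alive, which mirrors the lookup race mentioned in the commit message.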
  8. 01 Dec, 2010 1 commit
    • D
      xfs: push stale, pinned buffers on trylock failures · 90810b9e
      Dave Chinner committed
      As reported by Nick Piggin, XFS is suffering from long pauses under
      highly concurrent workloads when hosted on ramdisks. The problem is
      that an inode buffer is stuck in the pinned state in memory and as a
      result either the inode buffer or one of the inodes within the
      buffer is stopping the tail of the log from being moved forward.
      
      The system remains in this state until a periodic log force issued
      by xfssyncd causes the buffer to be unpinned. The main problem is
      that these are stale buffers, and are hence held locked until the
      transaction/checkpoint that marked them stale has been committed to
      disk. When the filesystem gets into this state, only the xfssyncd
      can cause the async transactions to be committed to disk and hence
      unpin the inode buffer.
      
      This problem was encountered when scaling the busy extent list, but
      only the blocking lock interface was fixed to solve the problem.
      Extend the same fix to the buffer trylock operations - if we fail to
      lock a pinned, stale buffer, then force the log immediately so that
      when the next attempt to lock it comes around, it will have been
      unpinned.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      90810b9e
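A rough model of the trylock change described above: when the trylock fails on a buffer that is both pinned and stale, the log is forced so that a later attempt can succeed. Everything here is a hypothetical toy (`toy_log`, `toy_pbuf`, `toy_log_force`, `toy_buf_trylock`). In particular, the toy log force completes instantly and unlocks the buffer itself, which is a simplification of what a real log force does.

```c
#include <stdbool.h>

/* Toy log (hypothetical): only counts how often it was forced. */
struct toy_log {
    int forces;
};

/* Toy buffer state relevant to the trylock path. */
struct toy_pbuf {
    bool locked;
    bool stale;
    int pin_count;
};

/* Stand-in for a log force: in this toy, committing the log
 * immediately unpins and unlocks a stale buffer. */
static void toy_log_force(struct toy_log *log, struct toy_pbuf *bp)
{
    log->forces++;
    if (bp->stale) {
        bp->pin_count = 0;
        bp->locked = false;
    }
}

/* Trylock: never blocks. On failure against a pinned, stale buffer,
 * push the log so the next attempt has a chance to succeed. */
bool toy_buf_trylock(struct toy_log *log, struct toy_pbuf *bp)
{
    if (!bp->locked) {
        bp->locked = true;
        return true;
    }
    if (bp->pin_count > 0 && bp->stale)
        toy_log_force(log, bp);
    return false;
}
```

The first attempt on a pinned, stale buffer still fails, but it triggers exactly one log force, and the following attempt succeeds. A locked buffer that is not stale fails without forcing the log.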
  9. 30 Nov, 2010 1 commit
  10. 11 Nov, 2010 1 commit
  11. 26 Oct, 2010 1 commit
    • C
      fs: do not assign default i_ino in new_inode · 85fe4025
      Christoph Hellwig committed
      Instead of always assigning an increasing inode number in new_inode
      move the call to assign it into those callers that actually need it.
      For now the set of callers that need it is estimated conservatively, that is
      the call is added to all filesystems that do not assign an i_ino
      by themselves.  For a few more filesystems we can avoid assigning
      any inode number given that they aren't user visible, and for others
      it could be done lazily when an inode number is actually needed,
      but that's left for later patches.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      85fe4025
  12. 19 Oct, 2010 8 commits
  13. 11 Oct, 2010 1 commit
    • T
      workqueue: add and use WQ_MEM_RECLAIM flag · 6370a6ad
      Tejun Heo committed
      Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
      WQ_RESCUER as internal and replace all external WQ_RESCUER usages to
      WQ_MEM_RECLAIM.
      
      This makes the API users express the intent of the workqueue instead
      of indicating the internal mechanism used to guarantee forward
      progress.  This is also to make it cleaner to add more semantics to
      WQ_MEM_RECLAIM.  For example, if deemed necessary, memory reclaim
      workqueues can be made highpri.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      6370a6ad
  14. 10 Sep, 2010 2 commits
  15. 02 Sep, 2010 1 commit
    • D
      xfs: improve buffer cache hash scalability · 9bc08a45
      Dave Chinner committed
      When doing large parallel file creates on a 16p machine, a large amount of
      time is being spent in _xfs_buf_find(). A system wide profile with perf top
      shows this:
      
                1134740.00 19.3% _xfs_buf_find
                 733142.00 12.5% __ticket_spin_lock
      
      The problem is that the hash contains 45,000 buffers, and the hash table width
      is only 256 buffers. That means we've got around 200 buffers per chain, and
      searching it is quite expensive. The hash table size needs to increase.
      
      Secondly, every time we do a lookup, we promote the buffer we find to the head
      of the hash chain. This is causing cachelines to be dirtied and causes
      invalidation of cachelines across all CPUs that may have walked the hash chain
      recently. Hence every walk of the hash chain is effectively a cold cache walk.
      Remove the promotion to avoid this invalidation.
      
      The results are:
      
                1045043.00 21.2% __ticket_spin_lock
                 326184.00  6.6% _xfs_buf_find
      
      A 70% drop in the CPU usage when looking up buffers. Unfortunately that does
      not result in an increase in performance under this workload as contention on
      the inode_lock soaks up most of the reduction in CPU usage.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      9bc08a45
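The two points made in the hash scalability commit above (chains that are too long, and lookups that dirty the chain by promoting the hit) can be illustrated with a small sketch. The names `toy_hbuf`, `toy_avg_chain_len` and `toy_chain_find` are hypothetical, and the chain is a plain singly linked list rather than the kernel's list type.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy hash chain entry (hypothetical): a disk address and a link. */
struct toy_hbuf {
    uint64_t blkno;
    struct toy_hbuf *next;
};

/* Average chain length for nbufs buffers spread over nchains chains. */
size_t toy_avg_chain_len(size_t nbufs, size_t nchains)
{
    return nbufs / nchains;
}

/* Read-only lookup: walk the chain without promoting the hit to the
 * head, so a lookup never writes to the chain's cachelines. */
const struct toy_hbuf *toy_chain_find(const struct toy_hbuf *head,
                                      uint64_t blkno)
{
    for (; head != NULL; head = head->next) {
        if (head->blkno == blkno)
            return head;
    }
    return NULL;
}
```

For the figures quoted in the commit message, `toy_avg_chain_len(45000, 256)` gives 175, in line with the "around 200 buffers per chain" estimate. Because `toy_chain_find` takes `const` pointers, the compiler also enforces that the walk leaves the chain untouched.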
  16. 27 Jul, 2010 4 commits
    • C
      xfs: kill the b_strat callback in xfs_buf · 939d723b
      Christoph Hellwig committed
      The b_strat callback is used by xfs_buf_iostrategy to perform additional
      checks before submitting a buffer.  It is used in xfs_bwrite and when
      writing out delayed buffers.  In xfs_bwrite we can de-virtualize the
      call easily as b_strat is set a few lines above the call to
      xfs_buf_iostrategy.  For the delayed buffers the rationale is a bit
      more complicated:
      
       - there are three callers of xfs_buf_delwri_queue, which places buffers
         on the delwri list:
          (1) xfs_bdwrite - this sets up b_strat, so it's fine
          (2) xfs_buf_iorequest.  None of the callers can have XBF_DELWRI set:
      	- xlog_bdstrat is only used for log buffers, which are never delwri
      	- _xfs_buf_read explicitly clears the delwri flag
      	- xfs_buf_iodone_work retries log buffers only
      	- xfsbdstrat - only used for reads, superblock writes without the
      	  delwri flag, log I/O and file zeroing with explicitly allocated
      	  buffers.
      	- xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is
      	  not set
          (3) xfs_buf_unlock
      	- only puts the buffer on the delwri list if the DELWRI flag is
      	  already set.  The DELWRI flag is only ever set in xfs_bwrite,
      	  xfs_buf_iodone_callbacks, or xfs_trans_log_buf.  For
      	  xfs_buf_iodone_callbacks and xfs_trans_log_buf we require
      	  an initialized buf item, which means b_strat was set to
      	  xfs_bdstrat_cb in xfs_buf_item_init.
      
      Conclusion: we can just get rid of the callback and replace it with
      explicit calls to xfs_bdstrat_cb.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      939d723b
    • D
      xfs: don't block on buffer read errors · ec53d1db
      Dave Chinner committed
      xfs_buf_read() fails to detect dispatch errors before attempting to
      wait on synchronous IO. If there was an error, it will get stuck
      forever, waiting for an I/O that was never started. Make sure the
      error is detected correctly.
      
      Further, such a failure can leave locked pages in the page cache
      which will cause a later operation to hang on the page. Ensure that
      we correctly process pages in the buffers when we get a dispatch
      error.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ec53d1db
    • C
      xfs: simplify buffer pinning · 4d16e924
      Christoph Hellwig committed
      Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
      them in their only callers, just like we did for the inode pinning a while
      ago.  Also remove duplicate trace points - the bufitem tracepoints cover
      all the information that is present in a buffer tracepoint.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      4d16e924
    • C
      xfs: drop dmapi hooks · 288699fe
      Christoph Hellwig committed
      Dmapi support was never merged upstream, but we still have a lot of hooks
      bloating XFS for it, all over the fast paths of the filesystem.
      
      This patch drops over 700 lines of dmapi overhead.  If we ever get HSM
      support in mainline at least the namespace events can be done much saner
      in the VFS instead of the individual filesystem, so it's not like this
      is much help for future work.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      288699fe
  17. 19 Jul, 2010 1 commit
    • D
      mm: add context argument to shrinker callback · 7f8275d0
      Dave Chinner committed
      The current shrinker implementation requires the registered callback
      to have global state to work from. This makes it difficult to shrink
      caches that are not global (e.g. per-filesystem caches). Pass the shrinker
      structure to the callback so that users can embed the shrinker structure
      in the context the shrinker needs to operate on and get back to it in the
      callback via container_of().
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      7f8275d0
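The embedding pattern described in the shrinker commit above can be sketched in userspace as follows. This is an illustration only: `toy_shrinker`, `toy_cache` and the two-argument callback are hypothetical simplifications (the kernel callback of that era took additional arguments), and `toy_container_of` is a plain `offsetof`-based stand-in for the kernel's `container_of()` macro.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy shrinker (hypothetical): the callback receives the shrinker itself. */
struct toy_shrinker {
    int (*shrink)(struct toy_shrinker *s, int nr_to_scan);
};

/* A per-instance cache that embeds its own shrinker, so no global
 * state is needed to find the cache from inside the callback. */
struct toy_cache {
    int nr_objects;
    struct toy_shrinker shrinker;
};

/* Callback: recover the owning cache from the embedded shrinker,
 * free up to nr_to_scan objects, and report how many remain. */
static int toy_cache_shrink(struct toy_shrinker *s, int nr_to_scan)
{
    struct toy_cache *cache = toy_container_of(s, struct toy_cache, shrinker);
    int n = nr_to_scan < cache->nr_objects ? nr_to_scan : cache->nr_objects;

    cache->nr_objects -= n;
    return cache->nr_objects;
}

/* Set up a cache with nr objects and wire up its shrinker. */
void toy_cache_init(struct toy_cache *cache, int nr)
{
    cache->nr_objects = nr;
    cache->shrinker.shrink = toy_cache_shrink;
}
```

Two independently initialised caches can share the same callback, and each call only affects the cache whose shrinker was passed in. That is the property a per-filesystem cache needs.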
  18. 24 May, 2010 1 commit
    • D
      xfs: Improve scalability of busy extent tracking · ed3b4d6c
      Dave Chinner committed
      When we free a metadata extent, we record it in the per-AG busy
      extent array so that it is not re-used before the freeing
      transaction hits the disk. This array is fixed size, so when it
      overflows we make further allocation transactions synchronous
      because we cannot track more freed extents until those transactions
      hit the disk and are completed. Under heavy mixed allocation and
      freeing workloads with large log buffers, we can overflow this array
      quite easily.
      
      Further, the array is sparsely populated, which means that inserts
      need to search for a free slot, and array searches often have to
      search many more slots than are actually used to check all the
      busy extents. Quite inefficient, really.
      
      To enable this aspect of extent freeing to scale better, we need
      a structure that can grow dynamically. While in other areas of
      XFS we have used radix trees, the extents being freed are at random
      locations on disk so are better suited to being indexed by an rbtree.
      
      So, use a per-AG rbtree indexed by block number to track busy
      extents.  This incurs a memory allocation when marking an extent
      busy, but should not occur too often in low memory situations. This
      should scale to an arbitrary number of extents so should not be a
      limitation for features such as in-memory aggregation of
      transactions.
      
      However, there are still situations where we can't avoid allocating
      busy extents (such as allocation from the AGFL). To minimise the
      overhead of such occurrences, we need to avoid doing a synchronous
      log force while holding the AGF locked to ensure that the previous
      transactions are safely on disk before we use the extent. We can do
      this by marking the transaction doing the allocation as synchronous
      rather than issuing a log force.
      
      Because of the locking involved and the ordering of transactions,
      the synchronous transaction provides the same guarantees as a
      synchronous log force because it ensures that all the prior
      transactions are already on disk when the synchronous transaction
      hits the disk. i.e. it preserves the free->allocate order of the
      extent correctly in recovery.
      
      By doing this, we avoid holding the AGF locked while log writes are
      in progress, hence reducing the length of time the lock is held and
      therefore we increase the rate at which we can allocate and free
      from the allocation group, thereby increasing overall throughput.
      
      The only problem with this approach is that when a metadata buffer is
      marked stale (e.g. a directory block is removed), then the buffer remains
      pinned and locked until the log goes to disk. The issue here is that
      if that stale buffer is reallocated in a subsequent transaction, the
      attempt to lock that buffer in the transaction will hang waiting for
      the log to go to disk to unlock and unpin the buffer. Hence if
      someone tries to lock a pinned, stale, locked buffer we need to
      push on the log to get it unlocked ASAP. Effectively we are trading
      off a guaranteed log force for a much less common trigger for log
      force to occur.
      
      Ideally we should not reallocate busy extents. That is a much more
      complex fix to the problem as it involves direct intervention in the
      allocation btree searches in many places. This is left to a future
      set of modifications.
      
      Finally, now that we track busy extents in allocated memory, we
      don't need the descriptors in the transaction structure to point to
      them. We can replace the complex busy chunk infrastructure with a
      simple linked list of busy extents. This allows us to remove a large
      chunk of code, making the overall change a net reduction in code
      size.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      ed3b4d6c
  19. 19 May, 2010 2 commits
    • C
      xfs: enforce synchronous writes in xfs_bwrite · 8c38366f
      Christoph Hellwig committed
      xfs_bwrite is used with the intention of synchronously writing out
      buffers, but currently it does not actually clear the async flag if
      that's left from previous writes but instead implements async
      behaviour if it finds it.  Remove the code handling asynchronous
      writes as we've got rid of those entirely outside of the log and
      delwri buffers, and make sure that we clear the async and read flags
      before writing the buffer.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      8c38366f
    • J
      xfs: add blockdev name to kthreads · e2a07812
      Jan Engelhardt committed
      This allows one to see in `ps` and similar tools which kthreads are
      allotted to which block device/filesystem, similar to what jbd2
      does. As the process name is a fixed 16-char array, no extra
      space is needed in tasks.
      
        PID TTY      STAT   TIME COMMAND
          2 ?        S      0:00 [kthreadd]
        197 ?        S      0:00  \_ [jbd2/sda2-8]
        198 ?        S      0:00  \_ [ext4-dio-unwrit]
        204 ?        S      0:00  \_ [flush-8:0]
       2647 ?        S      0:00  \_ [xfs_mru_cache]
       2648 ?        S      0:00  \_ [xfslogd/0]
       2649 ?        S      0:00  \_ [xfsdatad/0]
       2650 ?        S      0:00  \_ [xfsconvertd/0]
       2651 ?        S      0:00  \_ [xfsbufd/ram0]
       2652 ?        S      0:00  \_ [xfsaild/ram0]
       2653 ?        S      0:00  \_ [xfssyncd/ram0]
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Reviewed-by: Dave Chinner <david@fromorbit.com>
      e2a07812
  20. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  21. 17 Mar, 2010 2 commits
    • A
      xfs: use scalable vmap API · 8a262e57
      Alex Elder committed
      Re-apply a commit that had been reverted due to regressions
      that have since been fixed.
      
          From 95f8e302 Mon Sep 17 00:00:00 2001
          From: Nick Piggin <npiggin@suse.de>
          Date: Tue, 6 Jan 2009 14:43:09 +1100
      
          Implement XFS's large buffer support with the new vmap APIs. See the vmap
          rewrite (db64fe02) for some numbers. The biggest improvement that comes from
          using the new APIs is avoiding the global KVA allocation lock on every call.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      
      Only modifications here were a minor reformat, plus making the patch
      apply given the new use of xfs_buf_is_vmapped().
      Modified-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      8a262e57
    • A
      xfs: remove old vmap cache · cd9640a7
      Alex Elder committed
      Re-apply a commit that had been reverted due to regressions
      that have since been fixed.
      
          Original commit: d2859751
          Author: Nick Piggin <npiggin@suse.de>
          Date: Tue, 6 Jan 2009 14:40:44 +1100
      
          XFS's vmap batching simply defers a number (up to 64) of vunmaps,
          and keeps track of them in a list. To purge the batch, it just goes
          through the list and calls vunmap on each one. This is pretty poor:
          a global TLB flush is generally still performed on each vunmap, with
          the most expensive parts of the operation being the broadcast IPIs
          and locking involved in the SMP callouts, and the locking involved
          in the vmap management -- none of these are avoided by just batching
          up the calls. I'm actually surprised it ever made much difference.
          (Now that the lazy vmap allocator is upstream, this description is
          not quite right, but the vunmap batching still doesn't seem to do
          much).
      
          Rip all this logic out of XFS completely. I will improve vmap
          performance and scalability directly in a subsequent patch.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      
      The only change I made was to use the "new" xfs_buf_is_vmapped()
      function in a place it had been open-coded in the original.
      Modified-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      cd9640a7
  22. 06 Feb, 2010 1 commit
  23. 04 Feb, 2010 1 commit
  24. 26 Jan, 2010 1 commit
    • D
      xfs: Sort delayed write buffers before dispatch · 089716aa
      Dave Chinner committed
      Currently when the xfsbufd writes delayed write buffers, it pushes
      them to disk in the order they come off the delayed write list. If
      there are lots of buffers spread widely over the disk, this results
      in overwhelming the elevator sort queues in the block layer and we
      end up losing the possibility of merging adjacent buffers to minimise
      the number of IOs.
      
      Use the new generic list_sort function to sort the delwri dispatch
      queue before issue to ensure that the buffers are pushed in the most
      friendly order possible to the lower layers.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      089716aa
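Sorting the dispatch queue by disk address before issuing IO, as the delayed write commit above describes, might look roughly like this in userspace. The commit uses the kernel's `list_sort` on a linked list. This sketch substitutes the standard C `qsort` on an array, and `toy_dbuf`, `toy_buf_cmp` and `toy_delwri_sort` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy delayed-write buffer (hypothetical): only its disk address matters. */
struct toy_dbuf {
    uint64_t blkno;
};

/* Compare by disk address. The result is built from two comparisons
 * rather than a subtraction, so a large 64-bit difference cannot be
 * truncated when converted to int. */
static int toy_buf_cmp(const void *a, const void *b)
{
    const struct toy_dbuf *x = a;
    const struct toy_dbuf *y = b;

    return (x->blkno > y->blkno) - (x->blkno < y->blkno);
}

/* Sort the dispatch queue into ascending disk order before issue. */
void toy_delwri_sort(struct toy_dbuf *queue, size_t n)
{
    qsort(queue, n, sizeof(*queue), toy_buf_cmp);
}
```

After sorting, buffers at neighbouring disk addresses sit next to each other in the queue, which is what gives the lower layers a chance to merge them.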
  25. 02 Feb, 2010 1 commit
    • D
      xfs: Don't issue buffer IO direct from AIL push V2 · d808f617
      Dave Chinner committed
      All buffers logged into the AIL are marked as delayed write.
      When the AIL needs to push the buffer out, it issues an async write of the
      buffer. This means that IO patterns are dependent on the order of
      buffers in the AIL.
      
      Instead of flushing the buffer, promote the buffer in the delayed
      write list so that the next time the xfsbufd is run the buffer will
      be flushed by the xfsbufd. Return the state to the xfsaild that the
      buffer was promoted so that the xfsaild knows that it needs to cause
      the xfsbufd to run to flush the buffers that were promoted.
      
      Using the xfsbufd for issuing the IO allows us to dispatch all
      buffer IO from the one queue. This means that we can make much more
      enlightened decisions on what order to flush buffers to disk as
      we don't have multiple places issuing IO. Optimisations to xfsbufd
      will be in a future patch.
      
      Version 2
      - kill XFS_ITEM_FLUSHING as it is now unused.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      d808f617
  26. 22 Jan, 2010 1 commit