1. 19 Aug 2013, 1 commit
    • GFS2: don't overrun reserved revokes · 1bc333f4
      Authored by Benjamin Marzinski
      When run during fsync, a gfs2_log_flush could happen between the
      time when gfs2_ail_flush checked the number of blocks to revoke,
      and when it actually started the transaction to do those revokes.
      This occasionally caused it to need more revokes than it had reserved,
      causing gfs2 to crash.
      
      Instead of just reserving enough revokes to handle the blocks that
      currently need them, this patch makes gfs2_ail_flush reserve the
      maximum number of revokes it can, without increasing the total number
      of reserved log blocks. This patch also passes the number of reserved
      revokes to __gfs2_ail_flush() so that it doesn't go over its limit
      and cause a crash like we're seeing. Non-fsync calls to __gfs2_ail_flush
      will still cause a BUG() if necessary revokes are skipped.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
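      The sizing arithmetic can be modelled in userspace. Below is a minimal sketch assuming a 4KiB block and illustrative header sizes; none of these constants are the real on-disk format, and the function name is hypothetical:

        #include <stdint.h>
        #include <stdio.h>

        #define BLOCK_SIZE 4096u   /* assumed; really from the superblock */
        #define DESC_HDR     64u   /* assumed log descriptor header size */
        #define META_HDR     32u   /* assumed continuation header size */

        /* Worst-case number of revoke records (64-bit block numbers)
         * that fit in "blocks" already-reserved log blocks. */
        static unsigned max_revokes(unsigned blocks)
        {
            unsigned first, rest;

            if (blocks == 0)
                return 0;
            first = (BLOCK_SIZE - DESC_HDR) / sizeof(uint64_t);
            rest = (BLOCK_SIZE - META_HDR) / sizeof(uint64_t);
            return first + (blocks - 1) * rest;
        }

        int main(void)
        {
            /* Reserve for this maximum up front rather than for a
             * count sampled before the transaction starts. */
            printf("1 block: %u revokes\n", max_revokes(1));
            printf("2 blocks: %u revokes\n", max_revokes(2));
            return 0;
        }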
  2. 19 Jun 2013, 1 commit
    • GFS2: aggressively issue revokes in gfs2_log_flush · 5d054964
      Authored by Benjamin Marzinski
      This patch looks at all the outstanding blocks in all the transactions
      on the log, and moves the completed ones to the ail2 list.  Then it
      issues revokes for these blocks.  This will hopefully speed things up
      in situations where there is a lot of contention for glocks, especially
      if they are acquired serially.
      
      revoke_lo_before_commit will issue at most one log block's worth of these
      preemptive revokes. The amount of reserved log space that
      gfs2_log_reserve() ignores has been incremented to allow for this extra
      block.
      
      This patch also consolidates the common revoke instructions into one
      function, gfs2_add_revoke().
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
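      A minimal userspace sketch of the sweep described above, assuming a singly linked list with a per-entry completion flag; the entry layout, names, and the one-block budget are illustrative stand-ins for the kernel structures:

        #include <stdbool.h>
        #include <stddef.h>

        struct entry {
            struct entry *next;
            unsigned long long blkno;
            bool written;   /* I/O on this block has completed */
        };

        /* Move completed entries from ail1 to ail2, collecting at most
         * "budget" block numbers to revoke (about one log block's worth). */
        static size_t sweep_for_revokes(struct entry **ail1, struct entry **ail2,
                                        unsigned long long *revokes, size_t budget)
        {
            struct entry **pp = ail1;
            size_t n = 0;

            while (*pp && n < budget) {
                struct entry *e = *pp;

                if (e->written) {
                    *pp = e->next;          /* unlink from ail1 */
                    e->next = *ail2;        /* push onto ail2 */
                    *ail2 = e;
                    revokes[n++] = e->blkno;
                } else {
                    pp = &e->next;
                }
            }
            return n;
        }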
  3. 10 Apr 2013, 1 commit
    • GFS2: Add origin indicator to glock callbacks · 81ffbf65
      Authored by Steven Whitehouse
      This patch adds a bool indicating whether the demote
      request originated locally or remotely. This is then
      used by the iopen ->go_callback() to make 100% sure that
      it will only respond to remote callbacks.
      
      Since ->evict_inode() uses GL_NOCACHE when it attempts to
      get an exclusive lock on the iopen lock, this may result
      in extra scheduling of the workqueue in the case where the
      exclusive promotion request fails. This patch prevents
      that from happening.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
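      A small sketch of the interface change, with hypothetical names standing in for the real glock operations table:

        #include <stdbool.h>
        #include <stdio.h>

        struct glock;   /* opaque here */

        struct glock_operations {
            /* "remote" is the new origin indicator */
            void (*go_callback)(struct glock *gl, bool remote);
        };

        static void iopen_go_callback(struct glock *gl, bool remote)
        {
            (void)gl;
            if (!remote)
                return; /* ignore locally generated demotes, e.g. those
                         * caused by a failed GL_NOCACHE promotion in
                         * ->evict_inode() */
            puts("remote demote: queue deferred work");
        }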
  4. 13 Feb 2013, 1 commit
  5. 15 Nov 2012, 1 commit
  6. 07 Nov 2012, 2 commits
  7. 24 Sep 2012, 1 commit
  8. 08 May 2012, 1 commit
  9. 24 Apr 2012, 1 commit
    • GFS2: Remove bd_list_tr · c50b91c4
      Authored by Steven Whitehouse
      This is another clean-up in the logging code. This per-transaction
      list was largely unused. Its main function was to ensure that the
      number of buffers in a transaction was correct; however, that counter
      was only used to check the number of buffers in the bd_list_tr, plus
      an assert at the end of each transaction. With the assert now changed
      to use the calculated buffer counts, we can remove both bd_list_tr and
      its associated counter.
      
      This should make the code easier to understand as well as shrinking
      a couple of structures.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  10. 02 Nov 2011, 1 commit
  11. 21 Oct 2011, 5 commits
    • GFS2: Fix AIL flush issue during fsync · b5b24d7a
      Authored by Steven Whitehouse
      Unfortunately, it is not enough to just ignore locked buffers during
      the AIL flush from fsync. We need to be able to ignore all buffers
      which are locked, dirty or pinned at this stage as they might have
      been added subsequent to the log flush earlier in the fsync function.
      
      In addition, this means that we no longer need to rely on i_mutex to
      keep out writes during fsync, so we can, as a side-effect, remove
      that protection too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
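      A userspace model of the relaxed check, assuming simple buffer-state flags: in the fsync path, locked, dirty, or pinned buffers are skipped rather than treated as an error:

        #include <stdbool.h>

        struct buf {
            struct buf *next;
            bool locked, dirty, pinned;
        };

        /* from_fsync: such buffers may have been added after the log
         * flush earlier in fsync, so skip them; outside fsync the same
         * state would indicate a bug. */
        static int ail_flush_check(const struct buf *list, bool from_fsync)
        {
            const struct buf *b;

            for (b = list; b; b = b->next) {
                if (b->locked || b->dirty || b->pinned) {
                    if (from_fsync)
                        continue;
                    return -1;      /* bug trap */
                }
                /* otherwise: fully written back, nothing to do */
            }
            return 0;
        }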
    • GFS2: Make resource groups "append only" during life of fs · 8339ee54
      Authored by Steven Whitehouse
      Since we have ruled out supporting online filesystem shrink,
      it is possible to make the resource group list append-only
      during the life of a super block. This gives several benefits:
      
      Firstly, we only need to read new rindex elements as they are added
      rather than needing to reread the whole rindex file each time one
      element is added.
      
      Secondly, the rindex glock can be held for much shorter periods of
      time, and is completely removed from the fast path for allocations.
      The lock is taken in shared mode only when updating the resource
      groups when the first allocation occurs, and after a grow has
      taken place.
      
      Thirdly, this results in a reduction in code size, and everything
      gets a lot simpler to understand in this area.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
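      A userspace model of the append-only update, with a growable array standing in for the kernel's rgrp list; only entries beyond the cached count are read:

        #include <stdlib.h>

        struct rgrp_entry { unsigned long long addr; };

        struct rindex_cache {
            struct rgrp_entry *rgrps;
            size_t count;           /* entries read so far */
        };

        /* After a grow, read only the new tail of the rindex instead
         * of rebuilding the whole list. */
        static int rindex_update(struct rindex_cache *ri, size_t ondisk_count)
        {
            struct rgrp_entry *tmp;
            size_t i;

            if (ondisk_count <= ri->count)
                return 0;           /* nothing new to read */
            tmp = realloc(ri->rgrps, ondisk_count * sizeof(*tmp));
            if (!tmp)
                return -1;
            ri->rgrps = tmp;
            for (i = ri->count; i < ondisk_count; i++)
                ri->rgrps[i].addr = 0;  /* stand-in for a disk read */
            ri->count = ondisk_count;
            return 0;
        }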
    • GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme · 7c9ca621
      Authored by Bob Peterson
      Here is an update of Bob's original rbtree patch which also
      resolves the rather strange ref counting that was being done relating to
      the bitmap blocks.
      
      Originally we had a dual system for journaling resource groups. The metadata
      blocks were journaled and also the rgrp itself was added to a list. The reason
      for adding the rgrp to the list in the journal was so that the "repolish
      clones" code could be run to update the free space, and potentially send any
      discard requests when the log was flushed. This was done by comparing the
      "cloned" bitmap with what had been written back on disk during the transaction
      commit.
      
      Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
      until the journal had been flushed. For that reason, there was a rather
      complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
      both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
      count on the buffers.
      
      However, the journal maintains a reference count on the buffers anyway, since
      they are being journaled as metadata buffers. So by moving the code which deals
      with the post-journal accounting for bitmap blocks to the metadata journaling
      code, we can entirely dispense with the rather strange buffer ref counting
      scheme and also the requirement to journal the rgrps.
      
      The net result of all this is that the ->sd_rindex_spin is left to do exactly
      one job, and that is to look after the rbtree of rgrps.
      
      This patch is designed to be a stepping stone towards using RCU for the rbtree
      of resource groups, however the reduction in the number of uses of the
      ->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
      anyway.
      
      The patch retains ->go_lock and ->go_unlock for rgrps, however these may also
      be removed in future in favour of calling the functions directly where required
      in the code. That will allow locking of resource groups without needing to
      actually read them in - something that could be useful in speeding up statfs.
      
      In the meantime, though, it is valid to dereference ->bi_bh only when the rgrp
      is locked. This is basically the same rule as before, modulo the references not
      being valid until the following journal flush.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Benjamin Marzinski <bmarzins@redhat.com>
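      The central lookup is "which resource group contains block X". The kernel keys a red-black tree on each rgrp's starting address; a sorted array with binary search (a simplification, not the kernel code) shows the same O(log n) idea:

        #include <stddef.h>

        struct rgrp {
            unsigned long long start;       /* first block covered */
            unsigned long long len;         /* number of blocks */
        };

        /* "v" must be sorted by start and non-overlapping. */
        static const struct rgrp *rgrp_find(const struct rgrp *v, size_t n,
                                            unsigned long long blk)
        {
            size_t lo = 0, hi = n;

            while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;

                if (blk < v[mid].start)
                    hi = mid;
                else if (blk >= v[mid].start + v[mid].len)
                    lo = mid + 1;
                else
                    return &v[mid];
            }
            return NULL;    /* no rgrp covers this block */
        }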
    • GFS2: Fix bug trap and journaled data fsync · f1818529
      Authored by Steven Whitehouse
      Journaled data requires that a complete flush of all dirty data for
      the file is done, in order that the ail flush which comes after
      will succeed.
      
      Also, the recently enhanced bug trap can trigger falsely when
      an ail flush from fsync races with a page read. This updates the
      bug trap such that it will ignore buffers which are locked and
      only trigger on dirty and/or pinned buffers when the ail flush
      is run from fsync. The original bug trap is retained when the ail
      flush is run from ->go_sync().
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix bug-trap in ail flush code · 75549186
      Authored by Steven Whitehouse
      The assert was being tested under the wrong lock, a
      legacy of the original code. Also, if it does trigger,
      the resulting information was not always a lot of help.
      
      This moves the check under the correct lock and also
      prints out more useful information for tracking down the
      source of the problem.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  12. 15 Jul 2011, 3 commits
  13. 14 Jul 2011, 1 commit
    • GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Authored by Steven Whitehouse
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock that's added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up taking
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to avoid truncating the metadata
      pages of inodes which have a zero link count, but which we cannot
      deallocate due to their still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
      Reported-by: Barry J. Marson <bmarson@redhat.com>
      Reported-by: David Teigland <teigland@redhat.com>
  14. 12 Jul 2011, 1 commit
    • GFS2: force a log flush when invalidating the rindex glock · 1ce53368
      Authored by Benjamin Marzinski
      Right now, there is nothing that forces the log to get flushed when a node
      drops its rindex glock so that another node can grow the filesystem. If the
      log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
      following way.
      
      A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
      dropped so the other node can grow the filesystem. When the node reacquires the
      rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
      removed from the list by gfs2_log_flush().
      
      This code simply forces a log flush when the rindex glock is invalidated,
      solving the problem.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
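      The fix is purely one of ordering, which the stub program below models; the function names echo the commit message, the bodies are placeholders:

        #include <stdio.h>

        static void gfs2_log_flush_stub(void)
        {
            puts("log flushed: rgds removed from sd_log_le_rg");
        }

        static void clear_rgrpdi_stub(void)
        {
            puts("rgds freed");
        }

        /* Flushing the log first guarantees no rgd is still linked on
         * sd_log_le_rg when clear_rgrpdi() frees it. */
        static void rindex_invalidate(void)
        {
            gfs2_log_flush_stub();
            clear_rgrpdi_stub();
        }

        int main(void) { rindex_invalidate(); return 0; }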
  15. 09 May 2011, 1 commit
  16. 20 Apr 2011, 1 commit
    • GFS2: Clean up fsync() · dba898b0
      Authored by Steven Whitehouse
      This patch is designed to clean up GFS2's fsync
      implementation and ensure that it really does get everything on
      disk. Since ->write_inode() has been updated, we can call that
      via the vfs library function sync_inode_metadata() and the only
      remaining thing that has to be done is to ensure that we get
      any revoke records in the log after the inode has been written back.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  17. 18 Apr 2011, 1 commit
  18. 11 Mar 2011, 1 commit
    • GFS2: introduce AIL lock · d6a079e8
      Authored by Dave Chinner
      The log lock is currently used to protect the AIL lists and
      the movement of buffers into and out of them. The lists
      are self-contained and no log-specific items outside the
      lists are accessed when starting or emptying the AIL lists.
      
      Hence the operation of the AIL does not require the protection
      of the log lock, so split the AIL lists out under a new AIL-specific lock
      to reduce the amount of traffic on the log lock. This will
      also reduce the amount of serialisation that occurs when
      the gfs2_logd pushes on the AIL to move it forward.
      
      This reduces the impact of log pushing on sequential write
      throughput.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
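      A pthread model of the split: AIL list movements take their own lock, so they no longer serialise against log-state updates (all names are illustrative):

        #include <pthread.h>

        static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t ail_lock = PTHREAD_MUTEX_INITIALIZER;

        struct entry { struct entry *next; };
        static struct entry *ail1, *ail2;
        static unsigned long log_seq;

        static void log_advance(void)
        {
            pthread_mutex_lock(&log_lock);  /* log state only */
            log_seq++;
            pthread_mutex_unlock(&log_lock);
        }

        static void ail_move_completed(void)
        {
            pthread_mutex_lock(&ail_lock);  /* no longer the log lock */
            while (ail1) {                  /* move all, for brevity */
                struct entry *e = ail1;

                ail1 = e->next;
                e->next = ail2;
                ail2 = e;
            }
            pthread_mutex_unlock(&ail_lock);
        }

        int main(void)
        {
            log_advance();
            ail_move_completed();
            return 0;
        }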
  19. 21 Jan 2011, 1 commit
    • GFS2: Use RCU for glock hash table · bc015cb8
      Authored by Steven Whitehouse
      This has a number of advantages:
      
       - Reduces contention on the hash table lock
       - Makes the code smaller and simpler
       - Should speed up glock dumps when under load
       - Removes ref count changing in examine_bucket
       - No longer need hash chain lock in glock_put() in common case
      
      There are some further changes which this enables and which
      we may do in the future. One is to look at using SLAB_RCU,
      and another is to look at using a per-cpu counter for the
      per-sb glock counter, since that is touched twice in the
      lifetime of each glock (but only used at umount time).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  20. 16 Dec 2010, 1 commit
    • GFS2: Don't flush delete workqueue when releasing the transaction lock · 846f4045
      Authored by Steven Whitehouse
      There is no requirement to flush the delete workqueue before a
      gfs2 filesystem is suspended. The workqueue's work will just
      be suspended along with the rest of the tasks on the filesystem.
      
      This resolves a deadlock situation where the transaction lock's
      demotion code was trying to flush the delete workqueue while at
      the same time, the workqueue was waiting for the transaction
      lock.
      
      The delete workqueue is flushed by gfs2_make_fs_ro() already, so
      that umount/remount are correctly protected anyway.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  21. 06 Oct 2010, 1 commit
  22. 20 Sep 2010, 1 commit
  23. 30 Mar 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Authored by Tejun Heo
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  24. 01 Mar 2010, 1 commit
    • GFS2: Metadata address space clean up · 009d8518
      Authored by Steven Whitehouse
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space; the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which, in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
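      The structural idea, sketched with hypothetical types: rather than carrying a whole spare inode per inode, only the glock types that need one embed an address space:

        struct address_space_stub { void *page_tree; };

        struct glock {
            unsigned long long number;
            int state;
            /* ... locking fields ... */
        };

        /* Used only for inode and rgrp glocks; every other glock type
         * keeps the smaller struct glock. */
        struct glock_aspace {
            struct glock gl;
            struct address_space_stub mapping;
        };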
  25. 03 Dec 2009, 1 commit
  26. 30 Jul 2009, 1 commit
  27. 20 May 2009, 1 commit
    • GFS2: Improve resource group error handling · 09010978
      Authored by Steven Whitehouse
      This patch improves the error handling in the case where we
      discover that the summary information in the resource group
      doesn't match the bitmap information while in the process of
      allocating blocks. Originally this resulted in a kernel bug,
      but this patch changes that so that we return -EIO and print
      some messages explaining what went wrong, and how to fix it.
      
      We also remember locally not to try and allocate from the
      same rgrp again, so that a subsequent allocation in a
      different rgrp should succeed.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
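      A userspace model of the new behaviour, assuming a summary count and a bitmap count that must agree: on mismatch, complain, remember the failure locally, and return -EIO instead of asserting:

        #include <errno.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct rgrp_state {
            unsigned summary_free;  /* free count from the rgrp header */
            unsigned bitmap_free;   /* free count found in the bitmap */
            bool error;             /* set: skip this rgrp from now on */
        };

        static int rgrp_check(struct rgrp_state *rg)
        {
            if (rg->error)
                return -EIO;        /* try a different rgrp instead */
            if (rg->summary_free != rg->bitmap_free) {
                fprintf(stderr, "summary/bitmap mismatch: run fsck\n");
                rg->error = true;
                return -EIO;
            }
            return 0;
        }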
  28. 20 Apr 2009, 1 commit
  29. 24 Mar 2009, 4 commits
    • GFS2: Clean up of glops.c · 6bac243f
      Authored by Steven Whitehouse
      This cleans up a number of bits of code mostly based in glops.c.
      A couple of simple functions have been merged into the callers
      to make it more obvious what is going on, and the mysterious raising
      of i_writecount around the truncate_inode_pages() call has been
      removed. The meta_go_* operations have been renamed rgrp_go_*
      since that is the only lock type that they are used with.
      
      The unused argument of gfs2_read_sb has been removed. Also
      a bug has been fixed where a check for the rindex inode was
      in the wrong callback. More comments are added, and the
      debugging code is improved too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add a "demote a glock" interface to sysfs · 64d576ba
      Authored by Steven Whitehouse
      This adds a sysfs file called demote_rq to GFS2's
      per-filesystem directory. It's possible to use this
      file to demote arbitrary glocks in exactly the same
      way as if a request had come in from a remote node.
      
      This is intended for testing issues relating to caching
      of data under glocks. Despite that, the interface is
      generic enough to send requests to any type of glock,
      but be careful as its not always safe to send an
      arbitrary message to an arbitrary glock. For that reason
      and to prevent DoS, this interface is restricted to root
      only.
      
      The messages look like this:
      
      <type>:<glocknumber> <mode>
      
      Example:
      
      echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq
      
      Which means "please demote inode glock (type 2) number 13324 so that
      I can get an EX (exclusive) lock". The lock modes are those which
      would normally be sent by a remote node in its callback, so if you
      want to unlock a glock, you use EX, to demote to shared, use SH or PR
      (depending on whether you like GFS2 or DLM lock modes better!).
      
      If the glock doesn't exist, you'll get -ENOENT returned. If the
      arguments don't make sense, you'll get -EINVAL returned.
      
      The plan is that this interface will be used in combination with
      the blktrace patch which I recently posted for comments although
      it is, of course, still useful in its own right.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
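      Parsing the "<type>:<glocknumber> <mode>" string is straightforward; a userspace sketch (the accepted mode list here is illustrative, not exhaustive):

        #include <stdio.h>
        #include <string.h>

        /* Parse e.g. "2:13324 EX"; returns 0 on success, -1 for the
         * equivalent of -EINVAL. */
        static int parse_demote_rq(const char *buf, unsigned *type,
                                   unsigned long long *number, char mode[3])
        {
            if (sscanf(buf, "%u:%llu %2s", type, number, mode) != 3)
                return -1;
            if (strcmp(mode, "EX") && strcmp(mode, "SH") &&
                strcmp(mode, "PR") && strcmp(mode, "UN"))
                return -1;
            return 0;
        }

        int main(void)
        {
            unsigned type;
            unsigned long long num;
            char mode[3];

            if (parse_demote_rq("2:13324 EX", &type, &num, mode) == 0)
                printf("demote glock %u/%llu for %s\n", type, num, mode);
            return 0;
        }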
    • GFS2: Fix deadlock on journal flush · d8348de0
      Authored by Steven Whitehouse
      This patch fixes a deadlock when the journal is flushed and there
      are dirty inodes other than the one which caused the journal flush.
      Originally the journal flushing code was trying to obtain the
      transaction glock while running the flush code for an inode glock.
      We no longer require the transaction glock at this point in time
      since we know that any attempt to get the transaction glock from
      another node will result in a journal flush. So if we are flushing
      the journal, we can be sure that the transaction lock is still
      cached from when the transaction was started.
      
      By inlining a version of gfs2_trans_begin() (minus the bit which
      gets the transaction glock) we can avoid the deadlock problems
      caused if there is a demote request queued up on the transaction
      glock.
      
      In addition, I've moved the umount rwsem so that it covers
      the glock workqueue, since all demotions are done by this
      workqueue now. That fixes a bug on umount which I came across
      while fixing the original problem.
      Reported-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Authored by Steven Whitehouse
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, which is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single-node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it since I restarted
      my -git tree after the last merge window. That way, it has had the maximum
      exposure before being merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off for the last three months,
      and it's passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  30. 05 Jan 2009, 1 commit
    • GFS2: Kill two daemons with one patch · 97cc1025
      Authored by Steven Whitehouse
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time-based criteria for shrinking glocks; they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
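      The shrinker contract can be modelled simply: report how many cached glocks exist, and on memory pressure demote exactly as many as asked for. (The kernel API of that era was a single shrink callback; the count/scan split below is just for clarity.)

        #include <stddef.h>

        struct glock_stub { struct glock_stub *next; };
        static struct glock_stub *glock_lru;
        static size_t glock_lru_count;

        static size_t glock_shrink_count(void)
        {
            return glock_lru_count;     /* objects we could free */
        }

        static size_t glock_shrink_scan(size_t nr_to_scan)
        {
            size_t freed = 0;

            while (glock_lru && freed < nr_to_scan) {
                struct glock_stub *gl = glock_lru;

                glock_lru = gl->next;
                glock_lru_count--;
                /* demote/free gl here */
                freed++;
            }
            return freed;
        }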