1. 07 Mar, 2014: 1 commit
  2. 03 Mar, 2014: 1 commit
    • GFS2: Clean up journal extent mapping · b50f227b
      Committed by Steven Whitehouse
      This patch fixes a long standing issue in mapping the journal
      extents. Most journals will consist of only a single extent,
      and although the cache took account of that by merging extents,
      it did not actually map large extents, but instead was doing a
      block by block mapping. Since the journal was only being mapped
      on mount, this was not normally noticeable.
      
      With the updated code, it is now possible to use the same extent
      mapping system during journal recovery (which will be added in a
      later patch). This will allow checking of the integrity of the
      journal before any replay of the journal content is attempted. For
      this reason the code is moving to bmap.c, since it will be used
      more widely in due course.
      
      An exercise left for the reader is to compare the new function
      gfs2_map_journal_extents() with gfs2_write_alloc_required().
      
      Additionally, should there be a failure, the error reporting is
      also updated to show more detail about what went wrong.
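The extent idea can be sketched in userspace C. This is a hypothetical illustration, not gfs2_map_journal_extents() itself. It assumes a per-block map in which map[i] holds the disk block for journal block i. The names struct extent, build_extents() and extent_lookup() are invented for this example. A journal laid out as a single extent collapses to one entry, so translating any journal block becomes a single range check instead of a per-block lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record (not the kernel's structure): a run of
 * logical journal blocks mapped to a contiguous run of disk blocks. */
struct extent {
	uint64_t lblock;  /* first logical block in the journal */
	uint64_t dblock;  /* first on-disk block */
	uint64_t blocks;  /* length of the run */
};

/* Collapse a per-block mapping into extents, extending the previous extent
 * whenever the next block is physically contiguous. out must have room for
 * n entries. Returns the number of extents produced. */
size_t build_extents(const uint64_t *map, size_t n, struct extent *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (nr > 0) {
			struct extent *last = &out[nr - 1];

			/* Contiguous on disk: grow the current extent. */
			if (last->dblock + last->blocks == map[i]) {
				last->blocks++;
				continue;
			}
		}
		out[nr].lblock = i;
		out[nr].dblock = map[i];
		out[nr].blocks = 1;
		nr++;
	}
	return nr;
}

/* Translate a logical journal block to a disk block by walking the extent
 * list. Returns 0 if unmapped (0 is only an illustrative sentinel here). */
uint64_t extent_lookup(const struct extent *ext, size_t nr, uint64_t lblock)
{
	for (size_t i = 0; i < nr; i++) {
		if (lblock >= ext[i].lblock &&
		    lblock < ext[i].lblock + ext[i].blocks)
			return ext[i].dblock + (lblock - ext[i].lblock);
	}
	return 0;
}
```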
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  3. 25 Feb, 2014: 2 commits
    • GFS2: Move log buffer accounting to transaction · 022ef4fe
      Committed by Steven Whitehouse
      Now we have a master transaction into which other transactions
      are merged, the accounting can be done using this master
      transaction. We no longer require the superblock fields which
      were being used for this function.
      
      In addition, this allows for a clean up in calc_reserved()
      making it rather easier to understand. Also, by reducing the
      number of variables used to track the buffers being added
      and removed from the journal, a number of error checks are
      now no longer required.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Move log buffer lists into transaction · d69a3c65
      Committed by Steven Whitehouse
      Over time, we hope to be able to improve the concurrency available
      in the log code. This is one small step towards that, by moving
      the buffer lists from the super block, and into the transaction
      structure, so that each transaction builds its own buffer lists.
      
      At transaction commit time, the buffer lists are merged into
      the currently accumulating transaction. That transaction then
      is passed into the before and after commit functions at journal
      flush time. Thus there should be no change in overall behaviour
      yet.
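Here is a minimal sketch of per-transaction buffer lists being spliced into the accumulating transaction at commit time. struct buf, struct trans, trans_add_buf() and trans_merge() are hypothetical simplifications. The kernel code uses struct list_head rather than the hand-rolled singly linked list assumed here.

```c
#include <stddef.h>

/* Hypothetical, simplified buffer descriptor. */
struct buf {
	unsigned long blkno;
	struct buf *next;
};

/* Hypothetical transaction holding its own buffer list. */
struct trans {
	struct buf *head;   /* first buffer added by this transaction */
	struct buf *tail;   /* last buffer, so appends and splices are O(1) */
	unsigned int nbuf;  /* number of buffers on the list */
};

/* Append a buffer to this transaction's private list. */
void trans_add_buf(struct trans *tr, struct buf *b)
{
	b->next = NULL;
	if (tr->tail)
		tr->tail->next = b;
	else
		tr->head = b;
	tr->tail = b;
	tr->nbuf++;
}

/* At commit, splice the committing transaction's list onto the end of the
 * accumulating transaction and leave the source list empty. */
void trans_merge(struct trans *acc, struct trans *tr)
{
	if (!tr->head)
		return;
	if (acc->tail)
		acc->tail->next = tr->head;
	else
		acc->head = tr->head;
	acc->tail = tr->tail;
	acc->nbuf += tr->nbuf;
	tr->head = tr->tail = NULL;
	tr->nbuf = 0;
}
```

Each transaction can build its list without touching shared state. Only the splice needs to be serialized against the log.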
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  4. 03 Jan, 2014: 2 commits
    • GFS2: Use only a single address space for rgrps · 70d4ee94
      Committed by Steven Whitehouse
      Prior to this patch, GFS2 had one address space for each rgrp,
      stored in the glock. This patch changes them to use a single
      address space in the super block. This therefore saves
      (sizeof(struct address_space) * nr_of_rgrps) bytes of memory
      and for large filesystems, that can be significant.
      
      It would be nice to be able to do something similar and merge
      the inode metadata address space into the same global
      address space. However, that is rather more complicated as the
      on-disk location doesn't have a 1:1 mapping with the inodes in
      general. So while it could be done, it will be a more complicated
      operation as it requires changing a lot more code paths.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Implement a "rgrp has no extents longer than X" scheme · 5ea5050c
      Committed by Bob Peterson
      With the preceding patch, we started accepting block reservations
      smaller than the ideal size, which requires a lot more parsing of the
      bitmaps. To reduce the amount of bitmap searching, this patch
      implements a scheme whereby each rgrp keeps track of the point
      at which multi-block reservations will fail.
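One way to picture the scheme is the following hypothetical sketch. The rgrp remembers the longest free run seen during a failed search, and any later request longer than that is rejected without scanning the bitmap. Several details are assumptions made for illustration, not the GFS2 implementation: the bool-per-block bitmap, struct rgrp, and the choice to reset the bound whenever blocks are freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rgrp: one bool per block (true = in use). */
struct rgrp {
	bool *bitmap;
	size_t nblocks;
	size_t extfail_pt;  /* requests longer than this are known to fail */
};

void rgrp_init(struct rgrp *rg, bool *bitmap, size_t nblocks)
{
	rg->bitmap = bitmap;
	rg->nblocks = nblocks;
	rg->extfail_pt = nblocks;  /* nothing known yet: any length may fit */
}

/* Return the start of the first free run of want blocks, or -1. *scanned
 * counts bitmap entries examined. After a failed full scan, remember the
 * longest run seen so later, longer requests skip the bitmap entirely. */
long rgrp_find_extent(struct rgrp *rg, size_t want, size_t *scanned)
{
	size_t run = 0, longest = 0;

	if (want == 0 || want > rg->extfail_pt)
		return -1;  /* fast path: cannot succeed */
	for (size_t i = 0; i < rg->nblocks; i++) {
		(*scanned)++;
		run = rg->bitmap[i] ? 0 : run + 1;
		if (run > longest)
			longest = run;
		if (run == want)
			return (long)(i + 1 - want);
	}
	if (longest < rg->extfail_pt)
		rg->extfail_pt = longest;
	return -1;
}

/* Freeing blocks may create longer runs, so forget the recorded bound. */
void rgrp_blocks_freed(struct rgrp *rg)
{
	rg->extfail_pt = rg->nblocks;
}
```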
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 24 Nov, 2013: 1 commit
    • block: Abstract out bvec iterator · 4f024f37
      Committed by Kent Overstreet
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
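The following userspace sketch shows why an explicit iterator matters for immutable biovecs: advancing through an I/O changes only the iterator and never the segment array. struct seg, struct seg_iter and seg_iter_advance() are hypothetical stand-ins loosely modelled on the kernel's struct bio_vec and struct bvec_iter. The done field plays the role that the later bi_bvec_done member is described as having.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio segment. */
struct seg {
	unsigned int len;   /* bytes in this segment */
};

/* Hypothetical iterator; all mutable position state lives here. */
struct seg_iter {
	uint64_t sector;    /* current device sector (512-byte units) */
	unsigned int size;  /* bytes remaining in the I/O */
	unsigned int idx;   /* current segment index */
	unsigned int done;  /* bytes already consumed in segs[idx] */
};

/* Advance by bytes without modifying segs, so several iterators can walk
 * the same vector independently. Caller guarantees bytes <= it->size;
 * the sector arithmetic assumes bytes is a multiple of 512. */
void seg_iter_advance(const struct seg *segs, struct seg_iter *it,
		      unsigned int bytes)
{
	it->sector += bytes >> 9;
	it->size -= bytes;
	while (bytes) {
		unsigned int left = segs[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		it->done += step;
		bytes -= step;
		if (it->done == segs[it->idx].len) {
			it->idx++;   /* segment fully consumed: move on */
			it->done = 0;
		}
	}
}
```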
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  6. 20 Aug, 2013: 1 commit
  7. 19 Jun, 2013: 1 commit
    • GFS2: aggressively issue revokes in gfs2_log_flush · 5d054964
      Committed by Benjamin Marzinski
      This patch looks at all the outstanding blocks in all the transactions
      on the log, and moves the completed ones to the ail2 list.  Then it
      issues revokes for these blocks.  This will hopefully speed things up
      in situations where there is a lot of contention for glocks, especially
      if they are acquired serially.
      
      revoke_lo_before_commit will issue at most one log block full of these
      preemptive revokes. The amount of reserved log space that
      gfs2_log_reserve() ignores has been incremented to allow for this extra
      block.
      
      This patch also consolidates the common revoke instructions into one
      function, gfs2_add_revoke().
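The flow described above can be approximated as follows. Completed items move to ail2, then revokes are issued for at most one log block's worth of them per flush. Everything here is hypothetical: struct ail_item, flush_revokes(), and the max_revokes parameter, which stands in for the number of revokes that fit in one log block. Setting the revoked flag merely stands in for a call to gfs2_add_revoke().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outstanding journaled block. */
struct ail_item {
	unsigned long blkno;
	bool written;   /* in-place write-back has completed */
	bool on_ail2;   /* moved to the ail2 list */
	bool revoked;   /* a revoke has been issued */
};

/* Move every completed item to ail2, then issue revokes for ail2 items
 * that lack one, capped at max_revokes (one log block's worth).
 * Returns the number of revokes issued by this call. */
size_t flush_revokes(struct ail_item *items, size_t n, size_t max_revokes)
{
	size_t issued = 0;

	for (size_t i = 0; i < n; i++)
		if (items[i].written)
			items[i].on_ail2 = true;

	for (size_t i = 0; i < n && issued < max_revokes; i++) {
		if (items[i].on_ail2 && !items[i].revoked) {
			items[i].revoked = true;  /* stand-in for gfs2_add_revoke() */
			issued++;
		}
	}
	return issued;
}
```

Items left over when the cap is reached are simply picked up by the next flush.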
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  8. 05 Jun, 2013: 2 commits
  9. 03 Jun, 2013: 1 commit
  10. 24 May, 2013: 1 commit
  11. 08 Apr, 2013: 1 commit
    • GFS2: replace gfs2_ail structure with gfs2_trans · 16ca9412
      Committed by Benjamin Marzinski
      In order to allow transactions and log flushes to happen at the same
      time, gfs2 needs to move the transaction accounting and active items
      list code into the gfs2_trans structure.  As a first step toward this,
      this patch removes the gfs2_ail structure, and handles the active items
      list in the gfs2_trans structure.  This keeps gfs2 from allocating an ail
      structure on log flushes, and gives us a structure that can later be used
      to store the transaction accounting outside of the gfs2 superblock
      structure.
      
      With this patch, at the end of a transaction, gfs2 will add the
      gfs2_trans structure to the superblock if there is not one already.
      This structure now has the active items fields that were previously in
      gfs2_ail.  This is not necessary in the case where the transaction was
      simply used to add revokes, since these are never written outside of the
      journal, and thus, don't need an active items list.
      
      Also, in order to make sure that the transaction structure is not
      removed while it's still in use by gfs2_trans_end, unlocking the
      sd_log_flush_lock has to happen slightly later in ending the
      transaction.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  12. 24 Mar, 2013: 1 commit
    • block: Add bio_end_sector() · f73a1c7d
      Committed by Kent Overstreet
      Just a little convenience macro - main reason to add it now is preparing
      for immutable bio vecs, it'll reduce the size of the patch that puts
      bi_sector/bi_size/bi_idx into a struct bvec_iter.
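The following userspace sketch shows what the helper computes. The result is the sector just past the end of the bio: the start sector plus the byte count converted to 512-byte sectors. struct fake_bio and bios_contiguous() are hypothetical. The two macros are believed to mirror the shape of the kernel's bio_sectors() and bio_end_sector() from around this time, but check the kernel's own definitions before relying on them.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two struct bio fields the macro reads. */
struct fake_bio {
	uint64_t bi_sector;    /* first 512-byte sector of the I/O */
	unsigned int bi_size;  /* remaining bytes */
};

/* Bytes converted to 512-byte sectors, added to the start sector.
 * The result is exclusive: the first sector after the bio. */
#define bio_sectors(bio)    ((bio)->bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_sector + bio_sectors((bio)))

/* Hypothetical helper: b directly follows a on disk. */
int bios_contiguous(const struct fake_bio *a, const struct fake_bio *b)
{
	return bio_end_sector(a) == b->bi_sector;
}
```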
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: linux-s390@vger.kernel.org
      CC: Chris Mason <chris.mason@fusionio.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 29 Jan, 2013: 2 commits
  14. 07 Nov, 2012: 2 commits
  15. 06 Jun, 2012: 1 commit
  16. 02 May, 2012: 1 commit
  17. 24 Apr, 2012: 5 commits
    • GFS2: Log code fixes · 144a4c2f
      Committed by Steven Whitehouse
      This patch removes a log lock from around an atomic operation where
      it is not needed, removes an unused variable, and also changes
      a void pointer used incorrectly to a struct page pointer.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove bd_list_tr · c50b91c4
      Committed by Steven Whitehouse
      This is another clean up in the logging code. This per-transaction
      list was largely unused. Its main function was to ensure that the
      number of buffers in a transaction was correct, however that counter
      was only used to check the number of buffers in the bd_list_tr, plus
      an assert at the end of each transaction. With the assert now changed
      to use the calculated buffer counts, we can remove both bd_list_tr and
      its associated counter.
      
      This should make the code easier to understand as well as shrinking
      a couple of structures.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove duplicate log code · dad30e90
      Committed by Steven Whitehouse
      The main part of this patch merges the two functions used to
      write metadata and data buffers to the log. Most of the code
      is common between the two functions, so this provides a nice
      clean up, and makes the code more readable.
      
      The gfs2_get_log_desc() function is also extended to take two more
      arguments, and thus avoid having to set the length and data1
      fields of this structure as a separate operation.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up log write code path · e8c92ed7
      Committed by Steven Whitehouse
      Prior to this patch, we had two ways of sending i/o to the log.
      One of those is used when we need to allocate both the data
      to be written itself and also a buffer head to submit it. This
      is done via sb_getblk and friends. This is used mostly for writing
      log headers.
      
      The other method is used when writing blocks which have some
      in-place counterpart. This is the case for all the metadata
      blocks which are journalled, and when journaled data is in use,
      for unescaped journalled data blocks.
      
      This patch replaces both of those two methods, and about half
      a dozen separate i/o submission points with a single i/o
      submission function. We also go direct to bio rather than
      using buffer heads, since this allows us to build i/o
      requests of the maximum size for the block device in
      question. It also reduces the memory required for flushing
      the log, which can be very useful in low memory situations.
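The following hypothetical sketch shows how maximum-size requests can be built. Consecutive log blocks are coalesced into a single request until a per-device cap is reached. struct log_req, build_log_reqs() and max_blks are assumptions for illustration; max_blks stands in for the block device's maximum request size. The real code builds struct bio objects directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: a run of consecutive log blocks sent as one I/O. */
struct log_req {
	uint64_t start;      /* first block */
	unsigned int nblks;  /* number of blocks */
};

/* Group log block numbers into requests. The current request is extended
 * while blocks stay consecutive and it is below max_blks. out needs room
 * for n entries. Returns the number of requests produced. */
size_t build_log_reqs(const uint64_t *blks, size_t n, unsigned int max_blks,
		      struct log_req *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		struct log_req *cur = nr ? &out[nr - 1] : NULL;

		if (cur && cur->start + cur->nblks == blks[i] &&
		    cur->nblks < max_blks) {
			cur->nblks++;   /* contiguous and not full: extend */
			continue;
		}
		out[nr].start = blks[i];  /* gap or full: start a new request */
		out[nr].nblks = 1;
		nr++;
	}
	return nr;
}
```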
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Make gfs2_log_fake_buf() write the buffer too · 14e5f184
      Committed by Steven Whitehouse
      Since we always write the buffer directly after this function
      returns, we might as well merge it into here. This is a clean
      up in preparation for some further updates to the log code
      which are coming soon.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  18. 20 Mar, 2012: 1 commit
  19. 08 Mar, 2012: 1 commit
    • GFS2: Remove a __GFP_NOFAIL allocation · 75ca61c1
      Committed by Steven Whitehouse
      In order to ensure that we've got enough buffer heads for flushing
      the journal, the original code used __GFP_NOFAIL when performing
      this allocation. Here we dispense with that in favour of using a
      mempool. This should improve efficiency in low memory conditions:
      since flushing the journal is a good way to get memory back, we
      don't want to be spinning, waiting on memory allocations. The
      buffers which are allocated via this mempool are fairly short lived,
      so that we'll recycle them pretty quickly.
      
      Although there are other memory allocations which occur during the
      journal flush process, this is the one which can potentially require
      the most memory, so the most important one to fix.
      
      The amount of memory reserved is a fixed amount, and we should not need
      to scale it when there are a greater number of filesystems in use.
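Here is a minimal userspace sketch of the mempool idea. A fixed reserve of objects is preallocated, the normal allocator is tried first, and the reserve is used only when that fails. This is not the kernel mempool API. struct pool, pool_init(), pool_alloc(), pool_free() and POOL_RESERVE are hypothetical, as is fail_alloc(), which simulates memory pressure.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_RESERVE 4  /* fixed reserve size (hypothetical) */

struct pool {
	void *reserve[POOL_RESERVE];  /* preallocated spare objects */
	size_t nr_free;               /* spares currently available */
	size_t obj_size;
	void *(*alloc_fn)(size_t);    /* normal allocator; may return NULL */
};

/* Simulated allocator that always fails, for testing the fallback path. */
void *fail_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Preallocate the reserve up front; undo everything on failure. */
int pool_init(struct pool *p, size_t obj_size, void *(*alloc_fn)(size_t))
{
	p->obj_size = obj_size;
	p->alloc_fn = alloc_fn;
	for (p->nr_free = 0; p->nr_free < POOL_RESERVE; p->nr_free++) {
		void *obj = malloc(obj_size);

		if (!obj) {
			while (p->nr_free)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
		p->reserve[p->nr_free] = obj;
	}
	return 0;
}

/* Try the normal allocator, then fall back to the reserve instead of
 * retrying forever as a __GFP_NOFAIL-style allocation would. */
void *pool_alloc(struct pool *p)
{
	void *obj = p->alloc_fn(p->obj_size);

	if (obj)
		return obj;
	if (p->nr_free)
		return p->reserve[--p->nr_free];
	return NULL;  /* a blocking mempool would wait for a free here */
}

/* Refill the reserve first; release to the system only when it is full. */
void pool_free(struct pool *p, void *obj)
{
	if (p->nr_free < POOL_RESERVE)
		p->reserve[p->nr_free++] = obj;
	else
		free(obj);
}
```

The buffers are short-lived, so pool_free() refills the reserve before releasing anything. That is what lets a fixed-size reserve suffice.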
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  20. 29 Feb, 2012: 2 commits
    • GFS2: FITRIM ioctl support · 66fc061b
      Committed by Steven Whitehouse
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows a sweep of the
      filesystem on an occasional basis, and also to optionally avoid sending
      down discard requests for smaller regions.
      
      In GFS2 FITRIM will work at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
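The per-rgrp flag can be sketched as follows. struct rg_trim, fitrim_sweep() and rg_blocks_freed() are hypothetical. Free space is modelled as a list of run lengths, and the sweep only counts the discard requests it would issue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-resource-group state for a FITRIM-style sweep. */
struct rg_trim {
	bool trimmed;                   /* set after a trim, cleared on free */
	size_t nfree_runs;
	const unsigned int *free_runs;  /* free extent lengths, in blocks */
};

/* Sweep all rgrps. Skip those already trimmed and free runs shorter than
 * minlen. Returns the number of discard requests that would be sent. */
size_t fitrim_sweep(struct rg_trim *rgs, size_t nrg, unsigned int minlen)
{
	size_t discards = 0;

	for (size_t i = 0; i < nrg; i++) {
		if (rgs[i].trimmed)
			continue;
		for (size_t j = 0; j < rgs[i].nfree_runs; j++)
			if (rgs[i].free_runs[j] >= minlen)
				discards++;
		rgs[i].trimmed = true;
	}
	return discards;
}

/* A deallocation invalidates the flag, so the next sweep revisits it. */
void rg_blocks_freed(struct rg_trim *rg)
{
	rg->trimmed = false;
}
```

From userspace, a sweep like this is requested with the FITRIM ioctl and a struct fstrim_range (start, len, minlen) from <linux/fs.h>. This is the interface the fstrim(8) utility uses.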
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Move two functions from log.c to lops.c · 47ac5537
      Committed by Steven Whitehouse
      gfs2_log_get_buf() and gfs2_log_fake_buf() are both used
      only in lops.c, so move them next to their callers and they
      can then become static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  21. 21 Oct, 2011: 2 commits
    • GFS2: Misc fixes · 891a8e93
      Committed by Steven Whitehouse
      Some items picked up through automated code analysis. A few bits
      of unreachable code and two unchecked return values.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme · 7c9ca621
      Committed by Bob Peterson
      Here is an update of Bob's original rbtree patch which, in addition, also
      resolves the rather strange ref counting that was being done relating to
      the bitmap blocks.
      
      Originally we had a dual system for journaling resource groups. The metadata
      blocks were journaled and also the rgrp itself was added to a list. The reason
      for adding the rgrp to the list in the journal was so that the "repolish
      clones" code could be run to update the free space, and potentially send any
      discard requests when the log was flushed. This was done by comparing the
      "cloned" bitmap with what had been written back on disk during the transaction
      commit.
      
      Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
      until the journal had been flushed. For that reason, there was a rather
      complicated setup in the ->go_lock and ->go_unlock functions for rgrps involving
      both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
      count on the buffers.
      
      However, the journal maintains a reference count on the buffers anyway, since
      they are being journaled as metadata buffers. So by moving the code which deals
      with the post-journal accounting for bitmap blocks to the metadata journaling
      code, we can entirely dispense with the rather strange buffer ref counting
      scheme and also the requirement to journal the rgrps.
      
      The net result of all this is that the ->sd_rindex_spin is left to do exactly
      one job, and that is to look after the rbtree of rgrps.
      
      This patch is designed to be a stepping stone towards using RCU for the rbtree
      of resource groups, however the reduction in the number of uses of the
      ->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
      anyway.
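The following sketch locates the rgrp that contains a given block, assuming rgrps are ordered by starting block address. The commit uses an rbtree; this example swaps in binary search over a sorted array instead. That gives the same O(log n) lookup for a fixed set of rgrps but does not support cheap insertion. struct rgrp_range and rgrp_lookup() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rgrp descriptor: a contiguous range of filesystem blocks. */
struct rgrp_range {
	uint64_t addr;    /* first block covered by this rgrp */
	uint32_t length;  /* number of blocks it covers */
};

/* Find the rgrp containing blk in an array sorted by addr with
 * non-overlapping ranges. Returns NULL if blk falls in a gap. */
const struct rgrp_range *rgrp_lookup(const struct rgrp_range *rgs, size_t n,
				     uint64_t blk)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (blk < rgs[mid].addr)
			hi = mid;       /* target lies in an earlier rgrp */
		else if (blk >= rgs[mid].addr + rgs[mid].length)
			lo = mid + 1;   /* target lies in a later rgrp */
		else
			return &rgs[mid];
	}
	return NULL;
}
```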
      
      The patch retains ->go_lock and ->go_unlock for rgrps, however these may also
      be removed in future in favour of calling the functions directly where required
      in the code. That will allow locking of resource groups without needing to
      actually read them in - something that could be useful in speeding up statfs.
      
      In the meantime, though, it is valid to dereference ->bi_bh only when the rgrp
      is locked. This is basically the same rule as before, modulo the references not
      being valid until the following journal flush.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Benjamin Marzinski <bmarzins@redhat.com>
  22. 20 Apr, 2011: 2 commits
    • GFS2: Optimise glock lru and end of life inodes · f42ab085
      Committed by Steven Whitehouse
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Alter point of entry to glock lru list for glocks with an address_space · 29687a2a
      Committed by Steven Whitehouse
      Rather than allowing the glocks to be scheduled for possible
      reclaim as soon as they have exited the journal, this patch
      delays their entry to the list until the glocks in question
      are no longer in use.
      
      This means that we will rely on the vm for writeback of all
      dirty data and metadata from now on. When glocks are added
      to the lru list they should be freeable much faster since all
      the I/O required to free them should have already been completed.
      
      This should lead to much better I/O patterns under low memory
      conditions.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  23. 14 Mar, 2011: 1 commit
  24. 11 Mar, 2011: 1 commit
    • GFS2: introduce AIL lock · d6a079e8
      Committed by Dave Chinner
      The log lock is currently used to protect the AIL lists and
      the movements of buffers into and out of them. The lists
      are self contained and no log specific items outside the
      lists are accessed when starting or emptying the AIL lists.
      
      Hence the operation of the AIL does not require the protection
      of the log lock so split them out into a new AIL specific lock
      to reduce the amount of traffic on the log lock. This will
      also reduce the amount of serialisation that occurs when
      the gfs2_logd pushes on the AIL to move it forward.
      
      This reduces the impact of log pushing on sequential write
      throughput.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  25. 10 Mar, 2011: 1 commit
    • block: kill off REQ_UNPLUG · 721a9602
      Committed by Jens Axboe
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  26. 21 Jan, 2011: 1 commit
    • GFS2: Use RCU for glock hash table · bc015cb8
      Committed by Steven Whitehouse
      This has a number of advantages:
      
       - Reduces contention on the hash table lock
       - Makes the code smaller and simpler
       - Should speed up glock dumps when under load
       - Removes ref count changing in examine_bucket
       - No longer need hash chain lock in glock_put() in common case
      
      There are some further changes which this enables and which
      we may do in the future. One is to look at using SLAB_RCU,
      and another is to look at using a per-cpu counter for the
      per-sb glock counter, since that is touched twice in the
      lifetime of each glock (but only used at umount time).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  27. 05 May, 2010: 1 commit
    • GFS2: Various gfs2_logd improvements · 5e687eac
      Committed by Benjamin Marzinski
      This patch contains various tweaks to how log flushes and active item writeback
      work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reserve now waits
      for gfs2_logd to do the log flushing.  Multiple functions were rewritten to
      remove the need to call gfs2_log_lock(). Instead of using one test to see if
      gfs2_logd had work to do, there are now separate tests to check if there
      are too many buffers in the incore log or if there are too many items on the
      active items list.
      
      This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
      some minor changes.  Since gfs2_ail1_start always submits all the active items,
      it no longer needs to keep track of the first ai submitted, so this has been
      removed. In gfs2_log_reserve(), the order of the calls to
      prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
      been switched.  If it called wake_up first there was a small window for a race,
      where logd could run and return before gfs2_log_reserve was ready to get woken
      up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
      would be left waiting for gfs2_logd to eventually run because it timed out.
      Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
      out, and flushes the log, can now be set on mount with ar_commit.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  28. 01 Mar, 2010: 1 commit
    • GFS2: ordered writes are backwards · e5884636
      Committed by Dave Chinner
      When we queue data buffers for ordered write, the buffers are added
      to the head of the ordered write list. When the log needs to push
      these buffers to disk, it also walks the list from the head. The
      result is that the ordered buffers are submitted to disk in
      reverse order.
      
      For large writes, this means that whenever the log flushes, large
      streams of buffers in reverse sequential order are pushed down into the
      block layers. The elevators don't handle this particularly well, so
      IO rates tend to be significantly lower than if the IO was issued in
      ascending block order.
      
      Queue new ordered buffers to the tail of the ordered buffer list to
      ensure that IO is dispatched in the order it was submitted. This
      should significantly improve large sequential write speeds. On a
      disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
      noop and from 38MB/s to 50MB/s for cfq.
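The ordering problem can be reproduced with a small hypothetical list. Inserting at the head and walking from the head dispatches buffers newest-first. Inserting at the tail preserves submission order. In the kernel's list API this corresponds to list_add() versus list_add_tail(); struct obuf, struct olist and the helper functions are invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical ordered-write buffer. */
struct obuf {
	unsigned long blkno;
	struct obuf *next;
};

struct olist {
	struct obuf *head;
	struct obuf *tail;
};

/* Old behaviour: the newest buffer goes to the front of the list. */
void olist_add_head(struct olist *l, struct obuf *b)
{
	b->next = l->head;
	l->head = b;
	if (!l->tail)
		l->tail = b;
}

/* Fixed behaviour: the newest buffer goes to the back of the list. */
void olist_add_tail(struct olist *l, struct obuf *b)
{
	b->next = NULL;
	if (l->tail)
		l->tail->next = b;
	else
		l->head = b;
	l->tail = b;
}

/* Walk from the head, as the log flush does, recording dispatch order.
 * Returns the number of block numbers written to out (at most max). */
size_t olist_dispatch(const struct olist *l, unsigned long *out, size_t max)
{
	size_t n = 0;

	for (const struct obuf *b = l->head; b && n < max; b = b->next)
		out[n++] = b->blkno;
	return n;
}
```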
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>