1. 08 Apr, 2013 1 commit
    • B
      GFS2: replace gfs2_ail structure with gfs2_trans · 16ca9412
      Committed by Benjamin Marzinski
      In order to allow transactions and log flushes to happen at the same
      time, gfs2 needs to move the transaction accounting and active items
      list code into the gfs2_trans structure.  As a first step toward this,
      this patch removes the gfs2_ail structure, and handles the active items
      list in the gfs2_trans structure.  This keeps gfs2 from allocating an ail
      structure on log flushes, and gives us a structure that can later be used
      to store the transaction accounting outside of the gfs2 superblock
      structure.
      
      With this patch, at the end of a transaction, gfs2 will add the
      gfs2_trans structure to the superblock if there is not one already.
      This structure now has the active items fields that were previously in
      gfs2_ail.  This is not necessary in the case where the transaction was
      simply used to add revokes, since these are never written outside of the
      journal, and thus, don't need an active items list.
      
      Also, in order to make sure that the transaction structure is not
      removed while it's still in use by gfs2_trans_end, unlocking the
      sd_log_flush_lock has to happen slightly later in ending the
      transaction.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      16ca9412
  2. 29 Jan, 2013 1 commit
    • S
      GFS2: Use ->writepages for ordered writes · 45138990
      Committed by Steven Whitehouse
      Instead of using a list of buffers to write ahead of the journal
      flush, this now uses a list of inodes and calls ->writepages
      via filemap_fdatawrite() in order to achieve the same thing. For
      most use cases this results in a shorter ordered write list,
      as well as much larger i/os being issued.
      
      The ordered write list is sorted by inode number before writing
      in order to retain the disk block ordering between inodes as
      per the previous code.
      
      The previous ordered write code used to conflict in its assumptions
      about how to write out the disk blocks with mpage_writepages()
      so that with this updated version we can also use mpage_writepages()
      for GFS2's ordered write, writepages implementation. So we will
      also send larger i/os from writeback too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      45138990
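      The sorting step described in this commit can be illustrated with a
      small userspace sketch. It is not the kernel code: the struct, its
      field name, and the use of an array with qsort() in place of a kernel
      linked list are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode queued on the ordered write list. */
struct ordered_inode {
	uint64_t no_addr;	/* inode number, assumed to reflect disk position */
};

/* qsort() comparator: ascending inode number, avoiding overflow-prone subtraction. */
static int cmp_by_inode_number(const void *a, const void *b)
{
	const struct ordered_inode *ia = a;
	const struct ordered_inode *ib = b;

	if (ia->no_addr < ib->no_addr)
		return -1;
	return ia->no_addr > ib->no_addr;
}

/* Sort the list so that write-out visits inodes in ascending inode-number order. */
static void sort_ordered_list(struct ordered_inode *list, size_t n)
{
	assert(list != NULL || n == 0);
	if (n > 1)
		qsort(list, n, sizeof(*list), cmp_by_inode_number);
}
```

      After sorting, a caller would walk the array and start writeback for
      each inode in turn, so that i/o is issued in ascending inode order.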
  3. 02 May, 2012 1 commit
  4. 24 Apr, 2012 4 commits
    • S
      GFS2: Log code fixes · 144a4c2f
      Committed by Steven Whitehouse
      This patch removes a log lock from around an atomic operation where
      it is not needed, removes an unused variable, and also changes
      a void pointer used incorrectly to a struct page pointer.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      144a4c2f
    • S
      GFS2: Remove bd_list_tr · c50b91c4
      Committed by Steven Whitehouse
      This is another clean up in the logging code. This per-transaction
      list was largely unused. Its main function was to ensure that the
      number of buffers in a transaction was correct, however that counter
      was only used to check the number of buffers in the bd_list_tr, plus
      an assert at the end of each transaction. With the assert now changed
      to use the calculated buffer counts, we can remove both bd_list_tr and
      its associated counter.
      
      This should make the code easier to understand as well as shrinking
      a couple of structures.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c50b91c4
    • S
      GFS2: Clean up log write code path · e8c92ed7
      Committed by Steven Whitehouse
      Prior to this patch, we have two ways of sending i/o to the log.
      One of those is used when we need to allocate both the data
      to be written itself and also a buffer head to submit it. This
      is done via sb_getblk and friends. This is used mostly for writing
      log headers.
      
      The other method is used when writing blocks which have some
      in-place counterpart. This is the case for all the metadata
      blocks which are journalled, and when journaled data is in use,
      for unescaped journalled data blocks.
      
      This patch replaces both of those two methods, and about half
      a dozen separate i/o submission points with a single i/o
      submission function. We also go direct to bio rather than
      using buffer heads, since this allows us to build i/o
      requests of the maximum size for the block device in
      question. It also reduces the memory required for flushing
      the log, which can be very useful in low memory situations.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      e8c92ed7
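      To see why a single submission path that builds maximum-size requests
      helps, here is a rough userspace sketch that counts how many requests
      are needed when contiguous log blocks are merged. The function name and
      the max_blocks cap are hypothetical; the kernel works with bios and
      pages rather than plain block numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the I/O requests needed to write the given blocks (listed in
 * submission order) when runs of contiguous blocks are merged into one
 * request of at most max_blocks blocks.
 */
static size_t count_requests(const uint64_t *blocks, size_t n, size_t max_blocks)
{
	size_t requests = 0;
	size_t in_current = 0;

	assert(max_blocks > 0);
	for (size_t i = 0; i < n; i++) {
		bool contiguous = i > 0 && blocks[i] == blocks[i - 1] + 1;

		/* Start a new request on a gap or when the current one is full. */
		if (!contiguous || in_current == max_blocks) {
			requests++;
			in_current = 0;
		}
		in_current++;
	}
	return requests;
}
```

      With a cap of one block per request, every block costs its own
      submission; raising the cap lets a contiguous run go out as one
      larger request.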
    • S
      GFS2: Drop "pull" argument from log_write_header() · fdb76a42
      Committed by Steven Whitehouse
      The "pull" argument to log_write_header() is only used
      for debug purposes and it is not really needed any more. There
      are other tests for this particular problem, so I think we can
      dispose of it in order to simplify the code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fdb76a42
  5. 09 Mar, 2012 1 commit
    • S
      GFS2: Clean up log flush header writing · 34cc1781
      Committed by Steven Whitehouse
      We already send both a pre and post flush to the block device
      when writing a journal header. There is no need to wait for
      the previous I/O specifically when we do this, unless we've
      turned "barriers" off.
      
      As a side effect, this also cleans up the code path for flushing
      the journal and makes it more readable.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      34cc1781
  6. 29 Feb, 2012 3 commits
  7. 22 Nov, 2011 1 commit
    • T
      freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      a0acae0e
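      The relationship between the two functions after this change can be
      mocked in userspace as follows. The mock_ names, the freeze_requested
      flag, and the "thaw" logic are illustrative assumptions; only the
      bool-returning shape comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock state: set when a (hypothetical) freezer wants this task frozen. */
static bool freeze_requested;
static int times_frozen;

static bool freezing(void)
{
	return freeze_requested;
}

/* Mock of __refrigerator(): reports whether it "scheduled out" for freezing. */
static bool mock_refrigerator(void)
{
	bool was_frozen = false;

	while (freeze_requested) {
		was_frozen = true;
		times_frozen++;
		freeze_requested = false;	/* pretend the task was thawed */
	}
	assert(!freeze_requested);
	return was_frozen;
}

/* Mock of try_to_freeze(): relays the refrigerator's result if freezing(). */
static bool mock_try_to_freeze(void)
{
	if (!freezing())
		return false;
	return mock_refrigerator();
}
```

      Callers that previously invoked refrigerator() directly would call the
      try_to_freeze() equivalent instead and may inspect the returned bool.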
  8. 08 Nov, 2011 1 commit
    • S
      GFS2: Fix up REQ flags · 20ed0535
      Committed by Steven Whitehouse
      Christoph has split up REQ_PRIO from REQ_META. That means that
      we can drop REQ_PRIO from places where it is not needed. I'm
      not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
      makes any kind of sense, anyway.
      
      In addition, I've added REQ_META to one place in the code where
      it was missing. REQ_PRIO has been left for read/writes triggered
      by glock acquisition and writeback only. We can adjust it again
      if required, but these are the most important points from a
      performance perspective.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      20ed0535
  9. 23 Aug, 2011 1 commit
  10. 14 Jul, 2011 1 commit
    • S
      GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Committed by Steven Whitehouse
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock that's added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up calling
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
      Reported-by: Barry J. Marson <bmarson@redhat.com>
      Reported-by: David Teigland <teigland@redhat.com>
      380f7c65
  11. 22 May, 2011 1 commit
  12. 03 May, 2011 1 commit
  13. 20 Apr, 2011 3 commits
  14. 14 Mar, 2011 1 commit
  15. 11 Mar, 2011 1 commit
    • D
      GFS2: introduce AIL lock · d6a079e8
      Committed by Dave Chinner
      The log lock is currently used to protect the AIL lists and
      the movements of buffers into and out of them. The lists
      are self contained and no log specific items outside the
      lists are accessed when starting or emptying the AIL lists.
      
      Hence the operation of the AIL does not require the protection
      of the log lock so split them out into a new AIL specific lock
      to reduce the amount of traffic on the log lock. This will
      also reduce the amount of serialisation that occurs when
      the gfs2_logd pushes on the AIL to move it forward.
      
      This reduces the impact of log pushing on sequential write
      throughput.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d6a079e8
  16. 10 Mar, 2011 1 commit
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Committed by Jens Axboe
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      721a9602
  17. 17 Sep, 2010 1 commit
  18. 10 Sep, 2010 1 commit
  19. 08 Aug, 2010 2 commits
    • C
      block: unify flags for struct bio and struct request · 7b6d91da
      Committed by Christoph Hellwig
      Remove the current bio flags and reuse the request flags for the bio, too.
      This makes it easier to trace the type of I/O from the filesystem
      down to the block driver.  There were two flags in the bio that were
      missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
      renamed two request flags that had a superfluous RW in them.
      
      Note that the flags are in bio.h despite having the REQ_ name - as
      blkdev.h includes bio.h that is the only way to go for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      7b6d91da
    • C
      block: BARRIER request should imply SYNC · 41f2df62
      Committed by Christoph Hellwig
      A barrier request should by definition have priority in get_request
      and let the queue be unplugged immediately as it's blocking all forward
      progress due to the queue draining.
      
      Most filesystems already get this implicitly through the way submit_bh
      treats the buffer_ordered flag, and gfs2 sets it explicitly.  But btrfs
      and XFS are still forgetting to set the flag, as is blkdev_issue_flush
      and some places in DM/MD.
      
      For XFS on metadata heavy workloads this gives a consistent speedup
      in the 2-3% range.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      41f2df62
  20. 21 May, 2010 1 commit
  21. 06 May, 2010 1 commit
    • S
      GFS2: Add some useful messages · 913a71d2
      Committed by Steven Whitehouse
      The following patch adds a message to indicate when barriers have been
      disabled due to a block device which doesn't support them. You could
      already tell this via the mount options in /proc/mounts, but all the
      other filesystems also log a message at the same time.
      
      Also, the same mechanisms are used to indicate when the lock
      demote interface has been used (only ever used for debugging)
      which is a request from our support team.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      913a71d2
  22. 05 May, 2010 1 commit
    • B
      GFS2: Various gfs2_logd improvements · 5e687eac
      Committed by Benjamin Marzinski
      This patch contains various tweaks to how log flushes and active item writeback
      work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reserve now waits
      for gfs2_logd to do the log flushing.  Multiple functions were rewritten to
      remove the need to call gfs2_log_lock(). Instead of using one test to see if
      gfs2_logd had work to do, there are now separate tests to check if there
      are too many buffers in the incore log or if there are too many items on the
      active items list.
      
      This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
      some minor changes.  Since gfs2_ail1_start always submits all the active items,
      it no longer needs to keep track of the first ai submitted, so this has been
      removed. In gfs2_log_reserve(), the order of the calls to
      prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
      been switched.  If it called wake_up first there was a small window for a race,
      where logd could run and return before gfs2_log_reserve was ready to get woken
      up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
      would be left waiting for gfs2_logd to eventually run because it timed out.
      Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
      out, and flushes the log, can now be set on mount with ar_commit.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      5e687eac
  23. 11 Mar, 2010 1 commit
    • B
      GFS2: Allow the number of committed revokes to temporarily be negative · 2e95e3f6
      Committed by Benjamin Marzinski
      GFS2 tracks the number of revokes and unrevokes that are part of committed
      transactions via sd_log_commited_revoke. It is possible for one process to add
      revokes during its transaction, while another process unrevokes them during its
      transaction. If the second process finishes its transaction first,
      sd_log_commited_revoke will be decremented by the number of unrevokes that the
      second process did, without first being incremented by the number of revokes
      the first process did. This is fine, since all started transactions must be
      completed before the journal can be flushed.  However, sd_log_commited_revoke
      is an unsigned integer, and log_refund() causes an assertion failure if it
      would go negative at the end of a transaction.  This patch makes
      sd_log_commited_revoke a signed integer and allows it to go negative.
      __gfs2_log_flush() still checks that it matches the actual number of revokes.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2e95e3f6
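      The ordering problem in this commit comes down to plain integer
      arithmetic, shown in the userspace sketch below. The counter and
      function names are hypothetical stand-ins, not the kernel's fields.

```c
#include <assert.h>

/* Hypothetical stand-ins for the counter before and after the patch. */
static unsigned int committed_revoke_unsigned;
static int committed_revoke_signed;

/* Old behaviour: an unsigned counter wraps if unrevokes are committed first. */
static void commit_unsigned(unsigned int revokes, unsigned int unrevokes)
{
	committed_revoke_unsigned += revokes;
	committed_revoke_unsigned -= unrevokes;
}

/* New behaviour: a signed counter may dip below zero between transactions. */
static void commit_signed(int revokes, int unrevokes)
{
	committed_revoke_signed += revokes - unrevokes;
}

/* At flush time all started transactions have completed, so totals must agree. */
static void check_at_flush(int actual_revokes)
{
	assert(committed_revoke_signed == actual_revokes);
}
```

      If the unrevoking transaction commits first, the signed counter reads
      -3 until the revoking transaction commits and brings it back to zero;
      the unsigned counter instead wraps to a huge value in the meantime,
      which is what a "would go negative" assertion trips over.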
  24. 03 Dec, 2009 1 commit
    • S
      GFS2: Tag all metadata with jid · 0ab7d13f
      Committed by Steven Whitehouse
      There are two spare fields in the header common to all GFS2
      metadata. One is just the right size to fit a journal id
      in it, and this patch updates the journal code so that each
      time a metadata block is modified, we tag it with the journal
      id of the node which is performing the modification.
      
      The reason for this is that it should make it much easier to
      debug issues which arise if we can tell which node was the
      last to modify a particular metadata block.
      
      Since the field is updated before the block is written into
      the journal, each journal should only contain metadata which
      is tagged with its own journal id. The one exception to this
      is the journal header block, which might have a different node's
      id in it, if that journal was recovered by another node in the
      cluster.
      
      Thus each journal will contain a record of which nodes recovered
      it, via the journal header.
      
      The other field in the metadata header could potentially be
      used to hold information about what kind of operation was
      performed, but for the time being we just zero it on each
      transaction so that if we use it for that in future, we'll
      know that the information (where it exists) is reliable.
      
      I did consider using the other field to hold the journal
      sequence number, however since in GFS2's journaling we write
      the modified data into the journal and not the original
      data, this gives no information as to what action caused the
      modification, so I think we can probably come up with a better
      use for those 64 bits in the future.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      0ab7d13f
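      A simplified userspace mock of the tagging step might look like the
      following. The struct layout and field names are illustrative, not the
      on-disk definition; the sketch assumes the journal id is stored
      big-endian and that the other spare field is zeroed, as the commit
      message describes.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mock of a metadata header; field names are illustrative. */
struct meta_header {
	uint32_t magic;
	uint32_t type;
	uint64_t spare64;	/* the other spare field, zeroed for now */
	uint32_t format;
	uint32_t jid;		/* journal id of the last modifier, big-endian */
};

/* Tag a block just before it is written into the journal. */
static void tag_with_jid(struct meta_header *mh, uint32_t jid)
{
	assert(mh != NULL);
	mh->spare64 = 0;
	mh->jid = htonl(jid);
}

/* Debug helper: which journal (and so which node) last modified the block? */
static uint32_t last_modifier(const struct meta_header *mh)
{
	return ntohl(mh->jid);
}
```

      When debugging, reading the tag back from a suspect block tells you
      which node's journal last carried a modification to it.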
  25. 12 Jun, 2009 2 commits
  26. 11 May, 2009 1 commit
  27. 24 Mar, 2009 1 commit
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Committed by Steven Whitehouse
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, which is why I have kept it in
      my -git tree since restarting it after the last merge window. That way,
      this has the maximum exposure before it's merged. This is (modulo a few
      minor bug fixes) the same patch that I've been posting on and off for
      the last three months, and it's passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  28. 26 Sep, 2008 1 commit
    • S
      GFS2: Support for I/O barriers · 254db57f
      Committed by Steven Whitehouse
      This patch adds barrier support to GFS2. There is not a lot of change
      really... we just add the barrier flag when we write journal header
      blocks. If the underlying device refuses to support them, we fall back
      to the previous way of doing things (wait for the I/O and hope) since
      there is nothing else we can do. There is no user configuration,
      barriers will always be on unless the device refuses to support them.
      This seems a reasonable solution to me since this is a correctness
      issue.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      254db57f
  29. 27 Jun, 2008 1 commit
  30. 18 Apr, 2008 1 commit
  31. 31 Mar, 2008 1 commit
    • B
      [GFS2] Only do lo_incore_commit once · d0109bfa
      Committed by Bob Peterson
      This patch is performance related.  When we're doing a log flush,
      I noticed we were calling buf_lo_incore_commit twice: once for
      data bufs and once for metadata bufs.  Since this is the same
      function and does the same thing in both cases, there should be
      no reason to call it twice.  Since we only need to call it once,
      we can also make it faster by removing it from the generic "lops"
      code and making it a stand-alone static function.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d0109bfa