1. 27 Jun, 2008 1 commit
    •
      [GFS2] Clean up the glock core · 6802e340
      Authored by Steven Whitehouse
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shorter, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
  2. 12 May, 2008 1 commit
  3. 31 Mar, 2008 13 commits
  4. 08 Feb, 2008 2 commits
  5. 25 Jan, 2008 16 commits
  6. 10 Oct, 2007 5 commits
    •
      [GFS2] Clean up journaled data writing · 16615be1
      Authored by Steven Whitehouse
      This patch cleans up the code for writing journaled data into the log.
      It also removes the need to allocate a small "tag" structure for each
      block written into the log. Instead we just keep a count of the outstanding
      I/O so that we can be sure that it has all been written at the correct time.
      Another result of this patch is that a number of ll_rw_block() calls
      have become submit_bh() calls, closing some races at the same time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    •
      [GFS2] Replace revoke structure with bufdata structure · 82e86087
      Authored by Steven Whitehouse
      Both the revoke structure and the bufdata structure are quite similar.
      They are basically small tags which are put on lists. Moreover, a
      revoke structure is always allocated at a point where a bufdata
      structure is (or can be) freed. As such, it should be possible to
      reduce the number of frees and allocations by using the same structure
      for both purposes.
      
      This patch is the first step along that path. It replaces existing uses
      of the revoke structure with the bufdata structure.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    •
      [GFS2] Clean up ordered write code · d7b616e2
      Authored by Steven Whitehouse
      The following patch removes the ordered write processing from
      databuf_lo_before_commit() and moves it to log.c. This has the effect of
      greatly simplifying databuf_lo_before_commit() as well as potentially
      making the ordered write code more efficient.
      
      As a side effect of this, it is now possible to remove ordered buffers
      from the ordered buffer list at any time, so we now make use of this in
      invalidatepage and releasepage to ensure timely release of these
      buffers.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    •
      [GFS2] delay glock demote for a minimum hold time · c4f68a13
      Authored by Benjamin Marzinski
      When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
      a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
      call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
      too early, and is forced to call unmap_mapping_range(). This bumps the
      mapping->truncate_count sequence count, forcing do_no_page() to retry. This
      patch institutes a minimum glock hold time of a tenth of a second.  This
      ensures that even in heavy contention cases, the node has enough time to
      get some useful work done before it gives up the glock.
      
      A second issue is that when gfs2_glock_dq() is called from within a page fault
      to demote a lock, and the associated page needs to be written out, it will
      try to acquire a lock on it, but it has already been locked at a higher level.
      This patch makes gfs2_glock_dq() use the work queue as well, to avoid this
      issue. This is the same patch as Steve Whitehouse originally proposed to fix
      this issue, except that gfs2_glock_dq() now grabs a reference to the glock
      before it queues up the work on it.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    •
      [GFS2] Reduce number of gfs2_scand processes to one · 8fbbfd21
      Authored by Steven Whitehouse
      We only need a single gfs2_scand process rather than the one
      per filesystem which we had previously. As a result the parameter
      determining the frequency of gfs2_scand runs becomes a module
      parameter rather than a mount parameter as it was before.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 09 Jul, 2007 2 commits
    •
      [GFS2] assertion failure after writing to journaled file, umount · 2332c443
      Authored by Robert Peterson
      This patch passes all my nasty tests that were causing the code to
      fail under one circumstance or another.  Here is a complete summary
      of all changes from today's git tree, in order of appearance:
      
      1. There are now separate variables for metadata buffer accounting.
      2. Variable sd_log_num_hdrs is no longer needed, since the header
         accounting is taken care of by the reserve/refund sequence.
      3. Fixed a tiny grammatical problem in a comment.
      4. Added a new function "calc_reserved" to calculate the reserved
         log space.  This isn't entirely necessary, but it has two benefits:
         First, it simplifies the gfs2_log_refund function greatly.
         Second, it allows for easier debugging because I could sprinkle the
         code with calls to this function to make sure the accounting is
         proper (by adding asserts and printks) at strategic points in the code.
      5. In log_pull_tail there apparently was a kludge to fix up the
         accounting based on a "pull" parameter.  The buffer accounting is
         now done properly, so the kludge was removed.
      6. File sync operations were making a call to gfs2_log_flush that
         writes another journal header.  Since that header was not planned
         for (reserved) by the reserve/refund sequence, the free space had
         to be decremented so that when log_pull_tail gets called, the free
         space is adjusted properly.  (Did I hear you call that a kludge?
         Well, maybe, but it is a lot more justifiable than the one I removed.)
      7. In the gfs2_log_shutdown code, it optionally syncs the log by
         specifying the PULL parameter to log_write_header.  I'm not sure
         this is necessary anymore.  It just seems to me there could be
         cases where shutdown is called while there are outstanding log
         buffers.
      8. In the (data)buf_lo_before_commit functions, I changed some offset
         values from being calculated on the fly to being constants.  That
         simplified some code, and we might as well let the compiler do the
         calculation once rather than redoing those cycles at run time.
      9. This version has my rewritten databuf_lo_add function.
         This version is much more like its predecessor, buf_lo_add, which
         makes it easier to understand.  Again, this might not be necessary,
         but it seems as if this one works as well as the previous one,
         maybe even better, so I decided to leave it in.
      10. In databuf_lo_before_commit, a previous data corruption problem
         was caused by going off the end of the buffer.  The proper solution
         is to have the correct limit in place, rather than stopping earlier.
         (Thus my previous attempt to fix it was wrong.)
         If you don't wrap the buffer, you stop too early, and that
         causes more log buffer accounting problems.
      11. In lops.h there are two new (previously mentioned) constants for
         figuring out the data offset for the journal buffers.
      12. There are also two new functions, buf_limit and databuf_limit to
         calculate how many entries will fit in the buffer.
       13. In function gfs2_meta_wipe, it needs to distinguish between pinned
          metadata buffers and journaled data buffers for proper journal buffer
          accounting.  It can't use the JDATA gfs2_inode flag because it's
          sometimes passed the "real" inode and sometimes the "metadata
          inode" and the inode flags will be random bits in a metadata
          gfs2_inode.  It needs to base its decision on which was passed in.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    •
      [GFS2] Recovery for lost unlinked inodes · c8cdf479
      Authored by Steven Whitehouse
      Under certain circumstances it is possible (though rather unlikely) that
      inodes which were unlinked by one node while still open on another might
      get "lost" in the sense that they don't get deallocated if the node
      which held the inode open crashed before it was unlinked.
      
      This patch adds the recovery code which allows automatic deallocation of
      the inode if it is found during block allocation (the sensible time to
      look for such inodes, since we are scanning the rgrp's bitmaps anyway at
      this time, so it adds no overhead to do this).
      
      Since the inode will have had its i_nlink set to zero, all we need to
      trigger recovery is a lookup and an iput(), and the normal deallocation
      code takes care of the rest.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>