1. 24 Mar 2009, 2 commits
    • GFS2: Fix deadlock on journal flush · d8348de0
      Committed by Steven Whitehouse
      This patch fixes a deadlock when the journal is flushed and there
      are dirty inodes other than the one which caused the journal flush.
      Originally the journal flushing code was trying to obtain the
      transaction glock while running the flush code for an inode glock.
      We no longer require the transaction glock at this point in time
      since we know that any attempt to get the transaction glock from
      another node will result in a journal flush. So if we are flushing
      the journal, we can be sure that the transaction lock is still
      cached from when the transaction was started.
      
      By inlining a version of gfs2_trans_begin() (minus the bit which
      gets the transaction glock) we can avoid the deadlock problems
      caused if there is a demote request queued up on the transaction
      glock.
      
      In addition I've also moved the umount rwsem so that it covers
      the glock workqueue, since all demotions are now done by this
      workqueue. That fixes a bug on umount which I came across
      while fixing the original problem.
      Reported-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Committed by Steven Whitehouse
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, which is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my holding it back until I
      restarted my -git tree after the last merge window. That way, it gets
      the maximum exposure before it's merged. This is (modulo a few minor bug
      fixes) the same patch that I've been posting on and off for the last
      three months, and it has passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  2. 05 Jan 2009, 4 commits
    • GFS2: Kill two daemons with one patch · 97cc1025
      Committed by Steven Whitehouse
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time-based criteria for shrinking glocks; they are kept
      until such time as the VM asks for more memory, and then we
      demote just as many glocks as required.
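
The "demote only as many as the VM asks for" policy can be illustrated with a small user-space model. Everything here (`model_glock`, `lru`, `glock_count_reclaimable()`, `glock_shrink()`) is hypothetical; it is a sketch of the idea, not the kernel shrinker API or the GFS2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: cached glocks kept in LRU order, oldest first. */
#define NR_GLOCKS 8

struct model_glock {
	int id;
	bool held;   /* in this model, a glock with a holder is never demoted */
	bool cached; /* still in the cache, i.e. not yet demoted */
};

struct model_glock lru[NR_GLOCKS];

/* How many cached glocks could be demoted right now. */
size_t glock_count_reclaimable(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_GLOCKS; i++)
		if (lru[i].cached && !lru[i].held)
			n++;
	return n;
}

/*
 * Demote at most nr_to_scan unheld glocks, oldest first, and return how
 * many were demoted. There is no timer: work happens only when the
 * caller (standing in for the VM) asks for memory back.
 */
size_t glock_shrink(size_t nr_to_scan)
{
	size_t demoted = 0;

	for (size_t i = 0; i < NR_GLOCKS && demoted < nr_to_scan; i++) {
		if (lru[i].cached && !lru[i].held) {
			lru[i].cached = false; /* stands in for queueing a demote */
			demoted++;
		}
	}
	return demoted;
}
```

A caller under memory pressure would first ask how much is reclaimable and then request a batch; in this model, held glocks are skipped rather than waited on.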
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix "truncate in progress" hang · 813e0c46
      Committed by Steven Whitehouse
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      Also, this change opens the way for a future patch to make the
      reading of the inode's information asynchronous with respect to the
      glock workqueue, which is another improvement that has been on the
      list for some time now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add more detail to debugfs glock dumps · fa75cedc
      Committed by Steven Whitehouse
      Although the glock dumps print quite a lot of information about
      the glocks themselves, there are more things which can be
      usefully added to the dump relating to the objects themselves.
      
      This patch adds a few more fields to the inode and resource
      group lines, which should be useful for debugging.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Banish struct gfs2_dinode_host · 383f01fb
      Committed by Steven Whitehouse
      The final field in gfs2_dinode_host was the i_flags field. That is
      renamed to i_diskflags in order to avoid confusion with the existing
      inode flags, and moved into the inode proper at a suitable location
      to avoid creating a "hole".
      
      At that point struct gfs2_dinode_host is no longer needed and as
      promised (quite some time ago!) it can now be removed completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  3. 27 Jun 2008, 1 commit
    • [GFS2] Clean up the glock core · 6802e340
      Committed by Steven Whitehouse
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shorter, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
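
Two of the points above, a single bit lock in place of "glmutex" and unlocking expressed as a lock request with target mode LM_ST_UNLOCKED, can be sketched as follows. The LM_ST_* names come from the text; their numeric values, `model_glock`, `MODEL_GLF_LOCK` and `model_lm_lock()` are assumptions for this sketch, and a real lock module completes requests asynchronously rather than by an immediate state assignment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* State names follow the text; the numeric values are an assumption. */
enum lm_state {
	LM_ST_UNLOCKED = 0,
	LM_ST_EXCLUSIVE = 1,
	LM_ST_DEFERRED = 2,
	LM_ST_SHARED = 3,
};

/* One bit in a flags word replaces a separate mutex. */
#define MODEL_GLF_LOCK (1u << 0)

struct model_glock {
	atomic_uint flags;
	enum lm_state state;
};

/* Try to take the bit lock; true means the caller now owns it. */
bool model_trylock(struct model_glock *gl)
{
	unsigned int old = atomic_fetch_or(&gl->flags, MODEL_GLF_LOCK);

	return !(old & MODEL_GLF_LOCK);
}

void model_unlock(struct model_glock *gl)
{
	atomic_fetch_and(&gl->flags, ~MODEL_GLF_LOCK);
}

/*
 * Single entry point for every state change: unlocking is simply a
 * request for target LM_ST_UNLOCKED. Returns false if another state
 * change is already in progress (the bit lock is held).
 */
bool model_lm_lock(struct model_glock *gl, enum lm_state target)
{
	if (!model_trylock(gl))
		return false;
	gl->state = target; /* applied immediately in this model */
	model_unlock(gl);
	return true;
}
```

With one call for both directions, the caller never needs a separate unlock path; it only chooses the target state.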
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
  4. 12 May 2008, 1 commit
  5. 31 Mar 2008, 1 commit
  6. 25 Jan 2008, 3 commits
    • [GFS2] Reorder writeback for glock sync · 3042a2cc
      Committed by Steven Whitehouse
      Previously we were doing (write data, wait for data, write metadata, wait
      for metadata). After this patch we do (write metadata, write data, wait for
      data, wait for metadata), which should be more efficient.
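
The reordering can be illustrated with stand-in functions that only record the order of calls; all names are hypothetical. The point of the new order is that the metadata and data writes are both submitted before either wait begins, so the two sets of I/O can be in flight together.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical recorder standing in for the real I/O submission calls. */
#define MAX_OPS 8
const char *ops[MAX_OPS];
int nr_ops;

void record(const char *op)
{
	if (nr_ops < MAX_OPS)
		ops[nr_ops++] = op;
}

void write_metadata(void) { record("write metadata"); }
void write_data(void)     { record("write data"); }
void wait_data(void)      { record("wait data"); }
void wait_metadata(void)  { record("wait metadata"); }

/* Old order: each write is waited for before the next one starts. */
void glock_sync_old(void)
{
	write_data();
	wait_data();
	write_metadata();
	wait_metadata();
}

/* New order: start both writes first, then wait for each. */
void glock_sync_new(void)
{
	write_metadata();
	write_data();
	wait_data();
	wait_metadata();
}
```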
      
      Also I noticed that the drop_bh and xmote_bh functions were almost
      identical. In fact the only difference was a single test, and that
      test is such that in the drop_bh case, it would always evaluate to
      the correct result. As such we can use the xmote_bh functions in
      all the places where we were using the drop_bh function and remove
      the drop_bh functions.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove useless i_cache from inodes · f91a0d3e
      Committed by Steven Whitehouse
      The i_cache was designed to keep references to the indirect blocks
      used during block mapping so that they didn't have to be looked
      up continually. The idea failed because there are too many places
      where the i_cache needs to be freed, and this has in the past been
      the cause of many bugs.
      
      In addition there was no performance benefit being gained since the
      disk blocks in question were cached anyway. So this patch removes
      it in order to simplify the code to prepare for other changes which
      would otherwise have had to add further support for this feature.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Use ->page_mkwrite() for mmap() · 3cc3f710
      Committed by Steven Whitehouse
      This cleans up the mmap() code path for GFS2 by implementing the
      page_mkwrite function for GFS2. We are thus able to use the
      generic filemap_fault function for our ->fault() implementation.
      
      This now means that shared writable mappings will be much more
      efficiently shared across the cluster if there is a reasonable
      proportion of read activity (the greater the proportion, the better).
      
      As a side effect, it also reduces the size of the code, removes
      special cases from readpage and readpages, and makes the code
      path easier to follow.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 10 Oct 2007, 4 commits
    • [GFS2] Clean up gfs2_trans_add_revoke() · 1ad38c43
      Committed by Steven Whitehouse
      The following alters gfs2_trans_add_revoke() to take a struct
      gfs2_bufdata as an argument. This eliminates the memory allocation which
      was previously required by making use of the already existing struct
      gfs2_bufdata. It makes some sanity checks to ensure that the
      gfs2_bufdata has been removed from all the lists before it is recycled as
      a revoke structure. This saves one memory allocation and one free per
      revoke structure.
      
      Also as a result, and to simplify the locking, since there is no longer
      any blocking code in gfs2_trans_add_revoke(), we must hold the log lock
      whenever this function is called. This reduces the number of times we
      take and release the log lock.
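
A minimal sketch of the reuse pattern, assuming a hypothetical `model_bufdata` descriptor with "on list" flags and a boolean standing in for the log lock. It expresses the two rules above as assertions: the caller holds the log lock, and the descriptor is off every other list before it is linked onto the revoke list, with no allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer descriptor that can be recycled as a revoke record. */
struct model_bufdata {
	bool on_ail;                       /* still on an AIL list */
	bool on_log;                       /* still on a log list */
	struct model_bufdata *revoke_next; /* revoke list linkage */
};

struct model_log {
	bool locked;                       /* stands in for the log lock */
	struct model_bufdata *revokes;
	unsigned int nr_revokes;
};

/*
 * Caller must hold the log lock. Nothing here sleeps or allocates:
 * the existing descriptor is linked onto the revoke list directly.
 */
void model_trans_add_revoke(struct model_log *lg, struct model_bufdata *bd)
{
	assert(lg->locked);                 /* locking rule */
	assert(!bd->on_ail && !bd->on_log); /* sanity check before reuse */
	bd->revoke_next = lg->revokes;
	lg->revokes = bd;
	lg->nr_revokes++;
}
```
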
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Introduce gfs2_remove_from_ail · 1e1a3d03
      Committed by Steven Whitehouse
      This collects together the operations required to remove a gfs2_bufdata
      from the ail lists. It's only called from two places to start with, but
      expect to see more uses of this function in future.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] delay glock demote for a minimum hold time · c4f68a13
      Committed by Benjamin Marzinski
      When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
      a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
      call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
      too early, and is forced to call unmap_mapping_range(). This bumps the
      mapping->truncate_count sequence count, forcing do_no_page() to retry. This
      patch institutes a minimum glock hold time of a tenth of a second. This ensures
      that even in heavy contention cases, the node has enough time to get some
      useful work done before it gives up the glock.
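
The minimum hold time check reduces to simple tick arithmetic. This sketch assumes a tick rate (`MODEL_HZ`) and a hypothetical helper, `demote_delay()`, that returns how long a demote request should be deferred; it is not the actual GFS2 code.

```c
#include <assert.h>

/* Assumed tick rate for this sketch; the kernel's HZ is config-dependent. */
#define MODEL_HZ 1000
/* Minimum hold time of a tenth of a second, in ticks. */
#define MIN_HOLD_TICKS (MODEL_HZ / 10)

/*
 * Given when the glock was granted and the current time (both in ticks),
 * return how many ticks to defer a demote request. Zero means demote now.
 */
unsigned long demote_delay(unsigned long granted_at, unsigned long now)
{
	unsigned long held = now - granted_at; /* unsigned math tolerates wrap */

	return held >= MIN_HOLD_TICKS ? 0 : MIN_HOLD_TICKS - held;
}
```

A demote request arriving 40 ticks after the grant would be deferred by the remaining 60 ticks, so the node always keeps the glock long enough to make progress.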
      
      A second issue is that when gfs2_glock_dq() is called from within a page fault
      to demote a lock, and the associated page needs to be written out, it will
      try to acquire a lock on it, but it has already been locked at a higher level.
      This patch makes gfs2_glock_dq() use the work queue as well, to avoid this
      issue. This is the same patch as Steve Whitehouse originally proposed to fix
      this issue, except that gfs2_glock_dq() now grabs a reference to the glock
      before it queues up the work on it.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up invalidatepage/releasepage · bb3b0e3d
      Committed by Steven Whitehouse
      This patch fixes some bugs relating to journaled data files by cleaning
      up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now
      never block during gfs2_releasepage(), instead we always either release
      or refuse to release depending on the status of the buffers.
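
The non-blocking release decision can be modelled as a single pass over a page's buffers. The `model_buffer` flags and `model_releasepage()` are hypothetical; the real gfs2_releasepage() inspects buffer heads and journal state, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer state for one page. */
struct model_buffer {
	bool dirty;  /* has data not yet written */
	bool locked; /* I/O in progress */
	bool pinned; /* still needed by the journal */
};

/*
 * Decide without waiting: refuse (false) if any buffer is busy,
 * otherwise allow release (true). A real implementation would detach
 * and free the buffers at the point where this returns true.
 */
bool model_releasepage(const struct model_buffer *bufs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (bufs[i].dirty || bufs[i].locked || bufs[i].pinned)
			return false;
	return true;
}
```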
      
      This fixes Red Hat bugzillas #248969 and #252392.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
  8. 09 Jul 2007, 1 commit
    • [GFS2] flush the glock completely in inode_go_sync · b524fe64
      Committed by Benjamin Marzinski
      Fix for bz #231910
      When filemap_fdatawrite() is called on the inode mapping in data=ordered mode,
      it will add the glock to the log. In inode_go_sync(), if you do the
      gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock
      and its associated data buffers will be on the log again. This means you can
      demote a lock from exclusive, without having it flushed from the log. The
      attached patch simply moves the gfs2_log_flush up to after the
      filemap_fdatawrite() call.
      
      Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but
      that caused me to trip the following assert.
      
      GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed
      GFS2: fsid=cypher-36:test.0:   function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61
      
      It appears that gfs2_log_flush() puts some of the glock's buffers in the busy
      state and the filemap_fdatawrite() call is necessary to flush them. This makes
      me worry slightly that a related problem could happen because of moving the
      gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that
      gfs2_ail_empty_gl() would catch that case as well.
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  9. 09 May 2007, 1 commit
  10. 08 Mar 2007, 2 commits
  11. 15 Feb 2007, 1 commit
    • [PATCH] remove many unneeded #includes of sched.h · cd354f1a
      Committed by Tim Schmielau
      After Al Viro (finally) succeeded in removing the sched.h #include in module.h
      recently, it makes sense again to remove other superfluous sched.h includes.
      There are quite a lot of files which include it but don't actually need
      anything defined in there.  Presumably these includes were once needed for
      macros that used to live in sched.h, but moved to other header files in the
      course of cleaning it up.
      
      To ease the pain, this time I did not fiddle with any header files and only
      removed #includes from .c-files, which tend to cause less trouble.
      
      Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
      arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
      allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
      configs in arch/arm/configs on arm.  I also checked that no new warnings were
      introduced by the patch (actually, some warnings are removed that were emitted
      by unnecessarily included header files).
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Feb 2007, 3 commits
    • [GFS2] Tidy up glops calls · b5d32bea
      Committed by Steven Whitehouse
      This patch doesn't make any changes to the ordering of the various
      operations related to glocking, but it does tidy up the calls to the
      glops.c functions to make the structure more obvious.
      
      The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
      made static within glock.c since they are called by every set of glock
      operations. The xmote_th and drop_th glock operations are then made
      conditional upon those two routines existing and called from the
      previously mentioned functions in glock.c respectively.
      
      Also it can be seen that the go_sync operation isn't needed since it can
      easily be replaced by calls to xmote_bh and drop_bh respectively. This
      results in no longer (confusingly) calling back into routines in glock.c
      from glops.c and also reducing the glock operations by one member.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove local exclusive glock mode · 1c0f4872
      Committed by Steven Whitehouse
      Here is a patch for GFS2 to remove the local exclusive flag. In
      the places it was used, mutexes are always held earlier in the
      call path, so it appears redundant in the LM_ST_SHARED case.
      
      Also, the GFS2 holders were setting local exclusive in any case where
      the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
      code where the flag was tested have been replaced with tests for the
      lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
      same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
      as globally exclusive).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove the "greedy" function from glock.[ch] · e5dab552
      Committed by Steven Whitehouse
      The "greedy" code was an attempt to retain glocks for a minimum length
      of time when they relate to mmap()ed files. The current implementation
      of this feature is not, however, ideal in that it required allocating
      memory in order to do this and it's overly complicated.
      
      It also misses the mark by ignoring the other I/O operations which are
      just as likely to suffer from the same problem. So the plan is to remove
      this now and then add the functionality back as part of the glock state
      machine at a later date (and thus take into account all the possible
      users of this feature).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 30 Nov 2006, 6 commits
    • [GFS2] Fix journal flush problem · b004157a
      Committed by Steven Whitehouse
      This fixes a bug which resulted in poor performance due to flushing
      the journal too often. The code path in question was via the inode_go_sync()
      function in glops.c. The solution is not to flush the journal immediately
      when inodes are ejected from memory, but batch up the work for glockd to
      deal with later on. This means that glocks may now live on beyond the end of
      the lifetime of their inodes (but not very much longer in the normal case).
      
      Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
      the calculation of the number of free journal blocks.
      
      The gfs2_logd process has been altered to be more responsive to the journal
      filling up. We now wake it up when the number of uncommitted journal blocks
      has reached the threshold level rather than trying to flush directly at the
      end of each transaction. This again means doing fewer, but larger, log
      flushes in general.
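
The change from flushing at the end of every transaction to waking gfs2_logd at a threshold can be sketched as a counter. `model_journal`, `model_trans_end()` and `model_logd_run()` are hypothetical names; the sketch only shows why this yields fewer, larger flushes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the journal-space accounting described above. */
struct model_journal {
	unsigned int uncommitted; /* blocks used by transactions, not yet flushed */
	unsigned int threshold;   /* wake the log daemon at this level */
	unsigned int wakeups;     /* number of times the daemon was woken */
	bool logd_pending;        /* a wakeup is already outstanding */
};

/* End of a transaction: no direct flush, just wake the daemon if needed. */
void model_trans_end(struct model_journal *j, unsigned int blocks)
{
	j->uncommitted += blocks;
	if (j->uncommitted >= j->threshold && !j->logd_pending) {
		j->logd_pending = true;
		j->wakeups++;
	}
}

/* The woken daemon performs one large flush of everything pending. */
void model_logd_run(struct model_journal *j)
{
	j->uncommitted = 0;
	j->logd_pending = false;
}
```

Ten one-block transactions with a threshold of 10 produce a single wakeup and a single flush, instead of ten separate flushes.
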
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Simplify glops functions · 1a14d3a6
      Committed by Steven Whitehouse
      The go_sync callback took two flags, but one of them was set on every
      call, so this patch removes one of the flags and makes the previously
      conditional operations (on this flag) unconditional.
      
      The go_inval callback took three flags, each of which was set on every
      call to it. This patch removes the flags and makes the operations
      unconditional, which makes the logic rather more obvious.
      
      Two now unused flags are also removed from incore.h.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove gfs2_inode_attr_in · 9e2dbdac
      Committed by Steven Whitehouse
      This function wasn't really doing the right thing. There was no need
      to update the inode size at this point and the updating of the
      i_blocks field has now been moved to the places where di_blocks is
      updated. A result of this patch and some of those preceding it is that
      unlocking a glock is now a much more efficient process, since there
      is no longer any requirement to copy data from the gfs2 inode into
      the vfs inode at this point.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Shrink gfs2_inode (8) - i_vn · bfded27b
      Committed by Steven Whitehouse
      This shrinks the size of the gfs2_inode by 8 bytes by
      replacing the version counter with a one bit valid/invalid
      flag.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Shrink gfs2_inode (3) - di_mode · b60623c2
      Committed by Steven Whitehouse
      This removes the duplicate di_mode field in favour of using the
      inode->i_mode field. This saves 4 bytes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] split and annotate gfs2_log_head · 55167622
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  14. 03 Oct 2006, 1 commit
  15. 22 Sep 2006, 1 commit
    • [GFS2] Tidy up meta_io code · 7276b3b0
      Committed by Steven Whitehouse
      Fix a bug in the directory reading code, where we might have dereferenced
      a NULL pointer in case of OOM. Updated the directory code to use the new
      & improved version of gfs2_meta_ra(), which now returns the first block
      that was being read. Previously it released that block, requiring the
      calling code to grab the block again at each point it was called.
      
      Also turned off readahead on directory lookups since we are reading a
      hash table, and therefore reading the entries in order is very
      unlikely. Readahead is still used for all other calls to the
      directory reading function (e.g. when growing the hash table).
      
      Removed the DIO_START constant. Everywhere this was used, it was
      used to unconditionally start i/o aside from a couple of places, so
      I've removed it and made the couple of exceptions to this rule into
      separate functions.
      
      Also hunted through the other DIO flags and removed them as arguments
      from functions which were always called with the same combination of
      arguments.
      
      Updated gfs2_meta_indirect_buffer to be a bit more efficient and
      hopefully also be a bit easier to read.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 19 Sep 2006, 1 commit
  17. 10 Sep 2006, 1 commit
  18. 05 Sep 2006, 1 commit
  19. 04 Sep 2006, 1 commit
  20. 01 Sep 2006, 1 commit
    • [GFS2] Update copyright, tidy up incore.h · e9fc2aa0
      Committed by Steven Whitehouse
      As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
      updates the copyright message to say "version" in full rather than
      "v.2". Also incore.h has been updated to remove forward structure
      declarations which are not required.
      
      The gfs2_quota_lvb structure has now had endianness annotations added
      to it. Also quota.c has been updated so that we now store the
      lvb data locally in endian-independent format to avoid needing
      a structure in host endianness too. As a result the endianness
      conversions are done as required at various points and thus the
      conversion routines in lvb.[ch] are no longer required. I've
      moved the one remaining constant in lvb.h that's used into lm.h
      and removed the unused lvb.[ch].
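
Storing LVB data in an endian-independent format means fixing the byte order of the stored bytes and converting on every access. This sketch assumes big-endian storage for a single hypothetical 64-bit field (`struct model_lvb`) and uses explicit byte shifts, so it behaves the same on any host; the kernel itself would use its own byte-order helpers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LVB field stored as 8 bytes, most significant first, so
 * every node sees the same bytes regardless of its CPU's endianness. */
struct model_lvb {
	unsigned char value_be[8];
};

/* Host value -> stored big-endian bytes. */
void lvb_set_value(struct model_lvb *lvb, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		lvb->value_be[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Stored big-endian bytes -> host value. */
uint64_t lvb_get_value(const struct model_lvb *lvb)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | lvb->value_be[i];
	return v;
}
```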
      
      I have not changed the HIF_ constants. That is left to a later patch
      which I hope will unify the gh_flags and gh_iflags fields of the
      struct gfs2_holder.
      
      Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  21. 30 Aug 2006, 2 commits
  22. 28 Jul 2006, 1 commit
    • [GFS2] Use a bio to read the superblock · f45b7ddd
      Committed by Steven Whitehouse
      This means that we don't need to create a special inode just to contain
      a struct address_space in order to read a single disk block. Instead
      we read the disk block directly. It's slightly faster, and uses slightly
      less memory, but the real reason for doing this is that it removes a
      special case from the glock code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>