1. 24 4月, 2012 1 次提交
    • S
      GFS2: Clean up log write code path · e8c92ed7
      Steven Whitehouse 提交于
      Prior to this patch, we have two ways of sending i/o to the log.
      One of those is used when we need to allocate both the data
      to be written itself and also a buffer head to submit it. This
      is done via sb_getblk and friends. This is used mostly for writing
      log headers.
      
      The other method is used when writing blocks which have some
      in-place counterpart. This is the case for all the metadata
      blocks which are journalled, and when journaled data is in use,
      for unescaped journalled data blocks.
      
      This patch replaces both of those two methods, and about half
      a dozen separate i/o submission points with a single i/o
      submission function. We also go direct to bio rather than
      using buffer heads, since this allows us to build i/o
      requests of the maximum size for the block device in
      question. It also reduces the memory required for flushing
      the log, which can be very useful in low memory situations.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e8c92ed7
  2. 05 3月, 2012 1 次提交
    • B
      GFS2: Eliminate sd_rindex_mutex · 6aad1c3d
      Bob Peterson 提交于
      Over time, we've slowly eliminated the use of sd_rindex_mutex.
      Up to this point, it was only used in two places: function
      gfs2_ri_total (which totals the file system size by reading
      and parsing the rindex file) and function gfs2_rindex_update
      which updates the rgrps in memory. Both of these functions have
      the rindex glock to protect them, so the rindex is unnecessary.
      Since gfs2_grow writes to the rindex via the meta_fs, the mutex
      is in the wrong order according to the normal rules. This patch
      eliminates the mutex entirely to avoid the problem.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6aad1c3d
  3. 29 2月, 2012 1 次提交
    • S
      GFS2: glock statistics gathering · a245769f
      Steven Whitehouse 提交于
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to calculation
      of round trip times in network code.
      
      The three pairs of mean/variance measure the following
      things:
      
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any requests when (a) the current state of
      the lock is exclusive (b) the requested state is either null
      or unlocked or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
      
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      is counting queueing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")
      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the
      results.
      
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a245769f
  4. 11 1月, 2012 3 次提交
  5. 22 11月, 2011 1 次提交
  6. 18 11月, 2011 1 次提交
  7. 21 10月, 2011 6 次提交
    • S
      GFS2: Remove two unused variables · 9ae32429
      Steven Whitehouse 提交于
      The two variables being initialised in gfs2_inplace_reserve
      to track the file & line number of the caller are never
      used, so we might as well remove them.
      
      If something does go wrong, then a stack trace is probably
      more useful anyway.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9ae32429
    • B
      GFS2: rewrite fallocate code to write blocks directly · 64dd153c
      Benjamin Marzinski 提交于
      GFS2's fallocate code currently goes through the page cache. Since it's only
      writing to the end of the file or to holes in it, it doesn't need to, and it
      was causing issues on low memory environments. This patch pulls in some of
      Steve's block allocation work, and uses it to simply allocate the blocks for
      the file, and zero them out at allocation time.  It provides a slight
      performance increase, and it dramatically simplifies the code.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      64dd153c
    • S
      GFS2: Cache the most recently used resource group in the inode · 54335b1f
      Steven Whitehouse 提交于
      This means that after the initial allocation for any inode, the
      last used resource group is cached in the inode for future use.
      This drastically reduces the number of lookups of resource
      groups in the common case, and this the contention on that
      data structure.
      
      The allocation algorithm is the same as previously, except that we
      always check to see if the goal block is within the cached rgrp
      first before going to the rbtree to look one up.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      54335b1f
    • S
      GFS2: Make resource groups "append only" during life of fs · 8339ee54
      Steven Whitehouse 提交于
      Since we have ruled out supporting online filesystem shrink,
      it is possible to make the resource group list append only
      during the life of a super block. This gives several benefits:
      
      Firstly, we only need to read new rindex elements as they are added
      rather than needing to reread the whole rindex file each time one
      element is added.
      
      Secondly, the rindex glock can be held for much shorter periods of
      time, and is completely removed from the fast path for allocations.
      The lock is taken in shared mode only when updating the resource
      groups when the first allocation occurs, and after a grow has
      taken place.
      
      Thirdly, this results in a reduction in code size, and everything
      gets a lot simpler to understand in this area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8339ee54
    • B
      GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme · 7c9ca621
      Bob Peterson 提交于
      Here is an update of Bob's original rbtree patch which, in addition, also
      resolves the rather strange ref counting that was being done relating to
      the bitmap blocks.
      
      Originally we had a dual system for journaling resource groups. The metadata
      blocks were journaled and also the rgrp itself was added to a list. The reason
      for adding the rgrp to the list in the journal was so that the "repolish
      clones" code could be run to update the free space, and potentially send any
      discard requests when the log was flushed. This was done by comparing the
      "cloned" bitmap with what had been written back on disk during the transaction
      commit.
      
      Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
      until the journal had been flushed. For that reason, there was a rather
      complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
      both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
      count on the buffers.
      
      However, the journal maintains a reference count on the buffers anyway, since
      they are being journaled as metadata buffers. So by moving the code which deals
      with the post-journal accounting for bitmap blocks to the metadata journaling
      code, we can entirely dispense with the rather strange buffer ref counting
      scheme and also the requirement to journal the rgrps.
      
      The net result of all this is that the ->sd_rindex_spin is left to do exactly
      one job, and that is to look after the rbtree or rgrps.
      
      This patch is designed to be a stepping stone towards using RCU for the rbtree
      of resource groups, however the reduction in the number of uses of the
      ->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
      anyway.
      
      The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also
      be removed in future in favour of calling the functions directly where required
      in the code. That will allow locking of resource groups without needing to
      actually read them in - something that could be useful in speeding up statfs.
      
      In the mean time though it is valid to dereference ->bi_bh only when the rgrp
      is locked. This is basically the same rule as before, modulo the references not
      being valid until the following journal flush.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Cc: Benjamin Marzinski <bmarzins@redhat.com>
      7c9ca621
    • S
      GFS2: Fix inode allocation error path · 40ac218f
      Steven Whitehouse 提交于
      If we have got far enough through the inode allocation code
      path that an inode has already been allocated, then we must
      call iput to dispose of it, if an error occurs during a
      later part of the process. This will always be the final iput
      since there will be no other references to the inode.
      
      Unlike when the inode has been unlinked, its block state will
      be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
      to skip the test in ->evict_inode() for this one case in order
      to ensure that it will be deallocated correctly. This patch adds
      a new flag in order to ensure that this will happen correctly.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      40ac218f
  8. 15 7月, 2011 2 次提交
    • B
      GFS2: Automatically adjust glock min hold time · 7cf8dcd3
      Bob Peterson 提交于
      This patch is a performance improvement for GFS2 in a clustered
      environment. It makes the glock hold time self-adjusting.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7cf8dcd3
    • S
      GFS2: Cache dir hash table in a contiguous buffer · 17d539f0
      Steven Whitehouse 提交于
      This patch adds a cache for the hash table to the directory code
      in order to help simplify the way in which the hash table is
      accessed. This is intended to be a first step towards introducing
      some performance improvements in the directory code.
      
      There are two follow ups that I'm hoping to see fairly shortly. One
      is to simplify the hash table reading code now that we always read the
      complete hash table, whether we want one entry or all of them. The
      other is to introduce readahead on the heads of the hash chains
      which are referred to from the table.
      
      The hash table is a maximum of 128k in size, so it is not worth trying
      to read it in small chunks.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      17d539f0
  9. 12 7月, 2011 1 次提交
    • S
      GFS2: Fix race during filesystem mount · 3942ae53
      Steven Whitehouse 提交于
      There is a potential race during filesystem mounting which has recently
      been reported. It occurs when the userland gfs_controld is able to
      process requests fast enough that it tries to use the sysfs interface
      before the lock module is properly initialised. This is a pretty
      unusual case as normally the lock module initialisation is very quick
      compared with gfs_controld.
      
      This patch adds an interruptible completion which is used to ensure that
      userland will wait for the initialisation of the lock module to
      complete.
      
      There are other potential solutions to this problem, but this is the
      quickest at this stage and has been tested both with and without
      mount.gfs2 present in the system.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reported-by: NDavid Booher <dbooher@adams.net>
      3942ae53
  10. 10 5月, 2011 1 次提交
  11. 20 4月, 2011 3 次提交
    • S
      GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse 提交于
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4667a0ec
    • S
      GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse 提交于
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f42ab085
    • S
      GFS2: Improve tracing support (adds two flags) · 627c10b7
      Steven Whitehouse 提交于
      This adds support for two new flags. One keeps track of whether
      the glock is on the LRU list or not. The other isn't really a
      flag as such, but an indication of whether the glock has an
      attached object or not. This indication is reported without
      any locking, which is ok since we do not dereference the object
      pointer but merely report whether it is NULL or not.
      
      Also, this fixes one place where a tracepoint was missing, which
      was at the point we remove deallocated blocks from the journal.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      627c10b7
  12. 11 3月, 2011 1 次提交
    • D
      GFS2: introduce AIL lock · d6a079e8
      Dave Chinner 提交于
      The log lock is currently used to protect the AIL lists and
      the movements of buffers into and out of them. The lists
      are self contained and no log specific items outside the
      lists are accessed when starting or emptying the AIL lists.
      
      Hence the operation of the AIL does not require the protection
      of the log lock so split them out into a new AIL specific lock
      to reduce the amount of traffic on the log lock. This will
      also reduce the amount of serialisation that occurs when
      the gfs2_logd pushes on the AIL to move it forward.
      
      This reduces the impact of log pushing on sequential write
      throughput.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d6a079e8
  13. 09 3月, 2011 1 次提交
    • A
      GFS2: quota allows exceeding hard limit · 662e3a55
      Abhijith Das 提交于
      Immediately after being synced to disk, cached quotas are zeroed out and a
      subsequent access of the cached quotas results in incorrect zero values. This
      meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
      values it found in the cached quotas and comparison against warn/limits never
      triggered a quota violation.
      
      This patch adds a new flag QDF_REFRESH that is set after a sync so that the
      cached quotas are forcefully refreshed from disk on a subsequent access on
      seeing this flag set.
      
      Resolves: rhbz#675944
      Signed-off-by: NAbhi Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      662e3a55
  14. 21 1月, 2011 1 次提交
    • S
      GFS2: Use RCU for glock hash table · bc015cb8
      Steven Whitehouse 提交于
      This has a number of advantages:
      
       - Reduces contention on the hash table lock
       - Makes the code smaller and simpler
       - Should speed up glock dumps when under load
       - Removes ref count changing in examine_bucket
       - No longer need hash chain lock in glock_put() in common case
      
      There are some further changes which this enables and which
      we may do in the future. One is to look at using SLAB_RCU,
      and another is to look at using a per-cpu counter for the
      per-sb glock counter, since that is touched twice in the
      lifetime of each glock (but only used at umount time).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      bc015cb8
  15. 11 1月, 2011 1 次提交
  16. 30 11月, 2010 1 次提交
    • S
      GFS2: Merge glock state fields into a bitfield · 47a25380
      Steven Whitehouse 提交于
      We can only merge the fields into a bitfield if the locking
      rules for them are the same. In this case gl_spin covers all
      of the fields (write side) but a couple of them are used
      with GLF_LOCK as the read side lock, which should be ok
      since we know that the field in question won't be changing
      at the time.
      
      The gl_req setting has to be done earlier (in glock.c) in order
      to place it under gl_spin. The gl_reply setting also has to be
      brought under gl_spin in order to comply with the new rules.
      
      This saves 4*sizeof(unsigned int) per glock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      47a25380
  17. 29 9月, 2010 1 次提交
    • S
      GFS2: Improve journal allocation via sysfs · feb47ca9
      Steven Whitehouse 提交于
      Recently a feature was added to GFS2 to allow journal id allocation
      via sysfs. This patch builds upon that so that a negative journal id
      will be treated as an error code to be passed back as the return code
      from mount. This allows termination of the mount process if there is
      a failure.
      
      Also, the process has been updated so that the kernel will wait
      for a journal id, even in the "spectator" case. This is required
      in order to avoid mounting a filesystem in case there is an error
      while joining the cluster. In the spectator case, 0 is written into
      the file to indicate that all is well, and that mount should continue.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      feb47ca9
  18. 24 9月, 2010 1 次提交
  19. 23 9月, 2010 2 次提交
    • S
      GFS2: Remove localcaching mount option · c2048b00
      Steven Whitehouse 提交于
      This option defaulted to on for lock_nolock mounts and off
      otherwise. The only function was to avoid the revalidation of
      dentries. In the cluster case, that is entirely pointless and
      liable to cause coherency problems.
      
      The patch changes the revalidation to depend upon whether the
      fs is a local or cluster fs (i.e. it follows the existing default
      behaviour). I very much doubt anybody ever used this option as
      there is no reason to. Even so we will continue to accept it
      on the mount command line, but ignore it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c2048b00
    • S
      GFS2: Remove ignore_local_fs mount argument · f57a024e
      Steven Whitehouse 提交于
      This is been a no-op for a very long time now. I'm pretty sure
      nobody uses it, but just in case we'll still accept it on the
      command line, but ignore it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f57a024e
  20. 20 9月, 2010 3 次提交
    • S
      GFS2: Don't enforce min hold time when two demotes occur in rapid succession · 7b5e3d5f
      Steven Whitehouse 提交于
      Due to the design of the VFS, it is quite usual for operations on GFS2
      to consist of a lookup (requiring a shared lock) followed by an
      operation requiring an exclusive lock. If a remote node has cached an
      exclusive lock, then it will receive two demote events in rapid succession
      firstly for a shared lock and then to unlocked. The existing min hold time
      code was triggering in this case, even if the node was otherwise idle
      since the state change time was being updated by the initial demote.
      
      This patch introduces logic to skip the min hold timer in the case that
      a "double demote" of this kind has occurred. The min hold timer will
      still be used in all other cases.
      
      A new glock flag is introduced which is used to keep track of whether
      there have been any newly queued holders since the last glock state
      change. The min hold time is only applied if the flag is set.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-by: NAbhijith Das <adas@redhat.com>
      7b5e3d5f
    • B
      GFS2: fallocate support · 3921120e
      Benjamin Marzinski 提交于
      This patch adds support for fallocate to gfs2.  Since the gfs2 does not support
      uninitialized data blocks, it must write out zeros to all the blocks.  However,
      since it does not need to lock any pages to read from, gfs2 can write out the
      zero blocks much more efficiently.  On a moderately full filesystem, fallocate
      works around 5 times faster on average.  The fallocate call also allows gfs2 to
      add blocks to the file without changing the filesize, which will make it
      possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
      grow a completely full filesystem.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3921120e
    • S
      GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse 提交于
      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively
      reducing the size of a GFS2 inode by 8 bytes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a2e0f799
  21. 29 7月, 2010 1 次提交
    • S
      GFS2: Wait for journal id on mount if not specified on mount command line · ba6e9364
      Steven Whitehouse 提交于
      This patch implements a wait for the journal id in the case that it has
      not been specified on the command line. This is to allow the future
      removal of the mount.gfs2 helper. The journal id would instead be
      directly communicated by gfs_controld to the file system. Here is a
      comparison of the two systems:
      
      Current:
      1. mount calls mount.gfs2
      2. mount.gfs2 connects to gfs_controld to retrieve the journal id
      3. mount.gfs2 adds the journal id to the mount command line and calls
      the mount system call
      4. gfs_controld receives the status of the mount request via a uevent
      
      Proposed:
      1. mount calls the mount system call (no mount.gfs2 helper)
      2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
      about already
      3. gfs_controld assigns a journal id to it via sysfs
      4. the mount system call then completes as normal (sending a uevent
      according to status)
      
      The advantage of the proposed system is that it is completely backward
      compatible with the current system both at the kernel and at the
      userland levels. The "first" parameter can also be set the same way,
      with the restriction that it must be set before the journal id is
      assigned.
      
      In addition, if mount becomes stuck waiting for a reply from
      gfs_controld which never arrives, then it is killable and will abort the
      mount gracefully.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ba6e9364
  22. 23 7月, 2010 1 次提交
    • T
      gfs2: use workqueue instead of slow-work · 6ecd7c2d
      Tejun Heo 提交于
      Workqueue can now handle high concurrency.  Convert gfs to use
      workqueue instead of slow-work.
      
      * Steven pointed out that recovery path might be run from allocation
        path and thus requires forward progress guarantee without memory
        allocation.  Create and use gfs_recovery_wq with rescuer.  Please
        note that forward progress wasn't guaranteed with slow-work.
      
      * Updated to use non-reentrant workqueue.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      6ecd7c2d
  23. 06 5月, 2010 1 次提交
    • S
      GFS2: Add some useful messages · 913a71d2
      Steven Whitehouse 提交于
      The following patch adds a message to indicate when barriers have been
      disabled due to a block device which doesn't support them. You could
      already tell this via the mount options in /proc/mounts, but all the
      other filesystems also log a message at the same time.
      
      Also, the same mechanisms are used to indicate when the lock
      demote interface has been used (only ever used for debugging)
      which is a request from our support team.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      913a71d2
  24. 05 5月, 2010 1 次提交
    • B
      GFS2: Various gfs2_logd improvements · 5e687eac
      Benjamin Marzinski 提交于
      This patch contains various tweaks to how log flushes and active item writeback
      work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits
      for gfs2_logd to do the log flushing.  Multiple functions were rewritten to
      remove the need to call gfs2_log_lock(). Instead of using one test to see if
      gfs2_logd had work to do, there are now seperate tests to check if there
      are two many buffers in the incore log or if there are two many items on the
      active items list.
      
      This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
      some minor changes.  Since gfs2_ail1_start always submits all the active items,
      it no longer needs to keep track of the first ai submitted, so this has been
      removed. In gfs2_log_reserve(), the order of the calls to
      prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
      been switched.  If it called wake_up first there was a small window for a race,
      where logd could run and return before gfs2_log_reserve was ready to get woken
      up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
      would be left waiting for gfs2_logd to eventualy run because it timed out.
      Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
      out, and flushes the log, can now be set on mount with ar_commit.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5e687eac
  25. 11 3月, 2010 1 次提交
    • B
      GFS2: Allow the number of committed revokes to temporarily be negative · 2e95e3f6
      Benjamin Marzinski 提交于
      GFS2 tracks the number of revokes and unrevokes that are part of committed
      transactions via sd_log_commited_revoke. It is possible for one process to add
      revokes during its transaction, while another process unrevokes them during its
      transaction. If the second process finishes its transaction first,
      sd_log_commited_revoke will be decremented by the number of unrevokes that the
      second process did, without first being incremented by the number of revokes
      the first process did. This is fine, since all started transactions must be
      completed before the journal can be flushed.  However, sd_log_commited_revoke
      is an unsigned integer, and log_refund() causes an assertion failure if it
      would go negative at the end of a transaction.  This patch makes
      sd_log_commited_revoke a signed integer and allows it to go negative.
      __gfs2_log_flush() still checks that it mataches the actual number of revokes.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2e95e3f6
  26. 01 3月, 2010 2 次提交
    • S
      GFS2: Remove loopy umount code · c1184f8a
      Steven Whitehouse 提交于
      As a consequence of the previous patch, we can now remove the
      loop which used to be required due to the circular dependency
      between the inodes and glocks. Instead we can just invalidate
      the inodes, and then clear up any glocks which are left.
      
      Also we no longer need the rwsem since there is no longer any
      danger of the inode invalidation calling back into the glock
      code (and from there back into the inode code).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c1184f8a
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518