1. 05 3月, 2012 1 次提交
    • B
      GFS2: Eliminate sd_rindex_mutex · 6aad1c3d
      Bob Peterson 提交于
      Over time, we've slowly eliminated the use of sd_rindex_mutex.
      Up to this point, it was only used in two places: function
      gfs2_ri_total (which totals the file system size by reading
      and parsing the rindex file) and function gfs2_rindex_update
      which updates the rgrps in memory. Both of these functions have
      the rindex glock to protect them, so the rindex is unnecessary.
      Since gfs2_grow writes to the rindex via the meta_fs, the mutex
      is in the wrong order according to the normal rules. This patch
      eliminates the mutex entirely to avoid the problem.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6aad1c3d
  2. 01 3月, 2012 1 次提交
  3. 29 2月, 2012 5 次提交
    • S
      GFS2: Make bd_cmp() static · 08728f2d
      Steven Whitehouse 提交于
      Add missing static to bd_cmp()
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      08728f2d
    • B
      GFS2: Sort the ordered write list · 4a36d08d
      Bob Peterson 提交于
      This patch sorts the ordered write list for GFS2 writes.
      This increases the throughput for simultaneous writes.
      For example, if you have ten processes, all doing:
      dd if=/dev/zero of=/mnt/gfs2/fileX
      on different files, the throughput will be much better.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4a36d08d
    • S
      GFS2: FITRIM ioctl support · 66fc061b
      Steven Whitehouse 提交于
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows a sweep of the
      filesystem on an occasional basis, and also to optionally avoid sending
      down discard requests for smaller regions.
      
      In GFS2 FITRIM will work at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      66fc061b
    • S
      GFS2: Move two functions from log.c to lops.c · 47ac5537
      Steven Whitehouse 提交于
      gfs2_log_get_buf() and gfs2_log_fake_buf() are both used
      only in lops.c, so move them next to their callers and they
      can then become static.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      47ac5537
    • S
      GFS2: glock statistics gathering · a245769f
      Steven Whitehouse 提交于
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to calculation
      of round trip times in network code.
      
      The three pairs of mean/variance measure the following
      things:
      
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any requests when (a) the current state of
      the lock is exclusive (b) the requested state is either null
      or unlocked or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
      
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      is counting queueing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")
      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the
      results.
      
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a245769f
  4. 28 2月, 2012 4 次提交
  5. 11 1月, 2012 5 次提交
  6. 07 1月, 2012 1 次提交
  7. 04 1月, 2012 8 次提交
  8. 06 12月, 2011 1 次提交
  9. 23 11月, 2011 1 次提交
  10. 22 11月, 2011 4 次提交
    • S
      GFS2: Fix multi-block allocation · 6a8099ed
      Steven Whitehouse 提交于
      Clean up gfs2_alloc_blocks so that it takes the full extent length
      rather than just the number of non-inode blocks as an argument. That
      will only make a difference in the inode allocation case for now.
      
      Also, this fixes the extent length handling around gfs2_alloc_extent() so
      that multi block allocations will work again.
      
      The rd_last_alloc block is set to the final block in the allocated
      extent (as per the update to i_goal, but referenced to a different
      start point).
      
      This also removes the dinode argument to rgblk_search() which is no
      longer used.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6a8099ed
    • B
      GFS2: decouple quota allocations from block allocations · 564e12b1
      Bob Peterson 提交于
      This patch separates the code pertaining to allocations into two
      parts: quota-related information and block reservations.
      This patch also moves all the block reservation structure allocations to
      function gfs2_inplace_reserve to simplify the code, and moves
      the frees to function gfs2_inplace_release.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      564e12b1
    • T
      freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Tejun Heo 提交于
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or removes any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      a0acae0e
    • B
      GFS2: split function rgblk_search · b3e47ca0
      Bob Peterson 提交于
      This patch splits function rgblk_search into a function that finds
      blocks to allocate (rgblk_search) and a function that assigns those
      blocks (gfs2_alloc_extent).
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@rehat.com>
      b3e47ca0
  11. 21 11月, 2011 3 次提交
    • S
      GFS2: Fix up "off by one" in the previous patch · 465f0a76
      Steven Whitehouse 提交于
      The trace point should take extlen and not *ndata as the
      extent length.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      465f0a76
    • B
      GFS2: move toward a generic multi-block allocator · 6e87ed0f
      Bob Peterson 提交于
      This patch is a revision of the one I previously posted.
      I tried to integrate all the suggestions Steve gave.
      The purpose of the patch is to change function gfs2_alloc_block
      (allocate either a dinode block or an extent of data blocks)
      to a more generic gfs2_alloc_blocks function that can
      allocate both a dinode _and_ an extent of data blocks in the
      same call. This will ultimately help us create a multi-block
      reservation scheme to reduce file fragmentation.
      
      This patch moves more toward a generic multi-block allocator that
      takes a pointer to the number of data blocks to allocate, plus whether
      or not to allocate a dinode. In theory, it could be called to allocate
      (1) a single dinode block, (2) a group of one or more data blocks, or
      (3) a dinode plus several data blocks.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6e87ed0f
    • S
      GFS2: O_(D)SYNC support for fallocate · 4442f2e0
      Steven Whitehouse 提交于
      Add sync of metadata after fallocate for O_SYNC files to ensure that we
      meet expectations for everything being on disk in this case.
      Unfortunately, the offset and len parameters are modified during the
      course of the fallocate function, so I've had to add a couple of new
      variables to call generic_write_sync() at the end.
      
      I know that potentially this will sync data as well within the range,
      but I think that is a fairly harmless side-effect overall, since we
      would not normally expect there to be any dirty data within the range in
      question.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Benjamin Marzinski <bmarzins@redhat.com>
      4442f2e0
  12. 18 11月, 2011 1 次提交
  13. 15 11月, 2011 2 次提交
  14. 09 11月, 2011 2 次提交
  15. 08 11月, 2011 1 次提交