1. 24 April, 2012 13 commits
  2. 10 April, 2012 1 commit
    • B
      GFS2: Allow caching of rindex glock · ca9248d8
      Authored by Bob Peterson
      This patch allows caching of the rindex glock. We were previously
      setting the GL_NOCACHE bit when the glock was released. That forced
      the rindex inode to be invalidated, which caused us to re-read
      rindex at the next access. However, it caused the glock to be
      unnecessarily bounced around the cluster. This patch allows
      the glock to remain cached, but it still causes the rindex to be
      re-read once it has been written to by gfs2_grow.
      
      Ben and I have tested single-node gfs2_grow cases and I've tested
      clustered gfs2_grow cases on my four-node cluster.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      ca9248d8
  3. 05 April, 2012 1 commit
  4. 01 April, 2012 1 commit
  5. 26 March, 2012 2 commits
  6. 21 March, 2012 1 commit
  7. 20 March, 2012 2 commits
  8. 09 March, 2012 2 commits
    • B
      GFS2: call gfs2_write_alloc_required for each chunk · 58a7d5fb
      Authored by Benjamin Marzinski
      gfs2_fallocate was calling gfs2_write_alloc_required() once at the start of
      the function. This caused problems since gfs2_write_alloc_required used a
      long unsigned int for the len, but gfs2_fallocate could allocate a much
      larger amount.  This patch will move the call into the loop where the
      chunks are actually allocated and zeroed out. This will keep the allocation
      size under the limit, and also allow gfs2_fallocate to quickly skip over
      sections of the file that are already completely allocated.
      
      fallocate_chunk was also not correctly setting the file size.  It was using the
      len variable to find the last block written to, but by the time it was setting
      the size, the len variable had already been decremented to 0.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      58a7d5fb
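The two fixes described in commit 58a7d5fb can be illustrated with a small user-space sketch. It is not the GFS2 code: `MAX_CHUNK_BYTES`, `chunk_needs_alloc()` and `fallocate_sketch()` are hypothetical names, and the per-chunk limit is an arbitrary value chosen for illustration. The sketch checks each chunk separately (so fully allocated chunks can be skipped) and records the end offset before `len` is consumed, so the final size does not depend on a counter that has already reached 0.

```c
#include <stdint.h>

/* Hypothetical per-chunk cap. The real limit in GFS2 is not stated in the
 * commit message; 1 MiB is used here only for illustration. */
#define MAX_CHUNK_BYTES (1u << 20)

/* Stand-in for a per-chunk "is allocation required?" check. In this sketch,
 * everything below allocated_upto counts as already allocated (assumption). */
static int chunk_needs_alloc(uint64_t offset, uint32_t len,
                             uint64_t allocated_upto)
{
    return offset + len > allocated_upto;
}

/* Walks [offset, offset + len) in chunks, counts the chunks that would need
 * allocation, and returns the resulting file size. */
static uint64_t fallocate_sketch(uint64_t offset, uint64_t len,
                                 uint64_t old_size, uint64_t allocated_upto,
                                 unsigned *chunks_allocated)
{
    /* Remember the end of the range now; len is decremented to 0 below. */
    const uint64_t end = offset + len;

    *chunks_allocated = 0;
    while (len > 0) {
        uint32_t chunk = len > MAX_CHUNK_BYTES ? MAX_CHUNK_BYTES
                                               : (uint32_t)len;

        /* The check runs once per chunk, so its length always fits. */
        if (chunk_needs_alloc(offset, chunk, allocated_upto))
            (*chunks_allocated)++;   /* the real code would allocate and zero */

        offset += chunk;
        len -= chunk;
    }

    /* The size is derived from the saved end, not from the consumed len. */
    return end > old_size ? end : old_size;
}
```

Calling `fallocate_sketch()` on a range whose first part is already allocated only counts the trailing chunks, which mirrors the "quickly skip over sections" behaviour the message mentions.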
    • S
      GFS2: Clean up log flush header writing · 34cc1781
      Authored by Steven Whitehouse
      We already send both a pre and post flush to the block device
      when writing a journal header. There is no need to wait for
      the previous I/O specifically when we do this, unless we've
      turned "barriers" off.
      
      As a side effect, this also cleans up the code path for flushing
      the journal and makes it more readable.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      34cc1781
  9. 08 March, 2012 1 commit
    • S
      GFS2: Remove a __GFP_NOFAIL allocation · 75ca61c1
      Authored by Steven Whitehouse
      In order to ensure that we've got enough buffer heads for flushing
      the journal, the original code used __GFP_NOFAIL when performing
      this allocation. Here we dispense with that in favour of using a
      mempool. This should improve efficiency in low memory conditions:
      since flushing the journal is a good way to get memory back, we
      don't want to be spinning, waiting on memory allocations. The
      buffers which are allocated via this mempool are fairly short lived,
      so we'll recycle them pretty quickly.
      
      Although there are other memory allocations which occur during the
      journal flush process, this is the one which can potentially require
      the most memory, so the most important one to fix.
      
      The amount of memory reserved is a fixed amount, and we should not need
      to scale it when there are a greater number of filesystems in use.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      75ca61c1
  10. 07 March, 2012 1 commit
  11. 05 March, 2012 2 commits
    • B
      GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd · 58884c4d
      Authored by Bob Peterson
      This patch adds a call to gfs2_rindex_update from function gfs2_blk2rgrpd
      and removes calls to it that are made redundant by it. The problem is
      that a gfs2_grow can add rgrps to the rindex, then put those rgrps into
      use, thus rendering the rindex we read in at mount time incomplete.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      58884c4d
    • B
      GFS2: Eliminate sd_rindex_mutex · 6aad1c3d
      Authored by Bob Peterson
      Over time, we've slowly eliminated the use of sd_rindex_mutex.
      Up to this point, it was only used in two places: function
      gfs2_ri_total (which totals the file system size by reading
      and parsing the rindex file) and function gfs2_rindex_update
      which updates the rgrps in memory. Both of these functions have
      the rindex glock to protect them, so the mutex is unnecessary.
      Since gfs2_grow writes to the rindex via the meta_fs, the mutex
      is in the wrong order according to the normal rules. This patch
      eliminates the mutex entirely to avoid the problem.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      6aad1c3d
  12. 01 March, 2012 1 commit
  13. 29 February, 2012 5 commits
    • S
      GFS2: Make bd_cmp() static · 08728f2d
      Authored by Steven Whitehouse
      Add missing static to bd_cmp()
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      08728f2d
    • B
      GFS2: Sort the ordered write list · 4a36d08d
      Authored by Bob Peterson
      This patch sorts the ordered write list for GFS2 writes.
      This increases the throughput for simultaneous writes.
      For example, if you have ten processes, all doing:
      dd if=/dev/zero of=/mnt/gfs2/fileX
      on different files, the throughput will be much better.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      4a36d08d
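The idea behind commit 4a36d08d can be sketched in user space as sorting pending writes before they are issued. The commit message does not say what key is used, so sorting by an on-disk block number is an assumption here, and `struct write_entry`, `entry_cmp()` and `sort_ordered_writes()` are hypothetical names rather than GFS2 identifiers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on an ordered write list. */
struct write_entry {
    uint64_t blkno;   /* assumed sort key: on-disk block number */
    int owner;        /* which writer queued the entry */
};

/* qsort comparator: ascending block number, without overflow-prone
 * subtraction of 64-bit values. */
static int entry_cmp(const void *a, const void *b)
{
    const struct write_entry *x = a;
    const struct write_entry *y = b;

    if (x->blkno < y->blkno)
        return -1;
    if (x->blkno > y->blkno)
        return 1;
    return 0;
}

/* Sorts the pending writes so that they can be issued in disk order,
 * even when several writers queued them in interleaved order. */
static void sort_ordered_writes(struct write_entry *entries, size_t n)
{
    qsort(entries, n, sizeof(*entries), entry_cmp);
}
```

With several processes writing different files at once, entries arrive interleaved; issuing them in sorted order is one plausible reason for the throughput gain the message reports.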
    • S
      GFS2: FITRIM ioctl support · 66fc061b
      Authored by Steven Whitehouse
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows a sweep of the
      filesystem on an occasional basis, and also to optionally avoid sending
      down discard requests for smaller regions.
      
      In GFS2 FITRIM will work at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      66fc061b
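From user space, the FITRIM ioctl described in commit 66fc061b is driven through `struct fstrim_range` from `<linux/fs.h>`. The following is a minimal sketch of an occasional sweep of a whole filesystem; the helper names `whole_fs_range()` and `trim_mount()` are made up for this example, and the mount point passed in is the caller's choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <sys/ioctl.h>
#include <unistd.h>

/* Builds a range covering the whole filesystem. Free extents shorter than
 * min_len bytes are not sent down as discard requests. */
static struct fstrim_range whole_fs_range(unsigned long long min_len)
{
    struct fstrim_range r;

    r.start = 0;
    r.len = ULLONG_MAX;
    r.minlen = min_len;
    return r;
}

/* Issues FITRIM on the filesystem mounted at mount_point.
 * Returns 0 and stores the number of bytes trimmed on success,
 * or -1 with errno set on failure. */
static int trim_mount(const char *mount_point, unsigned long long min_len,
                      unsigned long long *trimmed)
{
    struct fstrim_range r = whole_fs_range(min_len);
    int fd = open(mount_point, O_RDONLY);
    int ret, saved_errno;

    if (fd < 0)
        return -1;

    ret = ioctl(fd, FITRIM, &r);
    saved_errno = errno;       /* close() may overwrite errno */
    close(fd);

    if (ret < 0) {
        errno = saved_errno;
        return -1;
    }

    *trimmed = r.len;          /* the kernel writes back the bytes trimmed */
    return 0;
}
```

A periodic job could call `trim_mount()` with a non-zero `min_len` to avoid small discard requests, which is the trade-off against the discard mount option that the message describes.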
    • S
      GFS2: Move two functions from log.c to lops.c · 47ac5537
      Authored by Steven Whitehouse
      gfs2_log_get_buf() and gfs2_log_fake_buf() are both used
      only in lops.c, so move them next to their callers and they
      can then become static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      47ac5537
    • S
      GFS2: glock statistics gathering · a245769f
      Authored by Steven Whitehouse
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to calculation
      of round trip times in network code.
      
      The three pairs of mean/variance measure the following
      things:
      
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any requests when (a) the current state of
      the lock is exclusive (b) the requested state is either null
      or unlocked or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
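
The classification in the paragraph above can be written as a single predicate. This is only a sketch: the `enum lock_state` values and the function name are hypothetical and do not correspond to the GFS2 or DLM constants.

```c
#include <stdbool.h>

/* Hypothetical lock states, named only for this illustration. */
enum lock_state {
    ST_UNLOCKED,
    ST_NULL,
    ST_SHARED,
    ST_DEFERRED,
    ST_EXCLUSIVE
};

/* Returns true when a request is expected to complete right away:
 * (a) the lock is currently held exclusive,
 * (b) the requested state is null or unlocked, or
 * (c) the "try lock" flag is set.
 * Every other request counts as blocking. */
static bool request_is_nonblocking(enum lock_state current,
                                   enum lock_state requested,
                                   bool try_lock)
{
    return current == ST_EXCLUSIVE ||
           requested == ST_NULL ||
           requested == ST_UNLOCKED ||
           try_lock;
}
```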
      
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      is counting queueing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")

      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the
      results.
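
A smoothed estimator of this kind can be sketched as follows. The gains of 1/8 for the mean and 1/4 for the variance are an assumption, read from the "8 samples (or 4 for the variance)" remark and from the familiar round-trip-time estimator in network code; `struct smoothed_stat` and `stat_update()` are hypothetical names. As in that estimator, the "variance" tracked here is a smoothed absolute deviation.

```c
#include <stdint.h>

/* One mean/variance pair, e.g. for DLM lock time in nanoseconds. */
struct smoothed_stat {
    int64_t mean;
    int64_t var;   /* smoothed absolute deviation from the mean */
};

/* Folds one new sample into the pair.
 * mean moves 1/8 of the way towards the sample;
 * var moves 1/4 of the way towards |sample - old mean|. */
static void stat_update(struct smoothed_stat *s, int64_t sample)
{
    int64_t delta = sample - s->mean;
    int64_t abs_delta = delta < 0 ? -delta : delta;

    s->mean += delta / 8;
    s->var += (abs_delta - s->var) / 4;
}
```

Because each update moves the estimate only a fraction of the way towards the new sample, a step change in the input shows up gradually, which is why the results need careful interpretation.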
      
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      a245769f
  14. 28 February, 2012 4 commits
  15. 11 January, 2012 3 commits