1. 02 Oct, 2013 1 commit
    • S
      GFS2: Add allocation parameters structure · 7b9cff46
      Steven Whitehouse committed
      This patch adds a structure to contain allocation parameters with
      the intention of future expansion of this structure. The idea is
      that we should be able to add more information about the allocation
      in the future in order to allow the allocator to make a better job
      of placing the requests on-disk.
      
      There is no functional difference from applying this patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      7b9cff46
  2. 27 Sep, 2013 1 commit
    • S
      GFS2: Clean up reservation removal · af5c2697
      Steven Whitehouse committed
      The reservation for an inode should be cleared when it is truncated so
      that we can start again at a different offset for future allocations.
      We could try and do better than that, by resetting the search based on
      where the truncation started from, but this is only a first step.
      
      In addition, there are three callers of gfs2_rs_delete() but only one
      of those should really be testing the value of i_writecount. While
      we get away with that in the other cases currently, I think it would
      be better if we made that test specific to the one case which
      requires it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      af5c2697
  3. 18 Sep, 2013 2 commits
    • B
      GFS2: new function gfs2_rbm_incr · 149ed7f5
      Bob Peterson committed
      Since the previous patch eliminated bi in favor of bii, this follow-on
      patch needed to be adjusted accordingly. Here is the revised version.
      
      This patch adds a new function, gfs2_rbm_incr, which increments
      an rbm structure. This is more efficient than calling gfs2_rbm_to_block,
      incrementing, then calling gfs2_rbm_from_block.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      149ed7f5
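As a rough illustration of the increment described in the commit above, the sketch below models an rbm as a bitmap index (bii) plus an offset within that bitmap. The `toy_` names, the struct layout and the per-bitmap lengths are assumptions made for this example and are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a resource group is an array of bitmaps, and bitmap i
 * covers bitmap_len[i] blocks. Illustrative only. */
struct toy_rgrp {
    size_t nbitmaps;
    const unsigned *bitmap_len;
};

struct toy_rbm {
    const struct toy_rgrp *rgd;
    size_t bii;        /* index into the bitmap array */
    unsigned offset;   /* block offset within that bitmap */
};

/* Advance the position by one block. Returns true if the position was
 * already on the last block of the resource group (and leaves it
 * unchanged), false otherwise. */
static bool toy_rbm_incr(struct toy_rbm *rbm)
{
    assert(rbm->bii < rbm->rgd->nbitmaps);

    if (rbm->offset + 1 < rbm->rgd->bitmap_len[rbm->bii]) {
        rbm->offset++;             /* common case: stay in the same bitmap */
        return false;
    }
    if (rbm->bii + 1 == rbm->rgd->nbitmaps)
        return true;               /* no further bitmap to move into */
    rbm->bii++;                    /* step into the next bitmap */
    rbm->offset = 0;
    return false;
}
```

In this model the common case is a single compare and increment, which is the kind of saving the commit message attributes to avoiding a round trip through an absolute block number.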
    • B
      GFS2: Introduce rbm field bii · e579ed4f
      Bob Peterson committed
      This is a respin of the original patch. As Steve pointed out, the
      introduction of field bii makes it easy to eliminate bi itself.
      This revised patch does just that, replacing bi with bii.
      
      This patch adds a new field to the rbm structure, called bii,
      which is an index into the array of bitmaps for an rgrp.
      This replaces *bi which was a pointer to the bitmap.
      This is being done for further optimizations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      e579ed4f
  4. 17 Sep, 2013 3 commits
  5. 20 Jun, 2013 1 commit
    • A
      GFS2: Fix fstrim boundary conditions · 6a98c333
      Abhijith Das committed
      This patch correctly distinguishes two boundary conditions:
      
      1. When the given range is entirely within the unaccounted space between
         two rgrps, and
      2. When the range begins beyond the end of the filesystem.
      
      Also fix the unit of the returned value r.len (total trimmed) to be in bytes
      instead of the (incorrect) 512-byte blocks.
      
      With this patch, GFS2 passes multiple iterations of all the relevant xfstests
      (251, 260, 288) with different fs block sizes.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      6a98c333
  6. 03 Jun, 2013 1 commit
  7. 24 May, 2013 1 commit
    • B
      GFS2: Use single-block reservations for directories · af21ca8e
      Bob Peterson committed
      This patch changes the multi-block allocation code, such that
      directory inodes only get a single block reserved in the bitmap.
      That way, the bitmaps are more tightly packed together, and there
      are fewer spans of free blocks for in-use block reservations.
      This means it takes less time to find a free span of blocks in the
      bitmap, which speeds things up. This increases the performance of
      some workloads by almost 2X. In Nate's mockup.py script (which does
      (1) create dir, (2) create dir in dir, (3) create file in that dir)
      the test executes in 23 steps rather than 43 steps, a 47%
      performance improvement.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      af21ca8e
  8. 08 Apr, 2013 2 commits
    • B
      GFS2: Remove vestigial parameter ip from function rs_deltree · 20095218
      Bob Peterson committed
      The functions that delete block reservations from the rgrp block
      reservations rbtree no longer use the ip parameter. This patch
      eliminates the parameter.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      20095218
    • S
      GFS2: Clean up inode creation path · fd4b4e04
      Steven Whitehouse committed
      This patch cleans up the inode creation code path in GFS2. After the
      Orlov allocator was merged, a number of potential improvements are
      now possible, and this is a first set of these.
      
      The quota handling is now updated so that it matches the point in
      the code where the allocation takes place. This means that the one
      exception in gfs2_alloc_blocks relating to quota is now no longer
      required, and we can use the generic code everywhere.
      
      In addition the call to figure out whether we need to allocate any
      extra blocks in order to add a directory entry is moved higher up
      gfs2_create_inode. This means that if it returns an error, we
      can deal with that at a stage where it is easier to handle that case.
      The returned status cannot change during the function since we hold
      an exclusive lock on the directory.
      
      Two calls to gfs2_rindex_update have been changed to one, again at
      the top of gfs2_create_inode to simplify error handling.
      
      The time stamps are also now initialised earlier in the creation
      process. This is gradually moving towards being able to remove the
      call to gfs2_refresh_inode in gfs2_inode_create once we have all the
      fields covered.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fd4b4e04
  9. 06 Apr, 2013 1 commit
    • B
      GFS2: Issue discards in 512b sectors · b2c87cae
      Bob Peterson committed
      This patch changes GFS2's discard issuing code so that it calls
      function sb_issue_discard rather than blkdev_issue_discard. The
      code was calling blkdev_issue_discard and specifying the correct
      sector offset and sector size, but blkdev_issue_discard expects
      these values to be in terms of 512 byte sectors, even if the native
      sector size for the device is different. Calling sb_issue_discard
      with the BLOCK size instead ensures the correct block-to-512b-sector
      translation. I verified that "minlen" is specified in blocks, so
      comparing it to a number of blocks is correct.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2c87cae
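The block-to-512b-sector translation mentioned in the commit above amounts to a shift by the difference between the block size and the 512-byte sector size. The following is a minimal user-space sketch of that arithmetic; `toy_blocks_to_sectors` and its parameters are hypothetical names for this example, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a value expressed in file system blocks (a start offset or
 * a length) into 512-byte sectors. blocksize_bits is log2 of the
 * block size, for example 12 for 4096-byte blocks. */
static uint64_t toy_blocks_to_sectors(uint64_t blocks, unsigned blocksize_bits)
{
    assert(blocksize_bits >= 9);   /* a block is at least 512 bytes */
    return blocks << (blocksize_bits - 9);
}
```

For example, with 4096-byte blocks each file system block spans 8 sectors, so passing block counts where sector counts are expected would discard only an eighth of the intended range.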
  10. 04 Apr, 2013 1 commit
  11. 23 Feb, 2013 1 commit
  12. 29 Jan, 2013 1 commit
    • S
      GFS2: Split gfs2_trans_add_bh() into two · 350a9b0a
      Steven Whitehouse committed
      There is little common content in gfs2_trans_add_bh() between the data
      and meta classes by the time that the functions which it calls are
      taken into account. The intent here is to split this into two
      separate functions. Stage one is to introduce gfs2_trans_add_data()
      and gfs2_trans_add_meta() and update the callers accordingly.
      
      Later patches will then pull in the content of gfs2_trans_add_bh()
      and its dependent functions in order to clean up the code in this
      area.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      350a9b0a
  13. 02 Jan, 2013 3 commits
    • B
      GFS2: Reset rd_last_alloc when it reaches the end of the rgrp · 13d2eb01
      Bob Peterson committed
      In function rg_mblk_search, it's searching for multiple blocks in
      a given state (e.g. "free"). If there's an active block reservation,
      the goal is the next free block of that reservation. If the resource group
      contains the dinode's goal block, that's used for the search. But
      if neither is the case, it uses the rgrp's last allocated block.
      That way, consecutive allocations appear after one another on media.
      The problem comes in when you hit the end of the rgrp; it would never
      start over and search from the beginning. This became a problem,
      since if you deleted all the files and data from the rgrp, it would
      never start over and find free blocks. So it had to keep searching
      further out on the media to allocate blocks. This patch resets the
      rd_last_alloc after it does an unsuccessful search at the end of
      the rgrp.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      13d2eb01
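The behaviour described in the commit above can be modelled in a few lines. This is only a sketch under simplifying assumptions: one byte per block instead of a packed bitmap, single-block allocations, and hypothetical `toy_` names. The `last_alloc` field stands in for rd_last_alloc.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rgrp {
    unsigned char *used;   /* 1 = block in use, 0 = free */
    size_t nblocks;
    size_t last_alloc;     /* where the next search starts */
};

/* Allocate one block, searching forward from last_alloc.
 * Returns the block index, or -1 if nothing was free between
 * last_alloc and the end of the resource group. */
static long toy_alloc_block(struct toy_rgrp *rgd)
{
    size_t i;

    assert(rgd->last_alloc <= rgd->nblocks);
    for (i = rgd->last_alloc; i < rgd->nblocks; i++) {
        if (!rgd->used[i]) {
            rgd->used[i] = 1;
            rgd->last_alloc = i;
            return (long)i;
        }
    }
    /* Unsuccessful search up to the end: start over next time. */
    rgd->last_alloc = 0;
    return -1;
}
```

Without the reset on the failure path, a resource group whose early blocks had been freed would keep reporting that it was full.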
    • B
      GFS2: Stop looking for free blocks at end of rgrp · 15bd50ad
      Bob Peterson committed
      This patch adds a return code check after calling function
      gfs2_rbm_from_block while determining the free extent size.
      That way, when the end of an rgrp is reached, it won't try
      to process unaligned blocks after the end.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      15bd50ad
    • A
      GFS2: Fix race in gfs2_rs_alloc · f1213cac
      Abhijith Das committed
      QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible
      to come out of the function with a valid ip->i_res allocation that gets
      freed before use, resulting in a NULL pointer dereference.
      
      This patch moves the initial short-circuit check for a non-NULL ip->i_res
      inside the mutex lock. With this patch, I was able to successfully run the
      reproducer test multiple times.
      
      Resolves: rhbz#878476
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f1213cac
  14. 15 Nov, 2012 1 commit
  15. 13 Nov, 2012 1 commit
    • S
      GFS2: Fix one RG corner case · aa8920c9
      Steven Whitehouse committed
      For filesystems with only a single resource group, we need to be careful
      that the allocation loop will not land up with a NULL resource group. This
      fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function
      was being used instead of gfs2_rgrpd_get_first().
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      aa8920c9
  16. 07 Nov, 2012 6 commits
    • S
      GFS2: Add Orlov allocator · 9dbe9610
      Steven Whitehouse committed
      Just like ext3, this works on the root directory and any directory
      with the +T flag set. Also, just like ext3, any subdirectory created
      in one of the just mentioned cases will be allocated to a random
      resource group (GFS2 equivalent of a block group).
      
      If you are creating a set of directories, each of which will contain a
      job running on a different node, then by setting +T on the parent
      directory before creating the subdirectories, each will land up in a
      different resource group, and thus resource group contention between
      nodes will be kept to a minimum.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9dbe9610
    • S
      GFS2: Add test for resource group congestion status · bcd97c06
      Steven Whitehouse committed
      This patch uses information gathered by the recent glock statistics
      patch in order to derive a boolean verdict on the congestion
      status of a resource group. This is then used when making decisions
      on which resource group to choose during block allocation.
      
      The aim is to avoid resource groups which are heavily contended
      by other nodes, while still ensuring locality of access wherever
      possible.
      
      Once a reservation has been made in a particular resource group
      we continue to use that resource group until a new reservation is
      required. This should help to ensure that we do not change resource
      groups too often.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      bcd97c06
    • B
      GFS2: Speed up gfs2_rbm_from_block · a68a0a35
      Bob Peterson committed
      This patch is a rewrite of function gfs2_rbm_from_block. Rather than
      looping to find the right bitmap, the code now does a few simple
      math calculations.
      
      I compared the performance of both algorithms side by side and the new
      algorithm is noticeably faster. Sample instrumentation output from a
      "fast" machine:
      
      5 million calls: millisec spent: Orig: 166 New: 113
      5 million calls: millisec spent: Orig: 189 New: 114
      
      In addition, I ran postmark (on a somewhat slower CPU) before and after
      the new algorithm was put in place, and postmark showed a decent
      improvement:
      
      Before the new algorithm:
      -------------------------
      Time:
      	645 seconds total
      	584 seconds of transactions (171 per second)
      
      Files:
      	150087 created (232 per second)
      		Creation alone: 100000 files (2083 per second)
      		Mixed with transactions: 50087 files (85 per second)
      	49995 read (85 per second)
      	49991 appended (85 per second)
      	150087 deleted (232 per second)
      		Deletion alone: 100174 files (7705 per second)
      		Mixed with transactions: 49913 files (85 per second)
      
      Data:
      	273.42 megabytes read (434.08 kilobytes per second)
      	852.13 megabytes written (1.32 megabytes per second)
      
      With the new algorithm:
      -----------------------
      Time:
      	599 seconds total
      	530 seconds of transactions (188 per second)
      
      Files:
      	150087 created (250 per second)
      		Creation alone: 100000 files (1886 per second)
      		Mixed with transactions: 50087 files (94 per second)
      	49995 read (94 per second)
      	49991 appended (94 per second)
      	150087 deleted (250 per second)
      		Deletion alone: 100174 files (6260 per second)
      		Mixed with transactions: 49913 files (94 per second)
      
      Data:
      	273.42 megabytes read (467.42 kilobytes per second)
      	852.13 megabytes written (1.42 megabytes per second)
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      a68a0a35
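To make the difference between looping and "a few simple math calculations" concrete, here is a sketch of both approaches on a simplified layout. It assumes the first bitmap covers `first_len` blocks and every later bitmap covers `rest_len` blocks, with `first_len <= rest_len`. That layout and all `toy_` names are assumptions for this example, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_pos {
    uint32_t bii;      /* which bitmap holds the block */
    uint32_t offset;   /* block offset inside that bitmap */
};

/* Loop style: walk the bitmaps until the one holding rblock is found. */
static struct toy_pos toy_from_block_loop(uint32_t rblock, uint32_t first_len,
                                          uint32_t rest_len)
{
    struct toy_pos p = {0, rblock};
    uint32_t len = first_len;

    while (p.offset >= len) {
        p.offset -= len;
        p.bii++;
        len = rest_len;
    }
    return p;
}

/* Arithmetic style: pad rblock as if the first bitmap were full-sized,
 * then a single divide yields the bitmap index. */
static struct toy_pos toy_from_block_math(uint32_t rblock, uint32_t first_len,
                                          uint32_t rest_len)
{
    struct toy_pos p;
    uint32_t pad, shifted;

    assert(first_len > 0 && first_len <= rest_len);
    pad = rest_len - first_len;
    shifted = rblock + pad;
    p.bii = shifted / rest_len;
    p.offset = shifted % rest_len;
    if (p.bii == 0)
        p.offset -= pad;   /* undo the padding inside the first bitmap */
    return p;
}
```

The loop does work proportional to the bitmap index, while the arithmetic version does a constant amount of work per call.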
    • L
      GFS2: Fix FITRIM argument handling · 076f0faa
      Lukas Czerner committed
      The current implementation in gfs2 treats the FITRIM arguments as if
      they were in units of file system blocks, which is wrong. The FITRIM
      arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen)
      are actually in bytes.
      
      Moreover, checks were missing for the start argument being beyond the end
      of the file system, the len argument being smaller than a file system
      block, and the minlen argument being bigger than the biggest resource group.
      
      This commit converts the FITRIM arguments to file system blocks and
      also adds the checks mentioned above.
      
      All of these problems were detected by xfstests 251 and 260.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      076f0faa
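The conversion and the three checks listed in the commit above could look roughly like the following user-space sketch. The function name, the `toy_trim` struct, the choice of -EINVAL for every failed check and the clamping of the range to the end of the file system are assumptions made for this example. The actual patch may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_trim {
    uint64_t start;    /* first block of the range */
    uint64_t nblocks;  /* number of blocks in the range */
    uint64_t minlen;   /* smallest extent worth discarding, in blocks */
};

/* Convert byte-based FITRIM arguments into file system blocks.
 * bs_bits is log2 of the block size. Returns 0 or -EINVAL. */
static int toy_fitrim_to_blocks(uint64_t start, uint64_t len, uint64_t minlen,
                                unsigned bs_bits, uint64_t fs_blocks,
                                uint64_t max_rgrp_blocks, struct toy_trim *out)
{
    uint64_t start_blk = start >> bs_bits;
    uint64_t len_blk = len >> bs_bits;
    uint64_t minlen_blk = minlen >> bs_bits;

    assert(out != NULL);
    if (start_blk >= fs_blocks)        /* start beyond the end of the fs */
        return -EINVAL;
    if (len_blk == 0)                  /* len smaller than one block */
        return -EINVAL;
    if (minlen_blk > max_rgrp_blocks)  /* no extent could ever be that long */
        return -EINVAL;

    if (len_blk > fs_blocks - start_blk)
        len_blk = fs_blocks - start_blk;   /* clamp to the end of the fs */
    out->start = start_blk;
    out->nblocks = len_blk;
    out->minlen = minlen_blk ? minlen_blk : 1;
    return 0;
}
```

Clamping before adding `start_blk` and `len_blk` avoids overflow when a caller passes the maximum 64-bit value as len to mean "the whole file system".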
    • L
      GFS2: Require user to provide argument for FITRIM · 3a238ade
      Lukas Czerner committed
      When the fstrim_range argument is not provided by the user in the FITRIM
      ioctl, we should just return EFAULT and not promote bad behaviour by filling
      in the structure in the kernel. Let the user deal with it.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3a238ade
    • A
      GFS2: Fix possible null pointer deref in gfs2_rs_alloc · cd0ed19f
      Andrew Price committed
      Despite the return value from kmem_cache_zalloc() being checked, the
      error wasn't being returned until after a possible null pointer
      dereference. This patch returns the error immediately, allowing the
      removal of the error variable.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      cd0ed19f
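The pattern fixed by the commit above, returning as soon as the allocation fails rather than recording the error and carrying on, can be sketched in user-space C. Here `calloc` stands in for kmem_cache_zalloc(), and the `toy_` types and fields are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_res {
    int rs_inum;               /* illustrative field only */
};

struct toy_inode {
    int i_no;
    struct toy_res *i_res;
};

/* Allocate and attach a reservation. Returns 0 or -ENOMEM. */
static int toy_rs_alloc(struct toy_inode *ip)
{
    struct toy_res *res;

    assert(ip != NULL);
    res = calloc(1, sizeof(*res));
    if (!res)
        return -ENOMEM;        /* bail out before res is ever dereferenced */

    res->rs_inum = ip->i_no;   /* safe: res is known to be non-NULL here */
    ip->i_res = res;
    return 0;
}
```

Returning immediately also removes the need for a separate error variable, which matches the clean-up the commit message describes.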
  17. 24 Sep, 2012 13 commits