1. 28 6月, 2013 1 次提交
  2. 03 6月, 2013 1 次提交
  3. 08 4月, 2013 1 次提交
  4. 13 2月, 2013 1 次提交
  5. 02 2月, 2013 1 次提交
  6. 29 1月, 2013 2 次提交
    • S
      GFS2: Use ->writepages for ordered writes · 45138990
      Steven Whitehouse 提交于
      Instead of using a list of buffers to write ahead of the journal
      flush, this now uses a list of inodes and calls ->writepages
      via filemap_fdatawrite() in order to achieve the same thing. For
      most use cases this results in a shorter ordered write list,
      as well as much larger i/os being issued.
      
      The ordered write list is sorted by inode number before writing
      in order to retain the disk block ordering between inodes as
      per the previous code.
      
      The previous ordered write code used to conflict in its assumptions
      about how to write out the disk blocks with mpage_writepages()
      so that with this updated version we can also use mpage_writepages()
      for GFS2's ordered write, writepages implementation. So we will
      also send larger i/os from writeback too.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      45138990
    • S
      GFS2: Split gfs2_trans_add_bh() into two · 350a9b0a
      Steven Whitehouse 提交于
      There is little common content in gfs2_trans_add_bh() between the data
      and meta classes by the time that the functions which it calls are
      taken into account. The intent here is to split this into two
      separate functions. Stage one is to introduce gfs2_trans_add_data()
      and gfs2_trans_add_meta() and update the callers accordingly.
      
      Later patches will then pull in the content of gfs2_trans_add_bh()
      and its dependent functions in order to clean up the code in this
      area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      350a9b0a
  7. 13 11月, 2012 1 次提交
  8. 07 11月, 2012 1 次提交
    • S
      GFS2: Add Orlov allocator · 9dbe9610
      Steven Whitehouse 提交于
      Just like ext3, this works on the root directory and any directory
      with the +T flag set. Also, just like ext3, any subdirectory created
      in one of the just mentioned cases will be allocated to a random
      resource group (GFS2 equivalent of a block group).
      
      If you are creating a set of directories, each of which will contain a
      job running on a different node, then by setting +T on the parent
      directory before creating the subdirectories, each will land up in a
      different resource group, and thus resource group contention between
      nodes will be kept to a minimum.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9dbe9610
  9. 24 9月, 2012 1 次提交
    • S
      GFS2: Add structure to contain rgrp, bitmap, offset tuple · 4a993fb1
      Steven Whitehouse 提交于
      This patch introduces a new structure, gfs2_rbm, which is a
      tuple of a resource group, a bitmap within the resource group
      and an offset within that bitmap. This is designed to make
      manipulating these sets of variables easier. There is also a
      new helper function which converts this representation back
      to a disk block address.
      
      In addition, the rbtree nodes which are used for the reservations
      were not being correctly initialised, which is now fixed. Also,
      the tracing was not passing through the inode where it should
      have been. That is mostly fixed aside from one corner case. This
      needs to be revisited since there can also be a NULL rgrp in
      some cases which results in the device being incorrect in the
      trace.
      
      This is intended to be the first step towards cleaning up some
      of the allocation code, and some further bug fixes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4a993fb1
  10. 19 7月, 2012 1 次提交
    • B
      GFS2: Reduce file fragmentation · 8e2e0047
      Bob Peterson 提交于
      This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
      resulting improved on disk layout greatly speeds up operations in cases
      which would have resulted in interlaced allocation of blocks previously.
      A typical example of this is 10 parallel dd processes, each writing to a
      file in a common dirctory.
      
      The implementation uses an rbtree of reservations attached to each
      resource group (and each inode).
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8e2e0047
  11. 06 6月, 2012 1 次提交
  12. 11 5月, 2012 1 次提交
  13. 24 4月, 2012 1 次提交
  14. 05 4月, 2012 1 次提交
  15. 20 3月, 2012 1 次提交
  16. 22 11月, 2011 1 次提交
  17. 21 11月, 2011 1 次提交
    • B
      GFS2: move toward a generic multi-block allocator · 6e87ed0f
      Bob Peterson 提交于
      This patch is a revision of the one I previously posted.
      I tried to integrate all the suggestions Steve gave.
      The purpose of the patch is to change function gfs2_alloc_block
      (allocate either a dinode block or an extent of data blocks)
      to a more generic gfs2_alloc_blocks function that can
      allocate both a dinode _and_ an extent of data blocks in the
      same call. This will ultimately help us create a multi-block
      reservation scheme to reduce file fragmentation.
      
      This patch moves more toward a generic multi-block allocator that
      takes a pointer to the number of data blocks to allocate, plus whether
      or not to allocate a dinode. In theory, it could be called to allocate
      (1) a single dinode block, (2) a group of one or more data blocks, or
      (3) a dinode plus several data blocks.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6e87ed0f
  18. 15 11月, 2011 1 次提交
  19. 08 11月, 2011 1 次提交
  20. 21 10月, 2011 6 次提交
    • S
      GFS2: Move readahead of metadata during deallocation into its own function · b99b98dc
      Steven Whitehouse 提交于
      Move the recently added readahead of the indirect pointer
      tree during deallocation into its own function in order
      that we can use it elsewhere in the future. Also this
      fixes the resetting of the "first" variable in the
      original patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b99b98dc
    • B
      GFS2: rewrite fallocate code to write blocks directly · 64dd153c
      Benjamin Marzinski 提交于
      GFS2's fallocate code currently goes through the page cache. Since it's only
      writing to the end of the file or to holes in it, it doesn't need to, and it
      was causing issues on low memory environments. This patch pulls in some of
      Steve's block allocation work, and uses it to simply allocate the blocks for
      the file, and zero them out at allocation time.  It provides a slight
      performance increase, and it dramatically simplifies the code.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      64dd153c
    • B
      GFS2: speed up delete/unlink performance for large files · bd5437a7
      Bob Peterson 提交于
      This patch improves the performance of delete/unlink
      operations in a GFS2 file system where the files are large
      by adding a layer of metadata read-ahead for indirect blocks.
      Mileage will vary, but on my system, deleting an 8.6G file
      dropped from 22 seconds to about 4.5 seconds.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      bd5437a7
    • S
      GFS2: Use cached rgrp in gfs2_rlist_add() · 70b0c365
      Steven Whitehouse 提交于
      Each block which is deallocated, requires a call to gfs2_rlist_add()
      and each of those calls was calling gfs2_blk2rgrpd() in order to
      figure out which rgrp the block belonged in. This can be speeded up
      by making use of the rgrp cached in the inode. We also reset this
      cached rgrp in case the block has changed rgrp. This should provide
      a big reduction in gfs2_blk2rgrpd() calls during deallocation.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      70b0c365
    • S
      GFS2: Call do_strip() directly from recursive_scan() · d56fa8a1
      Steven Whitehouse 提交于
      The recursive_scan() function only ever takes a single "bc"
      argument, so we might as well just call do_strip() directly
      from resource_scan() rather than pass it in as an argument.
      
      Also the "data" argument is always a struct strip_mine, so
      we can pass that in, rather than using a void pointer.
      
      This also moves do_strip() ahead of recursive_scan() so that
      we don't need to add a prototype.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d56fa8a1
    • S
      GFS2: Make resource groups "append only" during life of fs · 8339ee54
      Steven Whitehouse 提交于
      Since we have ruled out supporting online filesystem shrink,
      it is possible to make the resource group list append only
      during the life of a super block. This gives several benefits:
      
      Firstly, we only need to read new rindex elements as they are added
      rather than needing to reread the whole rindex file each time one
      element is added.
      
      Secondly, the rindex glock can be held for much shorter periods of
      time, and is completely removed from the fast path for allocations.
      The lock is taken in shared mode only when updating the resource
      groups when the first allocation occurs, and after a grow has
      taken place.
      
      Thirdly, this results in a reduction in code size, and everything
      gets a lot simpler to understand in this area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8339ee54
  21. 21 7月, 2011 1 次提交
  22. 15 7月, 2011 1 次提交
  23. 21 5月, 2011 1 次提交
    • S
      GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Steven Whitehouse 提交于
      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files have their data blocks in a different
      address space to directories, and thus what would have been normal data
      blocks in a regular file (the hash table in a GFS2 directory) were
      deallocated correctly. However, a reference to these blocks was left in the
      journal (assuming of course that some previous activity had resulted in
      those blocks being in the journal or ail list).
      
      This patch uses the i_depth as a test of whether the inode is an
      exhash directory (we cannot test the inode type as that has already
      been changed to a regular file at this stage in deallocation)
      
      The original issue was reported by Chris Hertel as an issue he encountered
      running bonnie++
      Reported-by: NChristopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6d3117b4
  24. 31 3月, 2011 1 次提交
  25. 24 2月, 2011 1 次提交
    • B
      GFS2: deallocation performance patch · 4c16c36a
      Bob Peterson 提交于
      This patch is a performance improvement to GFS2's dealloc code.
      Rather than update the quota file and statfs file for every
      single block that's stripped off in unlink function do_strip,
      this patch keeps track and updates them once for every layer
      that's stripped.  This is done entirely inside the existing
      transaction, so there should be no risk of corruption.
      The other functions that deallocate blocks will be unaffected
      because they are using wrapper functions that do the same
      thing that they do today.
      
      I tested this code on my roth cluster by creating 200
      files in a directory, each of which is 100MB, then on
      four nodes, I simultaneously deleted the files, thus competing
      for GFS2 resources (but different files).  The commands
      I used were:
      
      [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      
      The performance increase was significant:
      
                   roth-01     roth-02     roth-03     roth-05
                   ---------   ---------   ---------   ---------
      old: real    0m34.027    0m25.021s   0m23.906s   0m35.646s
      new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s
      
      Total time spent deleting:
      old: 118.6s
      new:  89.4
      
      For this particular case, this showed a 25% performance increase for
      GFS2 unlinks.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4c16c36a
  26. 30 11月, 2010 2 次提交
  27. 28 9月, 2010 1 次提交
  28. 20 9月, 2010 2 次提交
    • S
      GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse 提交于
      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively
      reducing the size of a GFS2 inode by 8 bytes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a2e0f799
    • S
      GFS2: New truncate sequence · ff8f33c8
      Steven Whitehouse 提交于
      This updates GFS2's truncate code to use the new truncate
      sequence correctly. This is a stepping stone to being
      able to remove ip->i_disksize in favour of using i_size
      everywhere now that the two sizes are always identical.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      ff8f33c8
  29. 30 7月, 2010 1 次提交
  30. 29 7月, 2010 1 次提交
  31. 15 7月, 2010 1 次提交
  32. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6