1. 21 Jul, 2011 1 commit
  2. 21 May, 2011 1 commit
    • GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Steven Whitehouse committed
      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files have their data blocks in a different
      address space to directories, and thus what would have been normal data
      blocks in a regular file (the hash table in a GFS2 directory) were
      deallocated correctly. However, a reference to these blocks was left in the
      journal (assuming of course that some previous activity had resulted in
      those blocks being in the journal or ail list).
      
      This patch uses the i_depth as a test of whether the inode is an
      exhash directory (we cannot test the inode type as that has already
      been changed to a regular file at this stage in deallocation).
      
      The original issue was reported by Chris Hertel as an issue he encountered
      running bonnie++.
      Reported-by: Christopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  3. 31 Mar, 2011 1 commit
  4. 24 Feb, 2011 1 commit
    • GFS2: deallocation performance patch · 4c16c36a
      Bob Peterson committed
      This patch is a performance improvement to GFS2's dealloc code.
      Rather than update the quota file and statfs file for every
      single block that's stripped off in unlink function do_strip,
      this patch keeps track and updates them once for every layer
      that's stripped.  This is done entirely inside the existing
      transaction, so there should be no risk of corruption.
      The other functions that deallocate blocks will be unaffected
      because they are using wrapper functions that do the same
      thing that they do today.
      
      I tested this code on my roth cluster by creating 200
      files in a directory, each of which is 100MB, then on
      four nodes, I simultaneously deleted the files, thus competing
      for GFS2 resources (but different files).  The commands
      I used were:
      
      [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      
      The performance increase was significant:
      
                   roth-01     roth-02     roth-03     roth-05
                   ---------   ---------   ---------   ---------
      old: real    0m34.027s   0m25.021s   0m23.906s   0m35.646s
      new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s
      
      Total time spent deleting:
      old: 118.6s
      new:  89.4s
      
      For this particular case, this showed a 25% performance increase for
      GFS2 unlinks.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
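The totals and the quoted percentage in the deallocation performance commit above can be re-derived from its per-node timings. The snippet below is only a small arithmetic check in Python; the function name `summarize` is made up for illustration and is not part of the patch.

```python
# Per-node wall-clock times in seconds, copied from the commit message table
# (roth-01, roth-02, roth-03, roth-05).
old = [34.027, 25.021, 23.906, 35.646]
new = [22.379, 24.362, 24.133, 18.562]


def summarize(old_times, new_times):
    """Return (total_old, total_new, fraction_of_time_saved)."""
    total_old = sum(old_times)
    total_new = sum(new_times)
    saved = (total_old - total_new) / total_old
    return total_old, total_new, saved


total_old, total_new, saved = summarize(old, new)
print(f"old: {total_old:.1f}s  new: {total_new:.1f}s  saved: {saved:.1%}")
```

The summed times come out at roughly 118.6s and 89.4s, and the reduction is a little under 25% of the old total, which matches the rounded figure in the message.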
  5. 30 Nov, 2010 2 commits
  6. 28 Sep, 2010 1 commit
  7. 20 Sep, 2010 2 commits
    • GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse committed
      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively
      reducing the size of a GFS2 inode by 8 bytes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: New truncate sequence · ff8f33c8
      Steven Whitehouse committed
      This updates GFS2's truncate code to use the new truncate
      sequence correctly. This is a stepping stone to being
      able to remove ip->i_disksize in favour of using i_size
      everywhere now that the two sizes are always identical.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
  8. 30 Jul, 2010 1 commit
  9. 29 Jul, 2010 1 commit
  10. 15 Jul, 2010 1 commit
  11. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  12. 29 Mar, 2010 1 commit
    • GFS2: Clean up stuffed file copying · 602c89d2
      Steven Whitehouse committed
      If the inode size was corrupt for stuffed files, it was possible
      for the copying of data to overrun the block and/or page. This patch
      checks for that condition so that this is no longer possible.
      
      This is also preparation for the new truncate sequence patch which
      requires the ability to have stuffed files with larger sizes than
      (disk block size - sizeof(on disk inode)) with the restriction that
      only the initial part of the file may be non-zero.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
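The bounds check described in the stuffed file copying commit above can be pictured as a clamp on the copy length. The following is a toy Python model, not the kernel code: the function name `copy_stuffed`, the 4096-byte block and page sizes, and the 232-byte on-disk inode header are all assumptions made for illustration.

```python
BLOCK_SIZE = 4096   # assumed filesystem block size
DINODE_SIZE = 232   # assumed size of the on-disk inode header


def copy_stuffed(block: bytes, i_size: int, page_size: int = 4096) -> bytes:
    """Copy a stuffed file's data into a page-sized buffer.

    The copy length is clamped so that a corrupt or oversized i_size can
    neither read past the end of the block nor write past the page.
    Anything beyond the copied bytes is zero-filled.
    """
    capacity = len(block) - DINODE_SIZE          # data bytes after the header
    n = max(0, min(i_size, capacity, page_size)) # never exceed block or page
    data = block[DINODE_SIZE:DINODE_SIZE + n]
    return data + bytes(page_size - n)


# A block whose data area is full of 'x', read with an absurdly large size.
block = bytes(DINODE_SIZE) + b"x" * (BLOCK_SIZE - DINODE_SIZE)
page = copy_stuffed(block, 10**9)
```

Zero-filling the tail also mirrors the restriction mentioned in the message: a stuffed file may report a size larger than the data area, but only its initial part can be non-zero.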
  13. 12 Feb, 2010 1 commit
    • GFS2: Fix bmap allocation corner-case bug · 07ccb7bf
      Steven Whitehouse committed
      This patch solves a corner case during allocation which occurs if both
      metadata (indirect) and data blocks are required but there is an
      obstacle in the filesystem (e.g. a resource group header or another
      allocated block) such that when the allocation is requested only
      enough blocks for the metadata are returned.
      
      By changing the exit condition of this loop, we ensure that a
      minimum of one data block will always be returned.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  14. 12 Jun, 2009 1 commit
    • GFS2: Add tracepoints · 63997775
      Steven Whitehouse committed
      This patch adds the ability to trace various aspects of the GFS2
      filesystem. The trace points are divided into three groups,
      glocks, logging and bmap. These points have been chosen because
      they allow inspection of the major internal functions of GFS2
      and they are also generic enough that they are unlikely to need
      any major changes as the filesystem evolves.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  15. 10 Jun, 2009 1 commit
  16. 22 May, 2009 1 commit
    • GFS2: Clean up some file names · b1e71b06
      Steven Whitehouse committed
      This patch renames the ops_*.c files which have no counterpart
      without the ops_ prefix in order to shorten the name and make
      it more readable. In addition, ops_address.h (which was very
      small) is moved into inode.h and inode.h is cleaned up by
      adding extern where required.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  17. 20 May, 2009 1 commit
    • GFS2: Improve resource group error handling · 09010978
      Steven Whitehouse committed
      This patch improves the error handling in the case where we
      discover that the summary information in the resource group
      doesn't match the bitmap information while in the process of
      allocating blocks. Originally this resulted in a kernel bug,
      but this patch changes that so that we return -EIO and print
      some messages explaining what went wrong, and how to fix it.
      
      We also remember locally not to try and allocate from the
      same rgrp again, so that a subsequent allocation in a
      different rgrp should succeed.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  18. 24 Mar, 2009 1 commit
    • GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse committed
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my holding it back until I
      restarted my -git tree after the last merge window. That way, this has
      the maximum exposure before it's merged. This is (modulo a few minor bug
      fixes) the same patch that I've been posting on and off for the last
      three months and it has passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  19. 05 Jan, 2009 3 commits
  20. 25 Jun, 2008 1 commit
    • [GFS2] fix gfs2 block allocation (cleaned up) · 5af4e7a0
      Benjamin Marzinski committed
      This patch fixes bz 450641.
      
      This patch changes the computation for zero_metapath_length(), which it
      renames to metapath_branch_start(). When you are extending the metadata
      tree, the indirect blocks that point to the new data block must
      diverge from the existing tree either at the inode, or at the first
      indirect block. They can diverge at the first indirect block because the
      inode has room for 483 pointers while the indirect blocks have room for
      509 pointers, so when the tree is grown, there is some free space in the
      first indirect block. What metapath_branch_start() now computes is the
      height where the first indirect block for the new data block is located.
      It can either be 1 (if the indirect block diverges from the inode) or 2
      (if it diverges from the first indirect block).
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
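The 483 and 509 pointer counts quoted in the block allocation fix above follow from simple arithmetic if one assumes a 4096-byte block, a 232-byte on-disk inode header, a 24-byte metadata header on indirect blocks, and 8-byte block pointers. Those sizes are assumptions chosen because they reproduce the numbers in the message; the sketch is a Python illustration, not kernel code.

```python
BLOCK_SIZE = 4096       # assumed filesystem block size
DINODE_SIZE = 232       # assumed on-disk inode header size
META_HEADER_SIZE = 24   # assumed header size of an indirect block
POINTER_SIZE = 8        # 64-bit block pointers


def pointers_per_block(block_size: int, header_size: int) -> int:
    """Number of block pointers that fit after the header."""
    return (block_size - header_size) // POINTER_SIZE


diptrs = pointers_per_block(BLOCK_SIZE, DINODE_SIZE)       # pointers in the inode
inptrs = pointers_per_block(BLOCK_SIZE, META_HEADER_SIZE)  # pointers in an indirect block
spare = inptrs - diptrs                                    # slots left over after growing

print(diptrs, inptrs, spare)
```

Under these assumptions an indirect block has 26 more slots than the inode. When the tree gains a level and the inode's pointers move into a new first indirect block, those extra slots are the "free space" where a new branch can diverge.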
  21. 31 Mar, 2008 16 commits
    • [GFS2] Streamline quota lock/check for no-quota case · d82661d9
      Steven Whitehouse committed
      This patch streamlines the quota checking in the "no quota" case by
      making the check inline in the calling function, thus reducing the
      number of function calls. Eventually we might be able to remove the
      checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
      currently we can't as there are a very few places in the code which need
      to call these functions directly still.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
    • [GFS2] possible null pointer dereference fixup · 182fe5ab
      Cyrill Gorcunov committed
      gfs2_alloc_get may fail so we have to check it to prevent
      NULL pointer dereference.
      Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Allow bmap to allocate extents · 9b8c81d1
      Steven Whitehouse committed
      We've supported mapping of extents when no block allocation is required
      for some time. This patch extends that to mapping of extents when an
      allocation has been requested. In that case we try to allocate as many
      blocks as are requested, but we might return fewer in case there is
      something preventing us from returning the complete amount (e.g. an
      already allocated block is in the way).
      
      Currently the only code path which can actually request multiple data
      blocks in a single bmap call is the page_mkwrite path and even then it
      only happens if there are multiple blocks per page. What this patch does
      do however, is merge the allocation requests for metadata (growing the
      metadata tree in either height or depth) with the allocation of the data
      blocks in the case that both are needed. This results in lower overheads
      even in the single block allocation case.
      
      The one thing which we can't handle here at the moment is unstuffing. I
      would like to be able to do that, but the problem which arises is that
      in order to unstuff one has to get a locked page from the page cache
      which results in locking problems in the (usual) case that the caller is
      holding the page lock on the page it wishes to map. So that case will
      have to be addressed in future patches.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Get inode buffer only once per block map call · e23159d2
      Steven Whitehouse committed
      In the case that we needed to grow the height of the metadata tree
      we were looking up the inode buffer and then brelse()ing it despite
      the fact that it is needed later in the block map process.
      
      This patch ensures that we look up the inode's buffer once and only
      once during the block map process.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Eliminate (almost) duplicate field from gfs2_inode · 77658aad
      Steven Whitehouse committed
      The blocks counter is almost a duplicate of the i_blocks
      field in the VFS inode. The only difference is that i_blocks
      can be only 32bits long for 32bit arch without large single file
      support. Since GFS2 doesn't handle the non-large single file
      case (for 32 bit anyway) this adds a new config dependency on
      64BIT || LSF. This has always been the case, however we've never
      explicitly said so before.
      
      Even if we do add support for the non-LSF case, we will still
      not require this field to be duplicated since we will not be
      able to access oversized files anyway.
      
      So the net result of all this is that we shave 8 bytes from a gfs2_inode
      and get our config deps correct.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Add a function to interate over an extent · 30cbf189
      Steven Whitehouse committed
      This adds a function (currently the only use is during mapping
      of already allocated blocks, but watch this space) which iterates
      over a number of pointers in a block and returns the extent length.
      
      If the initial pointer is 0 (i.e. unallocated) it will return the
      number of unallocated blocks in the extent. If the initial pointer
      is allocated, then it returns the number of contiguously allocated
      blocks in the extent.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
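The behaviour described in the extent iteration commit above can be modelled in a few lines. This is a simplified Python sketch with a hypothetical name and signature (`extent_length`, with `limit=0` meaning "no limit"); the kernel function works on big-endian on-disk pointers inside a buffer, which this model ignores.

```python
def extent_length(pointers, start, limit=0):
    """Length of the extent beginning at pointers[start].

    If the first pointer is 0 (unallocated), count the run of zero
    pointers. Otherwise count the run of pointers that increase by
    exactly one block each step (contiguous allocation). The scan stops
    at the end of the pointer array or after `limit` entries.
    """
    expected = pointers[start]
    i = start
    while True:
        i += 1
        if i >= len(pointers):              # ran off the end of the block
            break
        if limit and i - start >= limit:    # caller asked for no more
            break
        if expected:                        # allocated: next block must follow on
            expected += 1
        if pointers[i] != expected:
            break
    return i - start


ptrs = [100, 101, 102, 0, 0, 500]
```

With this sample array, the allocated run starting at index 0 has length 3 and the hole starting at index 3 has length 2.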
    • [GFS2] The case of the missing asterisk · c85a665f
      Steven Whitehouse committed
      A dereference was forgotten. This adds it back correctly.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Add extent allocation to block allocator · b45e41d7
      Steven Whitehouse committed
      Rather than having to allocate a single block at a time, this patch
      allows the block allocator to allocate an extent. Since there is
      no difference (so far as the block allocator is concerned) between
      data blocks and indirect blocks, it is possible to allocate a single
      extent and for the caller to unrevoke just the blocks required
      for indirect blocks.
      
      Currently the only bit of GFS2 to make use of this feature is the
      build height function. The intention is that gfs2_block_map will
      be changed to make use of this feature in future patches.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Merge gfs2_alloc_meta and gfs2_alloc_data · 1639431a
      Steven Whitehouse committed
      Thanks to the preceding patches, the only difference between
      these two functions is their name. We can thus merge them
      and call the new function gfs2_alloc_block to reflect the
      fact that it can allocate either kind of block.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Update gfs2_trans_add_unrevoke to accept extents · 5731be53
      Steven Whitehouse committed
      By adding an extra argument to gfs2_trans_add_unrevoke we can now
      specify an extent length of blocks to unrevoke. This means that
      we only need to make one pass through the list for each extent
      rather than each block. Currently the only extent length which
      is used is 1, but that will change in the future.
      
      Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
      since it's the only difference between this and gfs2_alloc_data
      which is left. This will allow a future patch to merge these
      two functions into one (i.e. one call to allocate both data
      and metadata in a single extent in the future).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Reduce inode size by merging fields · ce276b06
      Steven Whitehouse committed
      There were three fields being used to keep track of the location
      of the most recently allocated block for each inode. These have
      been merged into a single field in order to better keep the
      data and metadata for an inode close on disk, and also to reduce
      the space required for storage.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Introduce array of buffers to struct metapath · dbac6710
      Steven Whitehouse committed
      The reason for doing this is to allow all the block mapping code
      to share the same array. As a result we can remove two arguments
      from lookup_metapath since they are now returned via the array.
      
      We also add a function to drop all refs to buffer heads when we
      are done with the metapath. The build_height function shares the
      struct metapath, but currently still frees its own buffers, and
      this will change in a future patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Move part of gfs2_block_map into a separate function · 11707ea0
      Steven Whitehouse committed
      This is required to enable future changes to the block
      mapping code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Misc fixups · 7eabb77e
      Bob Peterson committed
      This patch contains two small fixups that didn't fit elsewhere.
      They are: (1) get rid of temp variable in find_metapath.
      (2) Remove vestigial "ret" variable from gfs2_writepage_common.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Get rid of unneeded parameter in gfs2_rlist_alloc · fe6c991c
      Bob Peterson committed
      This patch removed the unnecessary parameter from function
      gfs2_rlist_alloc.  The parameter was always passed in as 0.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Streamline indirect pointer tree height calculation · ecc30c79
      Steven Whitehouse committed
      This patch improves the calculation of the tree height in order to reduce
      the number of operations which are carried out on each call to gfs2_block_map.
      In the common case, we now make a single comparison, rather than calculating
      the required tree height from scratch each time. Also in the case that the
      tree does need some extra height, we start from the current height rather than from
      zero when we work out what the new height ought to be.
      
      In addition the di_height field is moved into the inode proper and reduced
      in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
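The idea in the tree height commit above (one comparison in the common case, and growing from the current height rather than from zero) can be sketched with a precomputed table of maximum file sizes per height. This is an illustrative Python model: the function names, the 4096-byte block size, the 232-byte and 24-byte header sizes, and the exact comparison used are assumptions, not the kernel implementation.

```python
def build_heightsize(block_size=4096, dinode_size=232,
                     meta_header_size=24, max_height=10):
    """Maximum file size (bytes) addressable at each tree height 0..max_height."""
    diptrs = (block_size - dinode_size) // 8       # pointers held in the inode
    inptrs = (block_size - meta_header_size) // 8  # pointers per indirect block
    sizes = [block_size - dinode_size,             # height 0: data stuffed in inode
             block_size * diptrs]                  # height 1: inode points at data
    while len(sizes) <= max_height:
        sizes.append(sizes[-1] * inptrs)           # each extra level multiplies reach
    return sizes


def required_height(size, current_height, heightsize):
    """Height needed to map `size` bytes, never lower than the current height."""
    if size <= heightsize[current_height]:         # common case: one comparison
        return current_height
    height = current_height                        # otherwise grow from here, not 0
    while height < len(heightsize) - 1 and size > heightsize[height]:
        height += 1
    return height


heightsize = build_heightsize()
```

Because the result is bounded by `max_height` (10 in this model), it fits comfortably in a single byte, which is consistent with the message's note about shrinking `di_height` to a `u8`.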