1. 06 6月, 2012 1 次提交
    • B
      GFS2: Extend the life of the reservations · 0a305e49
      Bob Peterson 提交于
      This patch lengthens the lifespan of the reservations structure for
      inodes. Before, they were allocated and deallocated for every write
      operation. With this patch, they are allocated when the first write
      occurs, and deallocated when the last process closes the file.
      It's more efficient to do it this way because it saves GFS2 a lot of
      unnecessary allocates and frees. It also gives us more flexibility
      for the future: (1) we can now fold the qadata structure back into
      the structure and save those alloc/frees, (2) we can use this for
      multi-block reservations.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0a305e49
  2. 22 11月, 2011 1 次提交
  3. 21 10月, 2011 2 次提交
    • S
      GFS2: Cache the most recently used resource group in the inode · 54335b1f
      Steven Whitehouse 提交于
      This means that after the initial allocation for any inode, the
      last used resource group is cached in the inode for future use.
      This drastically reduces the number of lookups of resource
      groups in the common case, and this the contention on that
      data structure.
      
      The allocation algorithm is the same as previously, except that we
      always check to see if the goal block is within the cached rgrp
      first before going to the rbtree to look one up.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      54335b1f
    • B
      GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme · 7c9ca621
      Bob Peterson 提交于
      Here is an update of Bob's original rbtree patch which, in addition, also
      resolves the rather strange ref counting that was being done relating to
      the bitmap blocks.
      
      Originally we had a dual system for journaling resource groups. The metadata
      blocks were journaled and also the rgrp itself was added to a list. The reason
      for adding the rgrp to the list in the journal was so that the "repolish
      clones" code could be run to update the free space, and potentially send any
      discard requests when the log was flushed. This was done by comparing the
      "cloned" bitmap with what had been written back on disk during the transaction
      commit.
      
      Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
      until the journal had been flushed. For that reason, there was a rather
      complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
      both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
      count on the buffers.
      
      However, the journal maintains a reference count on the buffers anyway, since
      they are being journaled as metadata buffers. So by moving the code which deals
      with the post-journal accounting for bitmap blocks to the metadata journaling
      code, we can entirely dispense with the rather strange buffer ref counting
      scheme and also the requirement to journal the rgrps.
      
      The net result of all this is that the ->sd_rindex_spin is left to do exactly
      one job, and that is to look after the rbtree or rgrps.
      
      This patch is designed to be a stepping stone towards using RCU for the rbtree
      of resource groups, however the reduction in the number of uses of the
      ->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
      anyway.
      
      The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also
      be removed in future in favour of calling the functions directly where required
      in the code. That will allow locking of resource groups without needing to
      actually read them in - something that could be useful in speeding up statfs.
      
      In the mean time though it is valid to dereference ->bi_bh only when the rgrp
      is locked. This is basically the same rule as before, modulo the references not
      being valid until the following journal flush.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Cc: Benjamin Marzinski <bmarzins@redhat.com>
      7c9ca621
  4. 28 9月, 2010 1 次提交
  5. 20 9月, 2010 1 次提交
    • B
      GFS2: fallocate support · 3921120e
      Benjamin Marzinski 提交于
      This patch adds support for fallocate to gfs2.  Since the gfs2 does not support
      uninitialized data blocks, it must write out zeros to all the blocks.  However,
      since it does not need to lock any pages to read from, gfs2 can write out the
      zero blocks much more efficiently.  On a moderately full filesystem, fallocate
      works around 5 times faster on average.  The fallocate call also allows gfs2 to
      add blocks to the file without changing the filesize, which will make it
      possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
      grow a completely full filesystem.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3921120e
  6. 31 3月, 2008 1 次提交
    • S
      [GFS2] Update gfs2_trans_add_unrevoke to accept extents · 5731be53
      Steven Whitehouse 提交于
      By adding an extra argument to gfs2_trans_add_unrevoke we can now
      specify an extent length of blocks to unrevoke. This means that
      we only need to make one pass through the list for each extent
      rather than each block. Currently the only extent length which
      is used is 1, but that will change in the future.
      
      Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
      since its the only difference between this and gfs2_alloc_data
      which is left. This will allow a future patch to merge these
      two functions into one (i.e. one call to allocate both data
      and metadata in a single extent in the future).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5731be53
  7. 25 1月, 2008 1 次提交
    • S
      [GFS2] Don't add glocks to the journal · 2bcd610d
      Steven Whitehouse 提交于
      The only reason for adding glocks to the journal was to keep track
      of which locks required a log flush prior to release. We add a
      flag to the glock to allow this check to be made in a simpler way.
      
      This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
      and means that we can avoid extra work during the journal flush.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2bcd610d
  8. 10 10月, 2007 1 次提交
    • S
      [GFS2] Clean up gfs2_trans_add_revoke() · 1ad38c43
      Steven Whitehouse 提交于
      The following alters gfs2_trans_add_revoke() to take a struct
      gfs2_bufdata as an argument. This eliminates the memory allocation which
      was previously required by making use of the already existing struct
      gfs2_bufdata. It makes some sanity checks to ensure that the
      gfs2_bufdata has been removed from all the lists before its recycled as
      a revoke structure. This saves one memory allocation and one free per
      revoke structure.
      
      Also as a result, and to simplify the locking, since there is no longer
      any blocking code in gfs2_trans_add_revoke() we must hold the log lock
      whenever this function is called. This reduces the amount of times we
      take and unlock the log lock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1ad38c43
  9. 05 9月, 2006 2 次提交
  10. 01 9月, 2006 1 次提交
    • S
      [GFS2] Update copyright, tidy up incore.h · e9fc2aa0
      Steven Whitehouse 提交于
      As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
      updates the copyright message to say "version" in full rather than
      "v.2". Also incore.h has been updated to remove forward structure
      declarations which are not required.
      
      The gfs2_quota_lvb structure has now had endianess annotations added
      to it. Also quota.c has been updated so that we now store the
      lvb data locally in endian independant format to avoid needing
      a structure in host endianess too. As a result the endianess
      conversions are done as required at various points and thus the
      conversion routines in lvb.[ch] are no longer required. I've
      moved the one remaining constant in lvb.h thats used into lm.h
      and removed the unused lvb.[ch].
      
      I have not changed the HIF_ constants. That is left to a later patch
      which I hope will unify the gh_flags and gh_iflags fields of the
      struct gfs2_holder.
      
      Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e9fc2aa0
  11. 15 6月, 2006 1 次提交
    • S
      [GFS2] Fix unlinked file handling · feaa7bba
      Steven Whitehouse 提交于
      This patch fixes the way we have been dealing with unlinked,
      but still open files. It removes all limits (other than memory
      for inodes, as per every other filesystem) on numbers of these
      which we can support on GFS2. It also means that (like other
      fs) its the responsibility of the last process to close the file
      to deallocate the storage, rather than the person who did the
      unlinking. Note that with GFS2, those two events might take place
      on different nodes.
      
      Also there are a number of other changes:
      
       o We use the Linux inode subsystem as it was intended to be
      used, wrt allocating GFS2 inodes
       o The Linux inode cache is now the point which we use for
      local enforcement of only holding one copy of the inode in
      core at once (previous to this we used the glock layer).
       o We no longer use the unlinked "special" file. We just ignore it
      completely. This makes unlinking more efficient.
       o We now use the 4th block allocation state. The previously unused
      state is used to track unlinked but still open inodes.
       o gfs2_inoded is no longer needed
       o Several fields are now no longer needed (and removed) from the in
      core struct gfs2_inode
       o Several fields are no longer needed (and removed) from the in core
      superblock
      
      There are a number of future possible optimisations and clean ups
      which have been made possible by this patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      feaa7bba
  12. 19 5月, 2006 1 次提交
  13. 30 3月, 2006 1 次提交
    • S
      [GFS2] Update debugging code · d0dc80db
      Steven Whitehouse 提交于
      Update the debugging code in trans.c and at the same time improve
      the debugging code for gfs2_holders. The new code should be pretty
      fast during the normal case and provide just as much information
      in case of errors (or more).
      
      One small function from glock.c has moved to glock.h as a static inline so
      that its return address won't get in the way of the debugging.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d0dc80db
  14. 08 2月, 2006 1 次提交
    • S
      [GFS2] Make journaled data files identical to normal files on disk · 18ec7d5c
      Steven Whitehouse 提交于
      This is a very large patch, with a few still to be resolved issues
      so you might want to check out the previous head of the tree since
      this is known to be unstable. Fixes for the various bugs will be
      forthcoming shortly.
      
      This patch removes the special data format which has been used
      up till now for journaled data files. Directories still retain the
      old format so that they will remain on disk compatible with earlier
      releases. As a result you can now do the following with journaled
      data files:
      
       1) mmap them
       2) export them over NFS
       3) convert to/from normal files whenever you want to (the zero length
          restriction is gone)
      
      In addition the level at which GFS' locking is done has changed for all
      files (since they all now use the page cache) such that the locking is
      done at the page cache level rather than the level of the fs operations.
      This should mean that things like loopback mounts and other things which
      touch the page cache directly should now work.
      
      Current known issues:
      
       1. There is a lock mode inversion problem related to the resource
          group hold function which needs to be resolved.
       2. Any significant amount of I/O causes an oops with an offset of hex 320
          (NULL pointer dereference) which appears to be related to a journaled data
          buffer appearing on a list where it shouldn't be.
       3. Direct I/O writes are disabled for the time being (will reappear later)
       4. There is probably a deadlock between the page lock and GFS' locks under
          certain combinations of mmap and fs operation I/O.
       5. Issue relating to ref counting on internally used inodes causes a hang
          on umount (discovered before this patch, and not fixed by it)
       6. One part of the directory metadata is different from GFS1 and will need
          to be resolved before next release.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      18ec7d5c
  15. 18 1月, 2006 1 次提交
  16. 17 1月, 2006 1 次提交