1. 13 9月, 2014 1 次提交
    • A
      GFS2: fix d_splice_alias() misuses · cfb2f9d5
      Al Viro 提交于
      Callers of d_splice_alias(dentry, inode) don't need iput(), neither
      on success nor on failure.  Either the reference to inode is stored
      in a previously negative dentry, or it's dropped.  In either case
      inode reference the caller used to hold is consumed.
      
      __gfs2_lookup() does iput() in case when d_splice_alias() has failed.
      Double iput() if we ever hit that.  And gfs2_create_inode() ends up
      not only with double iput(), but with link count dropped to zero - on
      an inode it has just found in directory.
      
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      cfb2f9d5
  2. 11 9月, 2014 1 次提交
  3. 14 5月, 2014 1 次提交
    • B
      GFS2: remove transaction glock · 24972557
      Benjamin Marzinski 提交于
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In it's place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work for flushing out
      the and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesytem simply involes dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      24972557
  4. 31 3月, 2014 1 次提交
  5. 19 3月, 2014 1 次提交
  6. 12 3月, 2014 1 次提交
  7. 04 2月, 2014 1 次提交
    • S
      GFS2: Allocate block for xattr at inode alloc time, if required · b2c8b3ea
      Steven Whitehouse 提交于
      This is another step towards improving the allocation of xattr
      blocks at inode allocation time. Here we take advantage of
      Christoph's recent work on ACLs to allocate a block for the
      xattrs early if we know that we will be adding ACLs to the
      inode later on. The advantage of that is that it is much
      more likely that we'll get a contiguous run of two blocks
      where the first is the inode and the second is the xattr block.
      
      We still have to fall back to the original system in case we
      don't get the requested two contiguous blocks, or in case the
      ACLs are too large to fit into the block.
      
      Future patches will move more of the ACL setting code further
      up the gfs2_inode_create() function. Also, I'd like to be
      able to do the same thing with the xattrs from LSMs in
      due course, too. That way we should be able to slowly reduce
      the number of independent transactions, at least in the
      most common cases.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b2c8b3ea
  8. 26 1月, 2014 1 次提交
  9. 18 1月, 2014 1 次提交
  10. 16 1月, 2014 1 次提交
    • S
      GFS2: Don't use ENOBUFS when ENOMEM is the correct error code · ac3beb6a
      Steven Whitehouse 提交于
      Al Viro has tactfully pointed out that we are using the incorrect
      error code in some cases. This patch fixes that, and also removes
      the (unused) return value for glock dumping.
      
      >        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
      > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
      > we don't support STREAMS it's probably fair game, but... what the hell?
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      ac3beb6a
  11. 07 1月, 2014 1 次提交
  12. 06 1月, 2014 3 次提交
    • S
      GFS2: Remember directory insert point · 2b47dad8
      Steven Whitehouse 提交于
      When we look to see if there is enough space to add a dir
      entry without allocation, we have then been repeating the
      same search later when we do the actual insertion. This
      patch caches the details of the location in the gfs2_diradd
      structure, so that we do not have to repeat the search.
      
      This will provide a performance improvement which will be
      greater as the size of the directory increases.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2b47dad8
    • S
      GFS2: Consolidate transaction blocks calculation for dir add · 534cf9ca
      Steven Whitehouse 提交于
      There are three cases where we need to calculate the number of
      blocks to reserve in a transaction involving linking an inode
      into a directory. The one in rename is a bit more complicated,
      but the basis of it is the same as for link and create. So it
      makes sense to move this calculation into a single function
      rather than repeating it three times.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      534cf9ca
    • S
      GFS2: Add directory addition info structure · 3c1c0ae1
      Steven Whitehouse 提交于
      The intent is that this structure will hold the information
      required when adding entries to a directory (linking). To
      start with, it will contain only the number of blocks which
      are required to link the new entry into the directory. The
      current calculation returns either 0 or the maximim number of
      blocks that can ever be requested by such a transaction.
      
      The intent is that in a later patch, we can update the dir
      code to calculate this value more accurately. In addition
      further patches will also add further fields to the new
      structure to increase its utility.
      
      In addition this patch fixes a bug where the link used during
      inode creation was adding requesting too many blocks in
      some cases. This is harmless unless the fs is close to being
      full.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3c1c0ae1
  13. 22 11月, 2013 1 次提交
  14. 25 10月, 2013 1 次提交
  15. 02 10月, 2013 1 次提交
    • S
      GFS2: Add allocation parameters structure · 7b9cff46
      Steven Whitehouse 提交于
      This patch adds a structure to contain allocation parameters with
      the intention of future expansion of this structure. The idea is
      that we should be able to add more information about the allocation
      in the future in order to allow the allocator to make a better job
      of placing the requests on-disk.
      
      There is no functional difference from applying this patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7b9cff46
  16. 27 9月, 2013 1 次提交
    • S
      GFS2: Clean up reservation removal · af5c2697
      Steven Whitehouse 提交于
      The reservation for an inode should be cleared when it is truncated so
      that we can start again at a different offset for future allocations.
      We could try and do better than that, by resetting the search based on
      where the truncation started from, but this is only a first step.
      
      In addition, there are three callers of gfs2_rs_delete() but only one
      of those should really be testing the value of i_writecount. While
      we get away with that in the other cases currently, I think it would
      be better if we made that test specific to the one case which
      requires it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      af5c2697
  17. 23 9月, 2013 1 次提交
  18. 17 9月, 2013 2 次提交
  19. 19 8月, 2013 2 次提交
  20. 22 7月, 2013 1 次提交
  21. 14 6月, 2013 1 次提交
    • S
      GFS2: Add atomic_open support · 6d4ade98
      Steven Whitehouse 提交于
      I've restricted atomic_open to only operate on regular files, although
      I still don't understand why atomic_open should not be possible also for
      directories on GFS2. That can always be added in later though, if it
      makes sense.
      
      The ->atomic_open function can be passed negative dentries, which
      in most cases means either ENOENT (->lookup) or a call to d_instantiate
      (->create). In the GFS2 case though, we need to actually perform the
      look up, since we do not know whether there has been a new inode created
      on another node. The look up calls d_splice_alias which then tries to
      rehash the dentry - so the solution here is to simply check for that
      in d_splice_alias. The same issue is likely to affect any other cluster
      filesystem implementing ->atomic_open
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "J. Bruce Fields" <bfields fieldses org>
      Cc: Jeff Layton <jlayton@redhat.com>
      6d4ade98
  22. 11 6月, 2013 1 次提交
    • S
      GFS2: Only do one directory search on create · 5a00f3cc
      Steven Whitehouse 提交于
      Creation of a new inode requires a directory search in order to ensure
      that we are not trying to create an inode with the same name as an
      existing one. This was hidden away inside the create_ok() function.
      
      In the case that there was an existing inode, and a lookup can be
      substituted for a create (which is the case with regular files
      when the O_EXCL flag is not in use) then we were doing a second
      lookup in order to return the inode.
      
      This patch merges these two lookups into one. This can be done by
      passing a flag to gfs2_dir_search() to tell it to just return -EEXIST
      in the cases where we don't actually want to look up the inode.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5a00f3cc
  23. 05 6月, 2013 1 次提交
  24. 03 6月, 2013 1 次提交
  25. 08 4月, 2013 3 次提交
    • S
      GFS2: Use gfs2_dinode_out() in the inode create path · 79ba7480
      Steven Whitehouse 提交于
      Over the previous two patches relating to inode creation, the
      content of init_dinode() has been looking more and more like
      gfs2_dinode_out(). This is not an accident! This patch replaces
      the parts of init_dinode() which are duplicated in gfs2_dinode_out()
      with a call to that function.
      
      Mostly that is straightforward, but there is one issue which needed
      to be resolved relating to the link count. The link count has to be
      set to zero in a certain error handling code path, which lands up
      calling iput(). This is now done specifically in that code path
      allowing the link count to be set earlier and written into the
      on disk inode by gfs2_dinode_put() in the normal way.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      79ba7480
    • S
      GFS2: Remove gfs2_refresh_inode from inode creation path · 28fb3027
      Steven Whitehouse 提交于
      The original method for creating inodes used in GFS2 was to fill
      out a buffer, with all the information, and then to read that
      buffer into the in-core inode, using gfs2_refresh_inode()
      
      The problem with this approach is that all the inode's fields
      need to be calculated ahead of time, and were stored in various
      variables making the code rather complicated.
      
      The new approach is simply to allocate the in-core inode earlier
      and fill in as many fields as possible ahead of time. These can
      then be used to initilise the on disk representation. The
      code has been working towards the point where it is possible
      to remove gfs2_refresh_inode() because all the fields are
      correctly initialised ahead of time. We've now reached that
      milestone, and have reversed the order of setting up the in
      core and on disk inodes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      28fb3027
    • S
      GFS2: Clean up inode creation path · fd4b4e04
      Steven Whitehouse 提交于
      This patch cleans up the inode creation code path in GFS2. After the
      Orlov allocator was merged, a number of potential improvements are
      now possible, and this is a first set of these.
      
      The quota handling is now updated so that it matches the point in
      the code where the allocation takes place. This means that the one
      exception in gfs2_alloc_blocks relating to quota is now no longer
      required, and we can use the generic code everywhere.
      
      In addition the call to figure out whether we need to allocate any
      extra blocks in order to add a directory entry is moved higher up
      gfs2_create_inode. This means that if it returns an error, we
      can deal with that at a stage where it is easier to handle that case.
      The returned status cannot change during the function since we hold
      an exclusive lock on the directory.
      
      Two calls to gfs2_rindex_update have been changed to one, again at
      the top of gfs2_create_inode to simplify error handling.
      
      The time stamps are also now initialised earlier in the creation
      process, this is gradually moving towards being able to remove the
      call to gfs2_refresh_inode in gfs2_inode_create once we have all the
      fields covered.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fd4b4e04
  26. 13 2月, 2013 4 次提交
  27. 29 1月, 2013 1 次提交
    • S
      GFS2: Split gfs2_trans_add_bh() into two · 350a9b0a
      Steven Whitehouse 提交于
      There is little common content in gfs2_trans_add_bh() between the data
      and meta classes by the time that the functions which it calls are
      taken into account. The intent here is to split this into two
      separate functions. Stage one is to introduce gfs2_trans_add_data()
      and gfs2_trans_add_meta() and update the callers accordingly.
      
      Later patches will then pull in the content of gfs2_trans_add_bh()
      and its dependent functions in order to clean up the code in this
      area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      350a9b0a
  28. 21 11月, 2012 1 次提交
    • B
      GFS2: Set gl_object during inode create · 1e2d9d44
      Bob Peterson 提交于
      This patch fixes a cluster coherency problem that occurs when one
      node creates a file, does several writes, then a different node
      tries to write to the same file. When the inode's glock is demoted,
      the inode wasn't synced to the media properly because the gl_object
      wasn't set. Later, the flush daemon noticed the uncommitted data
      and tried to flush it, only to discover the glock was no longer locked
      properly in exclusive mode. That caused an assert withdraw.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1e2d9d44
  29. 16 11月, 2012 1 次提交
  30. 13 11月, 2012 1 次提交
  31. 07 11月, 2012 1 次提交
    • S
      GFS2: Add Orlov allocator · 9dbe9610
      Steven Whitehouse 提交于
      Just like ext3, this works on the root directory and any directory
      with the +T flag set. Also, just like ext3, any subdirectory created
      in one of the just mentioned cases will be allocated to a random
      resource group (GFS2 equivalent of a block group).
      
      If you are creating a set of directories, each of which will contain a
      job running on a different node, then by setting +T on the parent
      directory before creating the subdirectories, each will land up in a
      different resource group, and thus resource group contention between
      nodes will be kept to a minimum.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9dbe9610