1. 15 Nov, 2010 1 commit
    • GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse committed
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released) which should block
      against the pending deallocations, so we should always get the space
      back.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  2. 20 Sep, 2010 1 commit
  3. 10 Aug, 2010 2 commits
    • simplify checks for I_CLEAR/I_FREEING · a4ffdde6
      Al Viro committed
      add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
      equivalent to I_FREEING for almost all code looking at either;
      it's there to keep track of having called clear_inode() exactly
      once per inode lifetime, at some point after having set I_FREEING.
      I_CLEAR and I_FREEING never get set at the same time with the
      current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
      instead of I_CLEAR without loss of information.  As the result of
      such change, checks become simpler and the amount of code that needs
      to know about I_CLEAR shrinks a lot.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
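The flag change described in the commit above can be illustrated with a small sketch. The bit values and all function names here are assumptions made for illustration, not the kernel's definitions; the point is only that once I_CLEAR is added on top of I_FREEING, a check for I_FREEING alone is enough.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's actual definitions. */
#define I_FREEING (1u << 0)
#define I_CLEAR   (1u << 1)

/* Old scheme: clearing the inode replaces I_FREEING with I_CLEAR. */
static unsigned int clear_inode_old(unsigned int state)
{
    assert(state & I_FREEING);   /* clear happens after I_FREEING is set */
    return I_CLEAR;
}

/* New scheme: I_CLEAR is added, I_FREEING stays set. */
static unsigned int clear_inode_new(unsigned int state)
{
    assert(state & I_FREEING);
    return state | I_CLEAR;
}

/* Under the old scheme a caller had to test both bits. */
static bool going_away_old(unsigned int state)
{
    return (state & (I_FREEING | I_CLEAR)) != 0;
}

/* Under the new scheme testing I_FREEING alone covers both phases. */
static bool going_away_new(unsigned int state)
{
    return (state & I_FREEING) != 0;
}
```

With the old state word, `going_away_new()` would miss an inode that has already been cleared, which is why the two changes have to go together.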
    • remove inode_setattr · 1025774c
      Christoph Hellwig committed
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
      which allowed trimming down the opencoded variant.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 15 Jul, 2010 1 commit
    • GFS2: Fix kernel NULL pointer dereference by dlm_astd · b1becbde
      Bob Peterson committed
      This patch fixes a problem in an error path when looking
      up dinodes.  There are two sister-functions, gfs2_inode_lookup
      and gfs2_process_unlinked_inode.  Both functions acquire and
      hold the i_iopen glock for the dinode being looked up. The last
      thing they try to do is hold the i_gl glock for the dinode.
      If that glock fails for some reason, the error path was
      incorrectly calling gfs2_glock_put for the i_iopen glock twice.
      This resulted in the glock being prematurely freed.  The
      "minimum hold time" usually kept the glock in memory, but the
      lock interface to dlm (aka lock_dlm) freed its memory for the
      glock.  In some circumstances, it would cause dlm's dlm_astd daemon
      to try to call the bast function for the freed lock_dlm memory,
      which resulted in a NULL pointer dereference.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
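The double-put problem in the error path described above can be modelled with a toy reference count. This is a simplified sketch, not the GFS2 code: `struct glock_sketch`, `lookup_sketch()` and the `buggy` switch are hypothetical names introduced only to show how a second put releases a reference the lookup never owned.

```c
#include <assert.h>
#include <stddef.h>

/* Toy glock: a reference count plus a flag recording that it was freed. */
struct glock_sketch {
    int ref;
    int freed;
};

static void glock_hold(struct glock_sketch *gl)
{
    gl->ref++;
}

static void glock_put(struct glock_sketch *gl)
{
    assert(gl->ref > 0);
    if (--gl->ref == 0)
        gl->freed = 1;           /* last reference gone: memory is released */
}

/*
 * Hypothetical lookup: takes a reference on the iopen glock, then tries to
 * acquire the inode glock. If that fails, the error path must drop exactly
 * the one reference it took. With buggy != 0 it drops two.
 */
static int lookup_sketch(struct glock_sketch *iopen, int inode_glock_fails,
                         int buggy)
{
    glock_hold(iopen);
    if (inode_glock_fails) {
        glock_put(iopen);
        if (buggy)
            glock_put(iopen);    /* drops a reference owned by someone else */
        return -1;
    }
    return 0;
}
```

Starting from a glock that already has one other holder, the correct error path leaves the count at 1, while the buggy path drives it to 0 and frees the glock while it is still in use.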
  5. 21 May, 2010 1 commit
  6. 14 Apr, 2010 1 commit
    • GFS2: glock livelock · 1a0eae88
      Bob Peterson committed
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  7. 01 Mar, 2010 1 commit
    • GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse committed
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock which, in addition to its
      normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types, which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
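One way to picture a glock that carries its own address space is a structure that embeds the plain glock as its first member. The sketch below is an assumption-laden illustration of that layout idea; the type and function names are hypothetical and the real GFS2 structures hold far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glock types: only inode and rgrp glocks get an address space. */
enum glock_type_sketch { GT_INODE, GT_RGRP, GT_OTHER };

struct aspace_sketch {
    unsigned long nrpages;       /* stand-in for a real address space */
};

struct glock_sketch {
    enum glock_type_sketch type;
};

/* Extended glock: the plain glock first, its address space right after. */
struct glock_aspace_sketch {
    struct glock_sketch gl;
    struct aspace_sketch mapping;
};

/* Find the address space from the glock without needing a second inode. */
static struct aspace_sketch *glock_to_aspace(struct glock_sketch *gl)
{
    assert(gl != NULL);
    if (gl->type == GT_OTHER)
        return NULL;             /* other glock types remain plain */
    /* gl is the first member, so it shares its address with the container. */
    return &((struct glock_aspace_sketch *)gl)->mapping;
}
```

Because the lookup goes from glock to address space directly, nothing in this layout needs the inode to point back at the glock and vice versa.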
  8. 18 Dec, 2009 1 commit
  9. 17 Dec, 2009 1 commit
    • sanitize xattr handler prototypes · 431547b3
      Christoph Hellwig committed
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechanism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Morris <jmorris@namei.org>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
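The idea of sharing one method between several handlers by passing a per-handler flags value can be sketched as follows. This is not the kernel's `struct xattr_handler`; the structure, the constants and the returned strings are hypothetical and kept deliberately small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values distinguishing the two ACL kinds. */
enum { SKETCH_ACL_ACCESS = 1, SKETCH_ACL_DEFAULT = 2 };

/* Simplified handler: a name, a flags value, and a method that receives it. */
struct xattr_handler_sketch {
    const char *prefix;
    int flags;
    const char *(*get)(int flags);
};

/* One method serves both handlers; only the flags argument differs. */
static const char *acl_get_sketch(int flags)
{
    return flags == SKETCH_ACL_ACCESS ? "access ACL" : "default ACL";
}

static const struct xattr_handler_sketch access_handler = {
    "system.posix_acl_access", SKETCH_ACL_ACCESS, acl_get_sketch
};

static const struct xattr_handler_sketch default_handler = {
    "system.posix_acl_default", SKETCH_ACL_DEFAULT, acl_get_sketch
};

/* Dispatch: the caller passes the handler's own flags to its method. */
static const char *handler_get(const struct xattr_handler_sketch *h)
{
    assert(h != NULL && h->get != NULL);
    return h->get(h->flags);
}
```

Both handlers point at the same function, so supporting a further attribute of the same shape only needs a new table entry with a different flags value.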
  10. 03 Dec, 2009 2 commits
    • GFS2: Tag all metadata with jid · 0ab7d13f
      Steven Whitehouse committed
      There are two spare fields in the header common to all GFS2
      metadata. One is just the right size to fit a journal id
      in it, and this patch updates the journal code so that each
      time a metadata block is modified, we tag it with the journal
      id of the node which is performing the modification.
      
      The reason for this is that it should make it much easier to
      debug issues which arise if we can tell which node was the
      last to modify a particular metadata block.
      
      Since the field is updated before the block is written into
      the journal, each journal should only contain metadata which
      is tagged with its own journal id. The one exception to this
      is the journal header block, which might have a different node's
      id in it, if that journal was recovered by another node in the
      cluster.
      
      Thus each journal will contain a record of which nodes recovered
      it, via the journal header.
      
      The other field in the metadata header could potentially be
      used to hold information about what kind of operation was
      performed, but for the time being we just zero it on each
      transaction so that if we use it for that in future, we'll
      know that the information (where it exists) is reliable.
      
      I did consider using the other field to hold the journal
      sequence number, however since in GFS2's journaling we write
      the modified data into the journal and not the original
      data, this gives no information as to what action caused the
      modification, so I think we can probably come up with a better
      use for those 64 bits in the future.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
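The tagging step described above can be sketched with a simplified header. The field names and layout below are assumptions for illustration and do not reproduce the on-disk GFS2 metadata header; the sketch only shows the journal id being written and the other spare field being zeroed on each modification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified metadata header with two formerly spare fields. */
struct meta_header_sketch {
    uint32_t magic;
    uint32_t type;
    uint64_t spare;              /* 64-bit spare field, zeroed for now */
    uint32_t format;
    uint32_t jid;                /* journal id of the last node to modify */
};

/* Called whenever a node modifies a metadata block within a transaction. */
static void tag_metadata(struct meta_header_sketch *mh, uint32_t jid)
{
    assert(mh != NULL);
    mh->jid = jid;               /* record which node made the change */
    mh->spare = 0;               /* keep the unused field reliably zero */
}
```

Zeroing the second field on every transaction means that any future use of it can trust a non-zero value to be meaningful.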
    • GFS2: Clean up ACLs · 479c427d
      Steven Whitehouse committed
      To prepare for support for caching of ACLs, this cleans up the GFS2
      ACL support by pushing the xattr code back into xattr.c and changing
      the acl_get function into one which only returns ACLs so that we
      can drop the caching function into it shortly.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  11. 27 Aug, 2009 3 commits
    • GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse committed
      The inum structure used throughout GFS2 has two fields. One
      no_addr is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      nevertheless.
      
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
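Deriving no_formal_ino from a per-resource-group counter while never handing out 0 can be sketched like this. The structure and function names are hypothetical, and the sketch assumes a plain 64-bit counter that is allowed to wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical resource group holding the generation counter. */
struct rgrp_sketch {
    uint64_t igeneration;
};

/* Take the next generation number, skipping the reserved value 0. */
static uint64_t next_formal_ino(struct rgrp_sketch *rgd)
{
    uint64_t gen = rgd->igeneration++;

    if (gen == 0)                /* 0 is never used, take the next one */
        gen = rgd->igeneration++;
    assert(gen != 0);
    return gen;
}
```

The counter lives in a structure that inode allocation has to touch anyway, which is why this approach adds almost no overhead. The skip also covers the wrap-around case, since unsigned overflow simply brings the counter back to 0.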
    • GFS2: Rename eattr.[ch] as xattr.[ch] · 307cf6e6
      Steven Whitehouse committed
      Use the more conventional name for the extended attribute
      support code. Update all the places which care.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up of extended attribute support · 40b78a32
      Steven Whitehouse committed
      This has been on my list for some time. We need to change the way
      in which we handle extended attributes to allow faster file creation
      times (by reducing the number of transactions required) and the
      extended attribute code is the main obstacle to this.
      
      In addition to that, the VFS provides a way to demultiplex the xattr
      calls which we ought to be using, rather than rolling our own. This
      patch changes the GFS2 code to use that VFS feature and as a result
      the code shrinks by a couple of hundred lines or so, and becomes
      easier to read.
      
      I'm planning on doing further clean up work in this area, but this
      patch is a good start. The cleaned up code also uses the more usual
      "xattr" shorthand, I plan to eliminate the use of "eattr" eventually
      and in the mean time it serves as a flag as to which bits of the code
      have been updated.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  12. 17 Aug, 2009 1 commit
  13. 22 May, 2009 4 commits
  14. 15 Apr, 2009 1 commit
  15. 24 Mar, 2009 1 commit
    • GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse committed
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, which is still a separate module; and (b)
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before it's merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off for the last three months
      and it's passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 05 Jan, 2009 7 commits
  17. 14 Nov, 2008 1 commit
  18. 18 Sep, 2008 1 commit
    • GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse committed
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  19. 05 Sep, 2008 1 commit
  20. 27 Aug, 2008 1 commit
    • GFS2: Fix & clean up GFS2 rename · 0188d6c5
      Steven Whitehouse committed
      This patch fixes a locking issue in the rename code by ensuring that we hold
      the per sb rename lock over both directory and "other" renames which involve
      different parent directories.
      
      At the same time, this moved the (only called from one place) function
      gfs2_ok_to_move into the file that it's called from, so we can mark it
      static. This should make the code a bit easier to follow.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Peter Staubach <staubach@redhat.com>
  21. 27 Jul, 2008 1 commit
  22. 10 Jul, 2008 1 commit
  23. 03 Jul, 2008 1 commit
    • [GFS2] don't call permission() · f58ba889
      Miklos Szeredi committed
      GFS2 calls permission() to verify permissions after locks on the files
      have been taken.
      
      For this it's sufficient to call gfs2_permission() instead.  This
      results in the following changes:
      
        - IS_RDONLY() check is not performed
        - IS_IMMUTABLE() check is not performed
        - devcgroup_inode_permission() is not called
        - security_inode_permission() is not called
      
      IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
      flag should provide protection against read-only remounts during
      operations.  do_gfs2_set_flags() has been fixed to perform
      mnt_want_write()/mnt_drop_write() to protect against remounting
      read-only.
      
      IS_IMMUTABLE has been added to gfs2_permission()
      
      Repeating the security checks seems to be pointless, as they don't
      normally change, and if they do, it's independent of the filesystem
      state.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  24. 12 May, 2008 1 commit
  25. 10 Apr, 2008 1 commit
  26. 31 Mar, 2008 2 commits