1. 05 5月, 2011 1 次提交
    • S
      GFS2: Double check link count under glock · d192a8e5
      Steven Whitehouse 提交于
      To avoid any possible races relating to the link count, we need to
      recheck it under the inode's glock in all cases where it matters.
      Also to ensure we never get any nasty surprises, this patch also
      ensures that once the link count has hit zero it can never be
      elevated by rereading in data from disk.
      
      The only place we cannot provide a proper solution is in rename
      in the case where we are removing a target inode and we discover
      that the target inode has been already unlinked on another node.
      The race window is very small, and we return EAGAIN in this case
      to indicate what has happened. The proper solution would be to move
      the lookup parts of rename from the vfs into library calls which
      the fs could call directly, but that is potentially a very big job
      and this fix should cover most cases for now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d192a8e5
  2. 20 4月, 2011 2 次提交
    • S
      GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse 提交于
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4667a0ec
    • S
      GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse 提交于
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f42ab085
  3. 18 4月, 2011 1 次提交
    • B
      GFS2: filesystem hang caused by incorrect lock order · 44ad37d6
      Bob Peterson 提交于
      This patch fixes a deadlock in GFS2 where two processes are trying
      to reclaim an unlinked dinode:
      One holds the inode glock and calls gfs2_lookup_by_inum trying to look
      up the inode, which it can't, due to I_FREEING.  The other has set
      I_FREEING from vfs and is at the beginning of gfs2_delete_inode
      waiting for the glock, which is held by the first.  The solution is to
      add a new non_block parameter to the gfs2_iget function that causes it
      to return -ENOENT if the inode is being freed.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      44ad37d6
  4. 02 2月, 2011 1 次提交
    • E
      fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris 提交于
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and than userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement new
      behavior, that is obviously contained in a seperate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      2a7dba39
  5. 18 1月, 2011 1 次提交
    • S
      GFS2: Fix error path in gfs2_lookup_by_inum() · 24d9765f
      Steven Whitehouse 提交于
      In the (impossible, except if there is fs corruption) error path
      in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
      fails, it was leaving the function by calling iput() rather
      than iget_failed(). This would cause future lookups of the same
      inode to block forever.
      
      This patch fixes the problem by moving the call to gfs2_inode_refresh()
      into gfs2_inode_lookup() where iget_failed() is part of the error path
      already. Also this cleans up some unreachable code and makes
      gfs2_set_iop() static.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      24d9765f
  6. 07 1月, 2011 1 次提交
  7. 30 11月, 2010 1 次提交
  8. 15 11月, 2010 1 次提交
    • S
      GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse 提交于
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released) which should block
      against the pending deallocations, so we should always get the space
      back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      044b9414
  9. 20 9月, 2010 1 次提交
  10. 10 8月, 2010 2 次提交
    • A
      simplify checks for I_CLEAR/I_FREEING · a4ffdde6
      Al Viro 提交于
      add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
      equivalent to I_FREEING for almost all code looking at either;
      it's there to keep track of having called clear_inode() exactly
      once per inode lifetime, at some point after having set I_FREEING.
      I_CLEAR and I_FREEING never get set at the same time with the
      current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
      instead of I_CLEAR without loss of information.  As the result of
      such change, checks become simpler and the amount of code that needs
      to know about I_CLEAR shrinks a lot.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a4ffdde6
    • C
      remove inode_setattr · 1025774c
      Christoph Hellwig 提交于
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
      
      In addition to that ncpfs called inode_setattr with handcrafted iattrs,
      which allowed to trim down the opencoded variant.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1025774c
  11. 15 7月, 2010 1 次提交
    • B
      GFS2: Fix kernel NULL pointer dereference by dlm_astd · b1becbde
      Bob Peterson 提交于
      This patch fixes a problem in an error path when looking
      up dinodes.  There are two sister-functions, gfs2_inode_lookup
      and gfs2_process_unlinked_inode.  Both functions acquire and
      hold the i_iopen glock for the dinode being looked up. The last
      thing they try to do is hold the i_gl glock for the dinode.
      If that glock fails for some reason, the error path was
      incorrectly calling gfs2_glock_put for the i_iopen glock twice.
      This resulted in the glock being prematurely freed.  The
      "minimum hold time" usually kept the glock in memory, but the
      lock interface to dlm (aka lock_dlm) freed its memory for the
      glock.  In some circumstances, it would cause dlm's dlm_astd daemon
      to try to call the bast function for the freed lock_dlm memory,
      which resulted in a NULL pointer dereference.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1becbde
  12. 21 5月, 2010 1 次提交
  13. 14 4月, 2010 1 次提交
    • B
      GFS2: glock livelock · 1a0eae88
      Bob Peterson 提交于
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      1a0eae88
  14. 01 3月, 2010 1 次提交
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518
  15. 18 12月, 2009 1 次提交
  16. 17 12月, 2009 1 次提交
    • C
      sanitize xattr handler prototypes · 431547b3
      Christoph Hellwig 提交于
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechnism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      431547b3
  17. 03 12月, 2009 2 次提交
    • S
      GFS2: Tag all metadata with jid · 0ab7d13f
      Steven Whitehouse 提交于
      There are two spare field in the header common to all GFS2
      metadata. One is just the right size to fit a journal id
      in it, and this patch updates the journal code so that each
      time a metadata block is modified, we tag it with the journal
      id of the node which is performing the modification.
      
      The reason for this is that it should make it much easier to
      debug issues which arise if we can tell which node was the
      last to modify a particular metadata block.
      
      Since the field is updated before the block is written into
      the journal, each journal should only contain metadata which
      is tagged with its own journal id. The one exception to this
      is the journal header block, which might have a different node's
      id in it, if that journal was recovered by another node in the
      cluster.
      
      Thus each journal will contain a record of which nodes recovered
      it, via the journal header.
      
      The other field in the metadata header could potentially be
      used to hold information about what kind of operation was
      performed, but for the time being we just zero it on each
      transaction so that if we use it for that in future, we'll
      know that the information (where it exists) is reliable.
      
      I did consider using the other field to hold the journal
      sequence number, however since in GFS2's journaling we write
      the modified data into the journal and not the original
      data, this gives no information as to what action caused the
      modification, so I think we can probably come up with a better
      use for those 64 bits in the future.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0ab7d13f
    • S
      GFS2: Clean up ACLs · 479c427d
      Steven Whitehouse 提交于
      To prepare for support for caching of ACLs, this cleans up the GFS2
      ACL support by pushing the xattr code back into xattr.c and changing
      the acl_get function into one which only returns ACLs so that we
      can drop the caching function into it shortly.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      479c427d
  18. 27 8月, 2009 3 次提交
    • S
      GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse 提交于
      The inum structure used throughout GFS2 has two fields. One
      no_addr is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      nevertheless.
      
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8d8291ae
    • S
      GFS2: Rename eattr.[ch] as xattr.[ch] · 307cf6e6
      Steven Whitehouse 提交于
      Use the more conventional name for the extended attribute
      support code. Update all the places which care.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      307cf6e6
    • S
      GFS2: Clean up of extended attribute support · 40b78a32
      Steven Whitehouse 提交于
      This has been on my list for some time. We need to change the way
      in which we handle extended attributes to allow faster file creation
      times (by reducing the number of transactions required) and the
      extended attribute code is the main obstacle to this.
      
      In addition to that, the VFS provides a way to demultiplex the xattr
      calls which we ought to be using, rather than rolling our own. This
      patch changes the GFS2 code to use that VFS feature and as a result
      the code shrinks by a couple of hundred lines or so, and becomes
      easier to read.
      
      I'm planning on doing further clean up work in this area, but this
      patch is a good start. The cleaned up code also uses the more usual
      "xattr" shorthand, I plan to eliminate the use of "eattr" eventually
      and in the mean time it serves as a flag as to which bits of the code
      have been updated.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      40b78a32
  19. 17 8月, 2009 1 次提交
  20. 22 5月, 2009 4 次提交
  21. 15 4月, 2009 1 次提交
  22. 24 3月, 2009 1 次提交
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  23. 05 1月, 2009 7 次提交
  24. 14 11月, 2008 1 次提交
  25. 18 9月, 2008 1 次提交
    • S
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse 提交于
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      719ee344
  26. 05 9月, 2008 1 次提交