1. 15 7月, 2010 1 次提交
    • B
      GFS2: Fix kernel NULL pointer dereference by dlm_astd · b1becbde
      Bob Peterson 提交于
      This patch fixes a problem in an error path when looking
      up dinodes.  There are two sister-functions, gfs2_inode_lookup
      and gfs2_process_unlinked_inode.  Both functions acquire and
      hold the i_iopen glock for the dinode being looked up. The last
      thing they try to do is hold the i_gl glock for the dinode.
      If that glock fails for some reason, the error path was
      incorrectly calling gfs2_glock_put for the i_iopen glock twice.
      This resulted in the glock being prematurely freed.  The
      "minimum hold time" usually kept the glock in memory, but the
      lock interface to dlm (aka lock_dlm) freed its memory for the
      glock.  In some circumstances, it would cause dlm's dlm_astd daemon
      to try to call the bast function for the freed lock_dlm memory,
      which resulted in a NULL pointer dereference.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1becbde
  2. 21 5月, 2010 1 次提交
  3. 14 4月, 2010 1 次提交
    • B
      GFS2: glock livelock · 1a0eae88
      Bob Peterson 提交于
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      1a0eae88
  4. 01 3月, 2010 1 次提交
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518
  5. 18 12月, 2009 1 次提交
  6. 17 12月, 2009 1 次提交
    • C
      sanitize xattr handler prototypes · 431547b3
      Christoph Hellwig 提交于
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechnism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      431547b3
  7. 03 12月, 2009 2 次提交
    • S
      GFS2: Tag all metadata with jid · 0ab7d13f
      Steven Whitehouse 提交于
      There are two spare field in the header common to all GFS2
      metadata. One is just the right size to fit a journal id
      in it, and this patch updates the journal code so that each
      time a metadata block is modified, we tag it with the journal
      id of the node which is performing the modification.
      
      The reason for this is that it should make it much easier to
      debug issues which arise if we can tell which node was the
      last to modify a particular metadata block.
      
      Since the field is updated before the block is written into
      the journal, each journal should only contain metadata which
      is tagged with its own journal id. The one exception to this
      is the journal header block, which might have a different node's
      id in it, if that journal was recovered by another node in the
      cluster.
      
      Thus each journal will contain a record of which nodes recovered
      it, via the journal header.
      
      The other field in the metadata header could potentially be
      used to hold information about what kind of operation was
      performed, but for the time being we just zero it on each
      transaction so that if we use it for that in future, we'll
      know that the information (where it exists) is reliable.
      
      I did consider using the other field to hold the journal
      sequence number, however since in GFS2's journaling we write
      the modified data into the journal and not the original
      data, this gives no information as to what action caused the
      modification, so I think we can probably come up with a better
      use for those 64 bits in the future.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0ab7d13f
    • S
      GFS2: Clean up ACLs · 479c427d
      Steven Whitehouse 提交于
      To prepare for support for caching of ACLs, this cleans up the GFS2
      ACL support by pushing the xattr code back into xattr.c and changing
      the acl_get function into one which only returns ACLs so that we
      can drop the caching function into it shortly.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      479c427d
  8. 27 8月, 2009 3 次提交
    • S
      GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse 提交于
      The inum structure used throughout GFS2 has two fields. One
      no_addr is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      nevertheless.
      
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8d8291ae
    • S
      GFS2: Rename eattr.[ch] as xattr.[ch] · 307cf6e6
      Steven Whitehouse 提交于
      Use the more conventional name for the extended attribute
      support code. Update all the places which care.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      307cf6e6
    • S
      GFS2: Clean up of extended attribute support · 40b78a32
      Steven Whitehouse 提交于
      This has been on my list for some time. We need to change the way
      in which we handle extended attributes to allow faster file creation
      times (by reducing the number of transactions required) and the
      extended attribute code is the main obstacle to this.
      
      In addition to that, the VFS provides a way to demultiplex the xattr
      calls which we ought to be using, rather than rolling our own. This
      patch changes the GFS2 code to use that VFS feature and as a result
      the code shrinks by a couple of hundred lines or so, and becomes
      easier to read.
      
      I'm planning on doing further clean up work in this area, but this
      patch is a good start. The cleaned up code also uses the more usual
      "xattr" shorthand, I plan to eliminate the use of "eattr" eventually
      and in the mean time it serves as a flag as to which bits of the code
      have been updated.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      40b78a32
  9. 17 8月, 2009 1 次提交
  10. 22 5月, 2009 4 次提交
  11. 15 4月, 2009 1 次提交
  12. 24 3月, 2009 1 次提交
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  13. 05 1月, 2009 7 次提交
  14. 14 11月, 2008 1 次提交
  15. 18 9月, 2008 1 次提交
    • S
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse 提交于
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      719ee344
  16. 05 9月, 2008 1 次提交
  17. 27 8月, 2008 1 次提交
    • S
      GFS2: Fix & clean up GFS2 rename · 0188d6c5
      Steven Whitehouse 提交于
      This patch fixes a locking issue in the rename code by ensuring that we hold
      the per sb rename lock over both directory and "other" renames which involve
      different parent directories.
      
      At the same time, this moved the (only called from one place) function
      gfs2_ok_to_move into the file that its called from, so we can mark it
      static. This should make a code a bit easier to follow.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Peter Staubach <staubach@redhat.com>
      0188d6c5
  18. 27 7月, 2008 1 次提交
  19. 10 7月, 2008 1 次提交
  20. 03 7月, 2008 1 次提交
    • M
      [GFS2] don't call permission() · f58ba889
      Miklos Szeredi 提交于
      GFS2 calls permission() to verify permissions after locks on the files
      have been taken.
      
      For this it's sufficient to call gfs2_permission() instead.  This
      results in the following changes:
      
        - IS_RDONLY() check is not performed
        - IS_IMMUTABLE() check is not performed
        - devcgroup_inode_permission() is not called
        - security_inode_permission() is not called
      
      IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
      flag should provide protection against read-only remounts during
      operations.  do_gfs2_set_flags() has been fixed to perform
      mnt_want_write()/mnt_drop_write() to protect against remounting
      read-only.
      
      IS_IMMUTABLE has been added to gfs2_permission()
      
      Repeating the security checks seems to be pointless, as they don't
      normally change, and if they do, it's independent of the filesystem
      state.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f58ba889
  21. 12 5月, 2008 1 次提交
  22. 10 4月, 2008 1 次提交
  23. 31 3月, 2008 6 次提交
    • C
      [GFS2] possible null pointer dereference fixup · 182fe5ab
      Cyrill Gorcunov 提交于
      gfs2_alloc_get may fail so we have to check it to prevent
      NULL pointer dereference.
      Signed-off-by: NCyrill Gorcunov <gorcunov@gamil.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      182fe5ab
    • D
      [GFS2] re-support special inode · 43a33c53
      Denis Cheng 提交于
      a previous commit removed call to
      init_special_inode from inode lookuping, this cause problems as:
      
       # mknod /mnt/gfs2/dev/null c 1 3
       # cat /mnt/gfs2/dev/null
       cat: /mnt/gfs2/dev/null: Invalid argument
      
      without special inode, GFS2 cannot support char device file,
      block device file, fifo pipe, and socket file, lose many important
      features as a common file system.
      
      this one line patch re add special inode support.
      Signed-off-by: NDenis Cheng <crquan@gmail.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      43a33c53
    • D
      [GFS2] remove gfs2_dev_iops · d83225d4
      Denis Cheng 提交于
      struct inode_operations gfs2_dev_iops is always the same as gfs2_file_iops,
      since Jan 2006, when GFS2 merged into mainstream kernel.
      
      So one of them could be removed.
      Signed-off-by: NDenis Cheng <crquan@gmail.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d83225d4
    • S
      [GFS2] Fix a page lock / glock deadlock · 7afd88d9
      Steven Whitehouse 提交于
      We've previously been using a "try lock" in readpage on the basis that
      it would prevent deadlocks due to the inverted lock ordering (our normal
      lock ordering is glock first and then page lock). Unfortunately tests
      have shown that this isn't enough. If the glock has a demote request
      queued such that run_queue() in the glock code tries to do a demote when
      its called under readpage then it will try and write out all the dirty
      pages which requires locking them. This then deadlocks with the page
      locked by readpage.
      
      The solution is to always require two calls into readpage. The first
      unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
      second does the actual readpage and unlocks the glock & page as
      required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7afd88d9
    • S
      [GFS2] Eliminate (almost) duplicate field from gfs2_inode · 77658aad
      Steven Whitehouse 提交于
      The blocks counter is almost a duplicate of the i_blocks
      field in the VFS inode. The only difference is that i_blocks
      can be only 32bits long for 32bit arch without large single file
      support. Since GFS2 doesn't handle the non-large single file
      case (for 32 bit anyway) this adds a new config dependency on
      64BIT || LSF. This has always been the case, however we've never
      explicitly said so before.
      
      Even if we do add support for the non-LSF case, we will still
      not require this field to be duplicated since we will not be
      able to access oversized files anyway.
      
      So the net result of all this is that we shave 8 bytes from a gfs2_inode
      and get our config deps correct.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      77658aad
    • S
      [GFS2] Reduce inode size by merging fields · ce276b06
      Steven Whitehouse 提交于
      There were three fields being used to keep track of the location
      of the most recently allocated block for each inode. These have
      been merged into a single field in order to better keep the
      data and metadata for an inode close on disk, and also to reduce
      the space required for storage.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ce276b06