1. 18 Sep, 2008 1 commit
    • GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse committed
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options as any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
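The "relatime" policy mentioned in the commit above can be illustrated with a small user-space sketch. This is not the VFS code: the struct and the function name `relatime_needs_update` are hypothetical, timestamps are reduced to whole seconds, and only the basic rule is modelled (refresh atime when the file was modified or changed at or after the last recorded access).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified view of the three timestamps the check needs. */
struct sim_times {
    time_t atime;   /* last access */
    time_t mtime;   /* last data modification */
    time_t ctime;   /* last status change */
};

/*
 * Return true when a relatime-style policy would refresh atime:
 * the recorded atime is not newer than mtime or ctime.
 */
static bool relatime_needs_update(const struct sim_times *t)
{
    if (t->mtime >= t->atime)
        return true;
    if (t->ctime >= t->atime)
        return true;
    return false;
}
```

With this rule, repeated reads of an unmodified file do not keep dirtying the inode, which is the main cost that "noatime" and "relatime" are meant to avoid.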
  2. 05 Sep, 2008 1 commit
  3. 27 Aug, 2008 1 commit
    • GFS2: Fix & clean up GFS2 rename · 0188d6c5
      Steven Whitehouse committed
      This patch fixes a locking issue in the rename code by ensuring that we hold
      the per sb rename lock over both directory and "other" renames which involve
      different parent directories.
      
      At the same time, this moves the function gfs2_ok_to_move (which is only
      called from one place) into the file that it's called from, so we can mark
      it static. This should make the code a bit easier to follow.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Peter Staubach <staubach@redhat.com>
  4. 27 Jul, 2008 1 commit
  5. 10 Jul, 2008 1 commit
  6. 03 Jul, 2008 1 commit
    • [GFS2] don't call permission() · f58ba889
      Miklos Szeredi committed
      GFS2 calls permission() to verify permissions after locks on the files
      have been taken.
      
      For this it's sufficient to call gfs2_permission() instead.  This
      results in the following changes:
      
        - IS_RDONLY() check is not performed
        - IS_IMMUTABLE() check is not performed
        - devcgroup_inode_permission() is not called
        - security_inode_permission() is not called
      
      IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
      flag should provide protection against read-only remounts during
      operations.  do_gfs2_set_flags() has been fixed to perform
      mnt_want_write()/mnt_drop_write() to protect against remounting
      read-only.
      
      An IS_IMMUTABLE() check has been added to gfs2_permission().
      
      Repeating the security checks seems to be pointless, as they don't
      normally change, and if they do, it's independent of the filesystem
      state.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 12 May, 2008 1 commit
  8. 10 Apr, 2008 1 commit
  9. 31 Mar, 2008 9 commits
    • [GFS2] possible null pointer dereference fixup · 182fe5ab
      Cyrill Gorcunov committed
      gfs2_alloc_get() may fail, so we have to check its return value to
      prevent a NULL pointer dereference.
      Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] re-support special inode · 43a33c53
      Denis Cheng committed
      A previous commit removed the call to init_special_inode from inode
      lookup, which causes problems such as:
      
       # mknod /mnt/gfs2/dev/null c 1 3
       # cat /mnt/gfs2/dev/null
       cat: /mnt/gfs2/dev/null: Invalid argument
      
      Without special inode support, GFS2 cannot handle character device files,
      block device files, FIFOs, or socket files, and so loses many features
      expected of a general-purpose filesystem.
      
      This one-line patch re-adds special inode support.
      Signed-off-by: Denis Cheng <crquan@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] remove gfs2_dev_iops · d83225d4
      Denis Cheng committed
      struct inode_operations gfs2_dev_iops has always been identical to
      gfs2_file_iops, ever since GFS2 was merged into the mainline kernel in
      Jan 2006.
      
      So one of them can be removed.
      Signed-off-by: Denis Cheng <crquan@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix a page lock / glock deadlock · 7afd88d9
      Steven Whitehouse committed
      We've previously been using a "try lock" in readpage on the basis that
      it would prevent deadlocks due to the inverted lock ordering (our normal
      lock ordering is glock first and then page lock). Unfortunately tests
      have shown that this isn't enough. If the glock has a demote request
      queued such that run_queue() in the glock code tries to do a demote when
      it's called under readpage then it will try and write out all the dirty
      pages which requires locking them. This then deadlocks with the page
      locked by readpage.
      
      The solution is to always require two calls into readpage. The first
      unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
      second does the actual readpage and unlocks the glock & page as
      required.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
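The two-call readpage scheme described in the commit above can be modelled in user space as follows. This is only a sketch of the control flow under stated assumptions: all names (`sim_state`, `sim_readpage`, `sim_read`, `SIM_RETRY`) are hypothetical, `SIM_RETRY` merely stands in for AOP_TRUNCATED_PAGE, and how the real patch detects that the glock is already held is not shown in the commit message.

```c
#include <stdbool.h>

/* Return codes for the simulated readpage. */
enum { SIM_OK = 0, SIM_RETRY = 1 /* stands in for AOP_TRUNCATED_PAGE */ };

struct sim_state {
    bool page_locked;  /* the caller enters readpage with the page locked */
    bool glock_held;   /* cluster lock covering the inode */
    int  reads_done;   /* number of completed page reads */
};

static int sim_readpage(struct sim_state *s)
{
    if (!s->glock_held) {
        /*
         * First call: drop the page lock before taking the glock, so
         * the normal "glock first, then page lock" order is respected.
         */
        s->page_locked = false;
        s->glock_held = true;
        return SIM_RETRY;
    }
    /* Second call: the glock is held and the caller re-locked the page. */
    s->reads_done++;
    s->glock_held = false;
    s->page_locked = false;
    return SIM_OK;
}

/* Caller loop: lock the page and call readpage again when asked to retry. */
static int sim_read(struct sim_state *s)
{
    int ret;

    do {
        s->page_locked = true;
        ret = sim_readpage(s);
    } while (ret == SIM_RETRY);
    return ret;
}
```

The point of the pattern is that the glock is never requested while the page lock is held, so a queued demote request cannot deadlock against the page that readpage itself has locked.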
    • [GFS2] Eliminate (almost) duplicate field from gfs2_inode · 77658aad
      Steven Whitehouse committed
      The blocks counter is almost a duplicate of the i_blocks
      field in the VFS inode. The only difference is that i_blocks
      can be only 32bits long for 32bit arch without large single file
      support. Since GFS2 doesn't handle the non-large single file
      case (for 32 bit anyway) this adds a new config dependency on
      64BIT || LSF. This has always been the case, however we've never
      explicitly said so before.
      
      Even if we do add support for the non-LSF case, we will still
      not require this field to be duplicated since we will not be
      able to access oversized files anyway.
      
      So the net result of all this is that we shave 8 bytes from a gfs2_inode
      and get our config deps correct.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Reduce inode size by merging fields · ce276b06
      Steven Whitehouse committed
      There were three fields being used to keep track of the location
      of the most recently allocated block for each inode. These have
      been merged into a single field in order to better keep the
      data and metadata for an inode close on disk, and also to reduce
      the space required for storage.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Shrink & rename di_depth · 9a004508
      Steven Whitehouse committed
      This patch forms a pair with the previous patch which shrunk
      di_height. Like that patch di_depth is renamed i_depth and moved
      into struct gfs2_inode directly. Also the field goes from 16 bits
      to 8 bits since it is also limited to a max value which is rather
      small (17 in this case). In addition we also now validate the field
      against this maximum value when it's read in.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix debug inode printing · ca390601
      Bob Peterson committed
      I noticed that the latest change to i_height got rid of the
      value from the inode dump.  This patch adds it back.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Streamline indirect pointer tree height calculation · ecc30c79
      Steven Whitehouse committed
      This patch improves the calculation of the tree height in order to reduce
      the number of operations which are carried out on each call to gfs2_block_map.
      In the common case, we now make a single comparison, rather than calculating
      the required tree height from scratch each time. Also in the case that the
      tree does need some extra height, we start from the current height rather than from
      zero when we work out what the new height ought to be.
      
      In addition the di_height field is moved into the inode proper and reduced
      in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
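The height calculation described in the commit above (one comparison in the common case, otherwise growing from the current height rather than from zero) might look roughly like this. The function name `sim_required_height` and the `capacity` table are assumptions for illustration; in the real filesystem the per-height capacities depend on the block size.

```c
#include <stdint.h>

/*
 * capacity[h] is the largest file size, in bytes, that a metadata tree
 * of height h can address. The table contents are supplied by the caller.
 */
static unsigned int sim_required_height(const uint64_t *capacity,
                                        unsigned int max_height,
                                        unsigned int cur_height,
                                        uint64_t size)
{
    unsigned int h = cur_height;

    /* Common case: the current tree is already tall enough. */
    if (size <= capacity[h])
        return h;

    /* Otherwise grow upward from the current height, not from zero. */
    while (h < max_height && size > capacity[h])
        h++;
    return h;
}
```

Because a tree never shrinks in this model, starting the search at `cur_height` skips all the levels that are already known to be too small.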
  10. 08 Feb, 2008 1 commit
  11. 25 Jan, 2008 7 commits
    • [GFS2] Lockup on error · 1b8177ec
      Bob Peterson committed
      I spotted this bug while I was digging around.  Looks like it could cause
      a lockup in some rare error condition.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Reduce inode size by moving i_alloc out of line · 6dbd8224
      Steven Whitehouse committed
      It is possible to reduce the size of GFS2 inodes by taking the i_alloc
      structure out of the gfs2_inode. This patch allocates the i_alloc
      structure whenever it's needed, and frees it afterward. This decreases
      the amount of low memory we use at the expense of requiring a memory
      allocation for each page or partial page that we write. A quick test
      with postmark shows that the overhead is not measurable and I also note
      that OCFS2 uses the same approach.
      
      In the future I'd like to solve the problem by shrinking down the size
      of the members of the i_alloc structure, but for now, this reduces the
      immediate problem of using too much low-memory on x86 and doesn't add
      too much overhead.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove lock methods for lock_nolock protocol · c97bfe43
      Wendy Cheng committed
      GFS2 supports two modes of locking: lock_nolock for a single-node filesystem
      and lock_dlm for cluster-mode locking. This patch removes the gfs2 lock
      methods from the file operations table for the lock_nolock protocol. This
      allows the VFS to handle POSIX lock and flock logic just as it does for
      other in-tree filesystems, without duplication.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Don't add glocks to the journal · 2bcd610d
      Steven Whitehouse committed
      The only reason for adding glocks to the journal was to keep track
      of which locks required a log flush prior to release. We add a
      flag to the glock to allow this check to be made in a simpler way.
      
      This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
      and means that we can avoid extra work during the journal flush.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Introduce gfs2_set_aops() · 5561093e
      Steven Whitehouse committed
      Just like ext3 we now have three sets of address space operations
      to cover the cases of writeback, ordered and journalled data
      writes. This means that the individual operations can now become
      less complicated as we are able to remove some of the tests for
      file data mode from the code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
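The idea behind having one operations table per data mode, as described in the commit above, can be sketched as follows. Everything here is a simplified stand-in: the `sim_` names are hypothetical, each table holds a single function pointer instead of the full set of address space operations, and the return values are arbitrary markers used only to tell the variants apart.

```c
/* Hypothetical stand-ins for the three data-write modes named in the commit. */
enum sim_data_mode { SIM_WRITEBACK, SIM_ORDERED, SIM_JOURNALLED };

/* Simplified operations table with one operation. */
struct sim_aops {
    int (*writepage)(int page);
};

/* Each variant serves exactly one mode, so none of them tests the mode. */
static int writeback_writepage(int page)  { return page; }
static int ordered_writepage(int page)    { return page + 1000; }
static int journalled_writepage(int page) { return page + 2000; }

static const struct sim_aops writeback_aops  = { writeback_writepage };
static const struct sim_aops ordered_aops    = { ordered_writepage };
static const struct sim_aops journalled_aops = { journalled_writepage };

struct sim_mapping {
    const struct sim_aops *a_ops;
};

/* Choose the table once, at the point where the data mode is known. */
static void sim_set_aops(struct sim_mapping *m, enum sim_data_mode mode)
{
    switch (mode) {
    case SIM_WRITEBACK:
        m->a_ops = &writeback_aops;
        break;
    case SIM_ORDERED:
        m->a_ops = &ordered_aops;
        break;
    case SIM_JOURNALLED:
        m->a_ops = &journalled_aops;
        break;
    }
}
```

The mode check happens once in the setter, and every later call through `a_ops` goes straight to the right variant, which is why the individual operations can drop their own mode tests.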
    • [GFS2] Remove useless i_cache from inodes · f91a0d3e
      Steven Whitehouse committed
      The i_cache was designed to keep references to the indirect blocks
      used during block mapping so that they didn't have to be looked
      up continually. The idea failed because there are too many places
      where the i_cache needs to be freed, and this has in the past been
      the cause of many bugs.
      
      In addition there was no performance benefit being gained since the
      disk blocks in question were cached anyway. So this patch removes
      it in order to simplify the code to prepare for other changes which
      would otherwise have had to add further support for this feature.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up internal read function · 51ff87bd
      Steven Whitehouse committed
      As requested by Christoph, this patch cleans up GFS2's internal
      read function so that it no longer uses the do_generic_mapping_read
      function. This function is obsolete and GFS2 is the last user of it.
      
      As a side effect the internal read code gets smaller and easier
      to read and gfs2_readpage is split into two. One function has the locking
      and the other function has the rest of the logic.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
  12. 10 Oct, 2007 2 commits
    • [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed · 7a9f53b3
      Benjamin Marzinski committed
      There is a possible deadlock between two processes on the same node, where one
      process is deleting an inode, and another process is looking for allocated but
      unused inodes to delete in order to create more space.
      
      process A does an iput() on inode X, and its i_count drops to 0. This causes
      iput_final() to be called, which puts an inode into state I_FREEING at
      generic_delete_inode(). There is no point between when iput_final() is called, and
      when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
      set, no other process on that node can successfully look up that inode until
      the delete finishes.
      
      process B locks the resource group for the same inode in get_local_rgrp(),
      which is called by gfs2_inplace_reserve_i()
      
      process A tries to lock the resource group for the inode in
      gfs2_dinode_dealloc(), but it's already locked by process B
      
      process B waits in find_inode for the inode to have the I_FREEING state cleared.
      
      Deadlock.
      
      This patch solves the problem by adding an alternative to gfs2_iget(),
      gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
      state. The alternate test function is just like the original one, except that
      it fails if the inode is being freed, and sets a skipped flag. The alternate
      set function is just like the original, except that it fails if the skipped
      flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
      gfs2_iget().
      Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
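The alternate test and set functions described in the commit above can be approximated in user space like this. It is a sketch under assumptions, not the patch itself: the struct layouts, the `SIM_I_FREEING` flag value, the function names, and the use of -1 as the failure code are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_I_FREEING 0x1u  /* stands in for the VFS I_FREEING state bit */

struct sim_inode {
    uint64_t no_addr;
    unsigned int state;
};

/* Lookup key plus the "skipped" flag shared by the test and set steps. */
struct sim_skip_data {
    uint64_t no_addr;
    bool skipped;
};

/* Test step: match on address, but refuse an inode that is being freed. */
static bool sim_iget_skip_test(const struct sim_inode *inode,
                               struct sim_skip_data *data)
{
    if (inode->no_addr != data->no_addr)
        return false;
    if (inode->state & SIM_I_FREEING) {
        data->skipped = true;  /* remember that a match was skipped */
        return false;
    }
    return true;
}

/* Set step: initialise a new inode, unless a freeing inode was skipped. */
static int sim_iget_skip_set(struct sim_inode *inode,
                             const struct sim_skip_data *data)
{
    if (data->skipped)
        return -1;
    inode->no_addr = data->no_addr;
    inode->state = 0;
    return 0;
}
```

Failing the set step when something was skipped keeps the lookup from creating a second in-memory inode for a block whose original inode is still being torn down, and it means the caller never waits for I_FREEING to clear.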
    • [GFS2] fix inode meta data corruption · e9bd2b3b
      Wendy Cheng committed
      Fix a nasty inode metadata corruption issue by keeping the buffer head in
      the icache array. This buffer needs to stay in memory until a journal flush
      occurs. Otherwise, gfs2_meta_inode_buffer could do a disk read before the
      inode hits disk, which results in metadata corruption. The buffer will be
      released as part of the existing journal flush logic.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 09 Jul, 2007 9 commits
    • [GFS2] Remove i_mode passing from NFS File Handle · 35dcc52e
      Wendy Cheng committed
      GFS2 has been passing i_mode within the NFS file handle. Apart from the
      wrong assumption that there is always room for this extra 16-bit value,
      the current gfs2_get_dentry doesn't really need the i_mode to work
      correctly. Note that the GFS2 NFS code does go through the same lookup code
      path as the direct file access route (where the mode is obtained from name
      lookup), but gfs2_get_dentry() is coded for a different purpose. It is not
      used at lookup time; it is part of the file access procedure call.
      When the call is invoked, if the on-disk inode is not in memory, it has to
      be read in. This makes i_mode passing a useless overhead.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Obtaining no_formal_ino from directory entry · bb9bcf06
      Wendy Cheng committed
      The GFS2 lookup code doesn't ask for the inode's shared glock. This implies
      that during in-memory inode creation for an existing file, GFS2 will not
      read the inode contents from disk. This leaves no_formal_ino uninitialized
      at lookup time. The uninitialized no_formal_ino is subsequently encoded
      into the file handle. Clients will get an ESTALE error whenever they try to
      access these files.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix deallocation issues · d93cfa98
      Abhijith Das committed
      There were two issues during deallocation of unlinked inodes. The
      first was relating to the use of a "try" lock which in the case of
      the inode lock wasn't trying hard enough to deallocate in all
      circumstances (now changed to a normal glock) and in the case of
      the iopen lock didn't wait for the demotion of the shared lock before
      attempting to get the exclusive lock, and thereby sometimes (timing dependent)
      not completing the deallocation when it should have done.
      
      The second issue related to the lack of a way to invalidate dcache entries
      on remote nodes (now fixed by this patch) which meant that unlinks were
      taking a long time to return disk space to the fs. By adding some code to
      invalidate the dcache entries across the cluster for unlinked inodes, that
      is now fixed.
      
      This patch was written jointly by Abhijith Das and Steven Whitehouse.
      Signed-off-by: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] gfs2_lookupi() uninitialised var fix · 037bcbb7
      akpm@linux-foundation.org committed
      fs/gfs2/inode.c: In function 'gfs2_lookupi':
      fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function
      
      Looks like a real bug to me.
      
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Recovery for lost unlinked inodes · c8cdf479
      Steven Whitehouse committed
      Under certain circumstances it's possible (though rather unlikely) that
      inodes which were unlinked by one node while still open on another might
      get "lost" in the sense that they don't get deallocated if the node
      which held the inode open crashed before it was unlinked.
      
      This patch adds the recovery code which allows automatic deallocation of
      the inode if it's found during block allocation (the sensible time to
      look for such inodes since we are scanning the rgrp's bitmaps anyway at
      this time, so it adds no overhead to do this).
      
      Since the inode will have had its i_nlink set to zero, all we need to
      trigger recovery is a lookup and an iput(), and the normal deallocation
      code takes care of the rest.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix bug in error path of inode · e1cc8603
      Steven Whitehouse committed
      This fixes a bug in the ordering of operations in the error path of
      createi. It's not valid to do an iput() when holding the inode's glock
      since the iput() will (in this case) result in delete_inode() being
      called which needs to grab the lock itself. This was causing the
      recursive lock checking code to trigger.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Add nanosecond timestamp feature · 4bd91ba1
      Steven Whitehouse committed
      This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
      to the way that the on-disk format works, older filesystems will just
      appear to have this field set to zero. When mounted by an older version
      of GFS2, the filesystem will simply ignore the extra fields so that
      it will again appear to have whole second resolution, so that it's
      trivially backward compatible.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix sign problem in quota/statfs and cleanup _host structures · bb8d8a6f
      Steven Whitehouse committed
      This patch fixes some sign issues which were accidentally introduced
      into the quota & statfs code during the endianness annotation process.
      Also included is a general clean up which moves all of the _host
      structures out of gfs2_ondisk.h (where they should not have been to
      start with) and into the places where they are actually used (often only
      one place). Also those _host structures which are not required any more
      are removed entirely (which is the eventual plan for all of them).
      
      The conversion routines from ondisk.c are also moved into the places
      where they are actually used, which for almost every one, was just one
      single place, so all those are now static functions. This also cleans up
      the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.
      
      The net result is a reduction of about 100 lines of code, many functions
      now marked static plus the bug fixes as mentioned above. For good
      measure I ran the code through sparse after making these changes to
      check that there are no warnings generated.
      
      This fixes Red Hat bz #239686
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up inode number handling · dbb7cae2
      Steven Whitehouse committed
      This patch cleans up the inode number handling code. The main difference
      is that instead of looking up the inodes using a struct gfs2_inum_host
      we now use just the no_addr member of this structure. The tests relating
      to no_formal_ino can then be done by the calling code. This has
      advantages in that we want to do different things in different code
      paths if the no_formal_ino doesn't match. In the NFS path we want to
      return -ESTALE, but in the ->lookup() path, it's a bug in the fs if the
      no_formal_ino doesn't match and thus we can withdraw in this case.
      
      In order to later fix bz #201012, we need to be able to look up an inode
      without knowing no_formal_ino, as the only information that is known to
      us is the on-disk location of the inode in question.
      
      This patch will also help us to fix bz #236099 at a later date by
      cleaning up a lot of the code in that area.
      
      There are no user visible changes as a result of this patch and there
      are no changes to the on-disk format either.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
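The split described in the commit above, where the lookup uses only no_addr and each caller decides what a no_formal_ino mismatch means, could be sketched like this. The table-based lookup and the `sim_` names are hypothetical simplifications; only the NFS-style caller is modelled, since the ->lookup() path would withdraw the filesystem rather than return an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sim_inode {
    uint64_t no_addr;        /* on-disk location, used as the lookup key */
    uint64_t no_formal_ino;  /* identity value, checked by the caller */
};

/* Look up by disk address only; returns NULL when nothing matches. */
static const struct sim_inode *sim_lookup(const struct sim_inode *table,
                                          size_t count, uint64_t no_addr)
{
    for (size_t i = 0; i < count; i++)
        if (table[i].no_addr == no_addr)
            return &table[i];
    return NULL;
}

/* NFS-style caller: a no_formal_ino mismatch means the handle is stale. */
static int sim_nfs_get(const struct sim_inode *table, size_t count,
                       uint64_t no_addr, uint64_t formal_ino)
{
    const struct sim_inode *ip = sim_lookup(table, count, no_addr);

    if (ip == NULL)
        return -ENOENT;
    if (ip->no_formal_ino != formal_ino)
        return -ESTALE;
    return 0;
}
```

Keeping the mismatch policy out of the lookup also covers the case the commit mentions, where only the on-disk location is known and there is no no_formal_ino to compare against.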
  14. 08 Mar, 2007 2 commits
  15. 06 Feb, 2007 2 commits
    • [GFS2] Fix unlink deadlocks · ddee7608
      Russell Cattelan committed
      Move the glock acquisition to outside of the transactions.
      
      Lock ordering must be preserved in order to prevent ABBA
      deadlocks. The current gfs2_change_nlink code tries
      to grab the glock after having started a transaction and thus while holding
      the log lock. This is inconsistent with other code paths in
      gfs that grab the resource group glock prior to starting
      a transaction.
      
      One problem with this fix is that the resource group
      lock is always grabbed now, even if the inode still has
      a nonzero ref count and cannot be marked for unlink.
      Signed-off-by: Russell Cattelan <cattelan@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
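The ordering rule behind the fix above (take the resource group glock before starting a transaction, never after) can be expressed as a tiny rank checker. This is a generic illustration of lock ordering, not GFS2 code: the ranks, the `sim_ctx` tracker, and the function names are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must always be taken first. */
enum sim_rank { SIM_RANK_GLOCK = 1, SIM_RANK_LOG = 2 };

/* Tracks the highest-ranked lock currently held by one task. */
struct sim_ctx {
    int highest_held;
};

/* Grant the lock only if it does not invert the agreed order. */
static bool sim_acquire(struct sim_ctx *ctx, enum sim_rank rank)
{
    if ((int)rank <= ctx->highest_held)
        return false;  /* would create an ABBA ordering */
    ctx->highest_held = (int)rank;
    return true;
}

static void sim_release_all(struct sim_ctx *ctx)
{
    ctx->highest_held = 0;
}
```

If every code path takes the glock and then the log lock, no two tasks can each hold one while waiting for the other; the checker rejects the inverted sequence that the old gfs2_change_nlink path used.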
    • [GFS2] Fix recursive locking attempt with NFS · d7c103d0
      Steven Whitehouse committed
      In certain cases, it's possible for NFS to call the lookup code while
      holding the glock (when doing a readdirplus operation) so we need to
      check for that and not try and lock the glock twice. This also fixes a
      typo in a previous NFS related GFS2 patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>