1. 09 9月, 2009 1 次提交
  2. 27 8月, 2009 1 次提交
    • S
      GFS2: Remove no_formal_ino generating code · 8d8291ae
      Steven Whitehouse 提交于
      The inum structure used throughout GFS2 has two fields. One
      no_addr is the disk block number of the inode in question and
      is used everywhere as the inode number. The other, no_formal_ino,
      is used only as the generation number for NFS.
      
      Historically the no_formal_ino field was set using a complicated
      system of one global and one per-node file containing inode numbers
      in order to ensure that each no_formal_ino was unique. Also this
      code made no provision for what would happen when eventually the
      (64 bit) numbers ran out. Now I know that is pretty unlikely to
      happen given the large space of numbers, but it is possible
      nevertheless.
      
      The only guarantee required for no_formal_ino is that, for any
      single inode, the same number doesn't get reused too quickly.
      
      We already have a generation number which is kept in the inode
      and initialised from a counter in the resource group (almost
      no overhead, since we have to touch the resource group anyway
      in order to allocate an inode in the first place). Aside from
      ensuring that we never use the value 0 in the no_formal_ino
      field, we can use that counter directly.
      
      As a result of that change, we lose about 200 lines of code and
      also gain about 10 creates/sec on the postmark benchmark (on
      my test machine).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8d8291ae
  3. 24 8月, 2009 1 次提交
    • B
      GFS2: Add "-o errors=panic|withdraw" mount options · d34843d0
      Bob Peterson 提交于
      This patch adds "-o errors=panic" and "-o errors=withdraw" to the
      gfs2 mount options.  The "errors=withdraw" option is today's
      current behaviour, meaning to withdraw from the file system if a
      non-serious gfs2 error occurs.  The new "errors=panic" option
      tells gfs2 to force a kernel panic if a non-serious gfs2 file
      system error occurs.  This may be useful, for example, where
      fabric-level fencing is used that has no way to reboot (such as
      fence_scsi).
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d34843d0
  4. 30 7月, 2009 1 次提交
  5. 21 5月, 2009 2 次提交
    • S
      GFS2: Be more aggressive in reclaiming unlinked inodes · 1ce97e56
      Steven Whitehouse 提交于
      This patch increases the frequency with which gfs2 looks
      for unlinked, but still allocated inodes. Its the equivalent
      operation to ext3's orphan list, but done with bitmaps in
      the resource groups.
      
      This also fixes a bug where a field in the rgrp was too small.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1ce97e56
    • S
      GFS2: Add a rgrp bitmap full flag · 60a0b8f9
      Steven Whitehouse 提交于
      During block allocation, it is useful to know if sections of disk
      are full on a finer grained basis than a single resource group.
      This can make a performance difference when resource groups have
      larger numbers of bitmap blocks, since we no longer have to search
      them all block by block in each individual bitmap.
      
      The full flag is set on a per-bitmap basis when it has been
      searched and found to have no free space. It is then skipped in
      subsequent searches until the flag is reset. The resetting
      occurs if we have to drop the glock on the resource group for any
      reason, or if we deallocate some blocks within that resource
      group and thus free up some space.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      60a0b8f9
  6. 20 5月, 2009 1 次提交
    • S
      GFS2: Improve resource group error handling · 09010978
      Steven Whitehouse 提交于
      This patch improves the error handling in the case where we
      discover that the summary information in the resource group
      doesn't match the bitmap information while in the process of
      allocating blocks. Originally this resulted in a kernel bug,
      but this patch changes that so that we return -EIO and print
      some messages explaining what went wrong, and how to fix it.
      
      We also remember locally not to try and allocate from the
      same rgrp again, so that a subsequent allocation in a
      different rgrp should succeed.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      09010978
  7. 19 5月, 2009 1 次提交
    • S
      GFS2: Umount recovery race fix · fe64d517
      Steven Whitehouse 提交于
      This patch fixes a race condition where we can receive recovery
      requests part way through processing a umount. This was causing
      problems since the recovery thread had already gone away.
      
      Looking in more detail at the recovery code, it was really trying
      to implement a slight variation on a work queue, and that happens to
      align nicely with the recently introduced slow-work subsystem. As a
      result I've updated the code to use slow-work, rather than its own home
      grown variety of work queue.
      
      When using the wait_on_bit() function, I noticed that the wait function
      that was supplied as an argument was appearing in the WCHAN field, so
      I've updated the function names in order to produce more meaningful
      output.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fe64d517
  8. 13 5月, 2009 1 次提交
    • S
      GFS2: Add commit= mount option · 48c2b613
      Steven Whitehouse 提交于
      It has always been possible to adjust the gfs2 log commit
      interval, but only from the sysfs interface. This adds a
      mount option, commit=<nn>, which will be familar to ext3
      users.
      
      The sysfs interface continues to be available as well, although
      this might be removed in the future.
      
      Also this patch cleans up some duplicated structures in the GFS2
      sysfs code.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      48c2b613
  9. 24 3月, 2009 7 次提交
    • B
      GFS2: Fix locking bug in failed shared to exclusive conversion · 02ffad08
      Benjamin Marzinski 提交于
      After calling out to the dlm, GFS2 sets the new state of a glock to
      gl_target in gdlm_ast().  However, gl_target is not always the lock
      state that was requested. If a conversion from shared to exclusive
      fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead
      of gl->gl_target, so that it can reacquire the lock in exlusive the next
      time around.  In this case, setting the lock to gl_target in gdlm_ast()
      will make GFS2 think that it has the glock in exclusive mode, when
      really, it doesn't have the glock locked at all.  This patch adds a new
      field to the gfs2_glock structure, gl_req, to track the mode that was
      requested.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      02ffad08
    • S
      GFS2: Expose UUID via sysfs/uevent · 02e3cc70
      Steven Whitehouse 提交于
      Since we have a UUID, we ought to expose it to the user via sysfs
      and uevents. We already have the fs name in both of these places
      (a combination of the lock proto and lock table name) so if we add
      the UUID as well, we have a full set.
      
      For older filesystems (i.e. those created before mkfs.gfs2 was writing
      UUIDs by default) the sysfs file will appear zero length, and no UUID
      env var will be added to the uevents.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      02e3cc70
    • S
      GFS2: Support generation of discard requests · f15ab561
      Steven Whitehouse 提交于
      This patch allows GFS2 to generate discard requests for blocks which are
      no longer useful to the filesystem (i.e. those which have been freed as
      the result of an unlink operation). The requests are generated at the
      time which those blocks become available for reuse in the filesystem.
      
      In order to use this new feature, you have to specify the "discard"
      mount option. The code coalesces adjacent blocks into a single extent
      when generating the discard requests, thus generating the minimum
      number.
      
      If an error occurs when the request has been sent to the block device,
      then it will print a message and turn off the requests for that
      filesystem. If the problem is temporary, then you can use remount to
      turn the option back on again. There is also a nodiscard mount option
      so that you can use remount to turn discard requests off, if required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f15ab561
    • S
      GFS2: Remove unused field from glock · ac2425e7
      Steven Whitehouse 提交于
      The time stamp field is unused in the glock now that we are
      using a shrinker, so that we can remove it and save sizeof(unsigned long)
      bytes in each glock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ac2425e7
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
    • S
      GFS2: Remove "double" locking in quota · 22077f57
      Steven Whitehouse 提交于
      We only really need a single spin lock for the quota data, so
      lets just use the lru lock for now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
      22077f57
    • A
      GFS2: change gfs2_quota_scan into a shrinker · 0a7ab79c
      Abhijith Das 提交于
      Deallocation of gfs2_quota_data objects now happens on-demand through a
      shrinker instead of routinely deallocating through the quotad daemon.
      Signed-off-by: NAbhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0a7ab79c
  10. 05 1月, 2009 11 次提交
    • S
      GFS2: Kill two daemons with one patch · 97cc1025
      Steven Whitehouse 提交于
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time based criteria for shrinking glocks, they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      97cc1025
    • S
      GFS2: Fix "truncate in progress" hang · 813e0c46
      Steven Whitehouse 提交于
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      Also, as a result of this change, it opens the way for a future patch to
      make the reading of the inode's information asynchronous with respect to
      the glock workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      813e0c46
    • S
      GFS2: Clean up & move gfs2_quotad · 37b2c837
      Steven Whitehouse 提交于
      This patch is a clean up of gfs2_quotad prior to giving it an
      extra job to do in addition to the current portfolio of updating
      the quota and statfs information from time to time.
      
      As a result it has been moved into quota.c allowing one of the
      functions it calls to be made static. Also the clean up allows
      the two existing functions to have separate timeouts and also
      to coexist with its future role of dealing with the "truncate in
      progress" inode flag.
      
      The (pointless) setting of gfs2_quotad_secs is removed since we
      arrange to only wake up quotad when one of the two timers expires.
      
      In addition the struct gfs2_quota_data is moved into a slab cache,
      mainly for easier debugging. It should also be possible to use
      a shrinker in the future, rather than the current scheme of scanning
      the quota data entries from time to time.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      37b2c837
    • S
      GFS2: Banish struct gfs2_rgrpd_host · 73f74948
      Steven Whitehouse 提交于
      This patch moves the final field so that we can get rid
      of struct gfs2_rgrpd_host, as promised some time ago. Also
      by rearranging the fields slightly, we are able to reduce
      the size of the gfs2_rgrpd structure at the same time.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      73f74948
    • S
      GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd · cfc8b549
      Steven Whitehouse 提交于
      The second of three fields which need to move, in order
      to remove the struct gfs2_rgrpd_host.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      cfc8b549
    • S
      GFS2: Move rg_igeneration into struct gfs2_rgrpd · d8b71f73
      Steven Whitehouse 提交于
      This moves one of the fields of struct gfs2_rgrpd_host into
      the struct gfs2_rgrpd with the eventual aim of removing
      the struct rgrpd_host completely.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d8b71f73
    • S
      GFS2: Banish struct gfs2_dinode_host · 383f01fb
      Steven Whitehouse 提交于
      The final field in gfs2_dinode_host was the i_flags field. Thats
      renamed to i_diskflags in order to avoid confusion with the existing
      inode flags, and moved into the inode proper at a suitable location
      to avoid creating a "hole".
      
      At that point struct gfs2_dinode_host is no longer needed and as
      promised (quite some time ago!) it can now be removed completely.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      383f01fb
    • S
      GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse 提交于
      This patch moved the i_size field from the gfs2_dinode_host and
      following the ext3 convention renames it i_disksize.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c9e98886
    • S
      GFS2: Move di_eattr into "proper" inode · 3767ac21
      Steven Whitehouse 提交于
      This moves the di_eattr field out of gfs2_inode_host and
      into the inode proper.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3767ac21
    • S
      GFS2: Move "entries" into "proper" inode · ad6203f2
      Steven Whitehouse 提交于
      This moves the directory entry count into the proper inode.
      Potentially we could get this to share the space used by
      something else in the future, but this is one more step
      on the way to removing the gfs2_dinode_host structure.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ad6203f2
    • S
      GFS2: Move generation number into "proper" part of inode · bcf0b5b3
      Steven Whitehouse 提交于
      This moves the generation number from the gfs2_dinode_host
      into the gfs2_inode structure. Eventually the plan is to get
      rid of the gfs2_dinode_host structure completely.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      bcf0b5b3
  11. 26 9月, 2008 1 次提交
    • S
      GFS2: Support for I/O barriers · 254db57f
      Steven Whitehouse 提交于
      This patch adds barrier support to GFS2. There is not a lot of change
      really... we just add the barrier flag when we write journal header
      blocks. If the underlying device refuses to support them, we fall back
      to the previous way of doing things (wait for the I/O and hope) since
      there is nothing else we can do. There is no user configuration,
      barriers will always be on unless the device refuses to support them.
      This seems a reasonable solution to me since this is a correctness
      issue.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      254db57f
  12. 18 9月, 2008 2 次提交
    • S
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse 提交于
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      719ee344
    • S
      GFS2: The war on bloat · 37ec89e8
      Steven Whitehouse 提交于
      The following patch shrinks the gfs2_args structure which is embedded in
      every GFS2 superblock. It cuts down the size of the options to a single
      unsigned int (the 13 bits of bitfields will be rounded up to that size
      by the compiler) from the current 11 unsigned ints. So on x86 thats 44
      bytes shrinking to 4 bytes, in each and every GFS2 superblock.
      Signed-off-by: NSteven Whitehouse <swhitho@redhat.com>
      37ec89e8
  13. 13 8月, 2008 1 次提交
    • S
      GFS2: Fix metafs mounts · 9b8df98f
      Steven Whitehouse 提交于
      This patch is intended to fix the issues reported in bz #457798. Instead
      of having the metafs as a separate filesystem, it becomes a second root
      of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
      is still possible (for backwards compatibility purposes) to mount it as
      type gfs2meta. A new mount flag "meta" is introduced so that its possible
      to tell the two cases apart in /proc/mounts.
      
      As a result it becomes possible to mount type gfs2 with -o meta and
      get the same result as mounting type gfs2meta. So it is possible to
      mount just the metafs on its own. Currently if you do this, its then
      impossible to mount the "normal" root of the gfs2 filesystem without
      first unmounting the metafs root. I'm not sure if thats a feature or
      a bug :-)
      
      Either way, this is a great improvement on the previous scheme and I've
      verified that it works ok with bind mounts on both the "normal" root
      and the metafs root in various combinations.
      
      There were also a bunch of functions in super.c which didn't belong there,
      so this moves them into ops_fstype.c where they can be static. Hopefully
      the mount/umount sequence is now more obvious as a result.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
      9b8df98f
  14. 10 7月, 2008 2 次提交
  15. 27 6月, 2008 1 次提交
    • S
      [GFS2] Clean up the glock core · 6802e340
      Steven Whitehouse 提交于
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shortly, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      6802e340
  16. 12 5月, 2008 1 次提交
  17. 31 3月, 2008 5 次提交
    • B
      [GFS2] Invalidate cache at correct point · 58e9fee1
      Benjamin Marzinski 提交于
      GFS2 wasn't invalidating its cache before it called into the lock manager
      with a request that could potentially drop a lock.  This was leaving a
      window where the lock could be actually be held by another node, but the
      file's page cache would still appear valid, causing coherency problems.
      This patch moves the cache invalidation to before the lock manager call
      when dropping a lock. It also adds the option to the lock_dlm lock
      manager to not use conversion mode deadlock avoidance, which, on a
      conversion from shared to exclusive, could internally drop the lock, and
      then reacquire in. GFS2 now asks lock_dlm to not do this.  Instead, GFS2
      manually drops the lock and reacquires it.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      58e9fee1
    • S
      [GFS2] Eliminate (almost) duplicate field from gfs2_inode · 77658aad
      Steven Whitehouse 提交于
      The blocks counter is almost a duplicate of the i_blocks
      field in the VFS inode. The only difference is that i_blocks
      can be only 32bits long for 32bit arch without large single file
      support. Since GFS2 doesn't handle the non-large single file
      case (for 32 bit anyway) this adds a new config dependency on
      64BIT || LSF. This has always been the case, however we've never
      explicitly said so before.
      
      Even if we do add support for the non-LSF case, we will still
      not require this field to be duplicated since we will not be
      able to access oversized files anyway.
      
      So the net result of all this is that we shave 8 bytes from a gfs2_inode
      and get our config deps correct.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      77658aad
    • S
      [GFS2] Merge the rd_last_alloc_meta and rd_last_alloc_data fields · ac576cc5
      Steven Whitehouse 提交于
      We don't need to keep track of when we last allocated data
      and metadata separately since the only thing thats important
      when searching for a free block is whether its free or not,
      which is independent from what type of block it is.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ac576cc5
    • S
      [GFS2] Reduce inode size by merging fields · ce276b06
      Steven Whitehouse 提交于
      There were three fields being used to keep track of the location
      of the most recently allocated block for each inode. These have
      been merged into a single field in order to better keep the
      data and metadata for an inode close on disk, and also to reduce
      the space required for storage.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ce276b06
    • B
      [GFS2] Remove unused counters · 9feb7c88
      Bob Peterson 提交于
      This is kind of trivial in the greater scheme of things, but
      this removes three counters that AFAICT are never used.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9feb7c88