1. 24 Mar, 2009 8 commits
    • S
      GFS2: Fix deadlock on journal flush · d8348de0
      Steven Whitehouse authored
      This patch fixes a deadlock when the journal is flushed and there
      are dirty inodes other than the one which caused the journal flush.
      Originally the journal flushing code was trying to obtain the
      transaction glock while running the flush code for an inode glock.
      We no longer require the transaction glock at this point in time
      since we know that any attempt to get the transaction glock from
      another node will result in a journal flush. So if we are flushing
      the journal, we can be sure that the transaction lock is still
      cached from when the transaction was started.
      
      By inlining a version of gfs2_trans_begin() (minus the bit which
      gets the transaction glock) we can avoid the deadlock problems
      caused if there is a demote request queued up on the transaction
      glock.
      
      In addition I've also moved the umount rwsem so that it covers
      the glock workqueue, since all demotions are done by this
      workqueue now. That fixes a bug on umount which I came across
      while fixing the original problem.
      Reported-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d8348de0
    • S
      GFS2: Fix error path ref counting for root inode · e7c8707e
      Steven Whitehouse authored
      We were keeping hold of an extra ref to the root inode in one
      of the error paths, which resulted in a hang.
      Reported-by: Nate Straz <nstraz@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Robert Peterson <rpeterso@redhat.com>
      e7c8707e
    • S
      GFS2: Remove unused field from glock · ac2425e7
      Steven Whitehouse authored
      The time stamp field is unused in the glock now that we are
      using a shrinker, so we can remove it and save sizeof(unsigned long)
      bytes in each glock.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      ac2425e7
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse authored
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, which is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it until I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before it's merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off for the last three months
      and it has passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f057f6cd
    • S
      GFS2: Remove "double" locking in quota · 22077f57
      Steven Whitehouse authored
      We only really need a single spin lock for the quota data, so
      let's just use the lru lock for now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
      22077f57
    • A
      GFS2: change gfs2_quota_scan into a shrinker · 0a7ab79c
      Abhijith Das authored
      Deallocation of gfs2_quota_data objects now happens on-demand through a
      shrinker instead of routinely deallocating through the quotad daemon.
      Signed-off-by: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      0a7ab79c
    • A
      GFS2: Bring back lvb-related stuff to lock_nolock to support quotas · 2db2aac2
      Abhijith Das authored
      The quota code uses lvbs and this is currently not implemented in
      lock_nolock, thereby causing panics when quota is enabled with
      lock_nolock. This patch adds the relevant bits.
      Signed-off-by: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2db2aac2
    • S
      GFS2: Fix remount argument parsing · 6f04c1c7
      Steven Whitehouse authored
      The following patch fixes an issue relating to remount and argument
      parsing. After this fix is applied, remount becomes atomic in that
      it either succeeds changing the mount to the new state, or it fails
      and leaves it in the old state. Previously it was possible for the
      parsing of options to fail part way through and for the fs to be left
      in a state where some of the new arguments had been applied, but some
      had not.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      6f04c1c7
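
The all-or-nothing remount behaviour described in this commit can be illustrated with a small userspace sketch. This is not the GFS2 code: `struct mount_args`, its fields, the option names and both functions are hypothetical, chosen only to show the "parse into a scratch copy, then commit" pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mount arguments; the field names are illustrative only. */
struct mount_args {
    bool quota_on;
    bool posix_acl;
    int commit_secs;
};

/* True if the token of length len equals the option name. */
static bool tok_is(const char *tok, size_t len, const char *name)
{
    return strlen(name) == len && strncmp(tok, name, len) == 0;
}

/* Parse a comma-separated option string into *out.
 * Returns 0 on success, -1 on the first unknown or malformed option.
 * On failure *out may already hold some of the new values. */
static int parse_args(const char *options, struct mount_args *out)
{
    const char *p = options;

    while (*p) {
        size_t len = strcspn(p, ",");

        if (len == 0) {                 /* empty token: skip the comma */
            p++;
            continue;
        }
        if (tok_is(p, len, "quota=on")) {
            out->quota_on = true;
        } else if (tok_is(p, len, "quota=off")) {
            out->quota_on = false;
        } else if (tok_is(p, len, "acl")) {
            out->posix_acl = true;
        } else if (tok_is(p, len, "noacl")) {
            out->posix_acl = false;
        } else if (len > 7 && strncmp(p, "commit=", 7) == 0) {
            int v = 0;

            for (size_t i = 7; i < len; i++) {
                if (p[i] < '0' || p[i] > '9' || v > 3600)
                    return -1;
                v = v * 10 + (p[i] - '0');
            }
            if (v == 0 || v > 3600)
                return -1;
            out->commit_secs = v;
        } else {
            return -1;                  /* unknown option */
        }
        p += len;
    }
    return 0;
}

/* Remount: parse into a scratch copy and commit only if everything parsed. */
static int remount(struct mount_args *live, const char *options)
{
    struct mount_args scratch = *live;

    if (parse_args(options, &scratch) != 0)
        return -1;      /* the live arguments are untouched */
    *live = scratch;    /* one assignment applies every change */
    return 0;
}
```

With this shape, a call such as `remount(&live, "quota=off,acl,bogus")` fails on the last option and leaves `live` exactly as it was, even though the first two options parsed cleanly.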
  2. 10 Jan, 2009 1 commit
    • T
      filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d
      Takashi Sato authored
      Currently, ext3 in mainline Linux doesn't have the freeze feature which
      suspends write requests.  So, we cannot take a backup which keeps the
      filesystem's consistency with the storage device's features (snapshot and
      replication) while it is mounted.
      
      In many cases, a commercial filesystem (e.g.  VxFS) has the freeze feature,
      and it is used to get a consistent backup.
      
      If Linux's standard filesystem ext3 has the freeze feature, we can do it
      without a commercial filesystem.
      
      So I have implemented the ioctls of the freeze feature.
      I think we can take the consistent backup with the following steps.
      1. Freeze the filesystem with the freeze ioctl.
      2. Separate the replication volume or create the snapshot
         with the storage device's feature.
      3. Unfreeze the filesystem with the unfreeze ioctl.
      4. Take the backup from the separated replication volume
         or the snapshot.
      
      This patch:
      
      VFS:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they can return an error.
      Renamed the write_super_lockfs and unlockfs super block operations
      to freeze_fs and unfreeze_fs to avoid confusion.
      
      ext3, ext4, xfs, gfs2, jfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that write_super_lockfs returns an error if needed,
      and unlockfs always returns 0.
      
      reiserfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they always return 0 (success) to keep the current behavior.
      Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4be0c1d
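
The four backup steps listed in this commit message can be sketched from userspace as follows. The commit text does not name the ioctls, so this sketch assumes the `FIFREEZE` and `FITHAW` requests exposed by `<linux/fs.h>` in mainline kernels; `frozen_snapshot` and the `snapshot_fn` callback are hypothetical names, and the snapshot step itself is left to the caller because it depends on the storage device.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW (assumed, see the note above) */

/* Hypothetical callback: performs step 2 (create the snapshot or split the
 * replication volume) and returns 0 on success. */
typedef int (*snapshot_fn)(void *ctx);

/* Freeze the filesystem mounted at mountpoint, run take_snapshot while it
 * is frozen, then thaw it again.
 * Returns 0 on success, -1 if any step fails. Freezing normally requires
 * CAP_SYS_ADMIN. */
static int frozen_snapshot(const char *mountpoint, snapshot_fn take_snapshot,
                           void *ctx)
{
    int fd = open(mountpoint, O_RDONLY);
    int ret = -1;

    if (fd < 0)
        return -1;

    if (ioctl(fd, FIFREEZE, 0) == 0) {      /* step 1: freeze */
        ret = take_snapshot(ctx);           /* step 2: snapshot */
        if (ioctl(fd, FITHAW, 0) != 0)      /* step 3: always try to thaw */
            ret = -1;
    }
    close(fd);
    return ret;
}
```

Step 4, taking the backup from the snapshot, happens after the thaw and so needs no filesystem cooperation. Because the freeze operation now returns `int`, a failed freeze surfaces to the caller as an ioctl error rather than being silently ignored.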
  3. 07 Jan, 2009 3 commits
  4. 05 Jan, 2009 28 commits
    • J
      GFS2: Use DEFINE_SPINLOCK · eb8374e7
      Julia Lawall authored
      SPIN_LOCK_UNLOCKED is deprecated.  The following makes the change suggested
      in Documentation/spinlocks.txt
      
      The semantic patch that makes this change is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @@
      declarer name DEFINE_SPINLOCK;
      identifier xxx_lock;
      @@
      
      - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
      + DEFINE_SPINLOCK(xxx_lock);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      eb8374e7
    • S
      GFS2: Fix use-after-free bug on umount (try #2) · 88a19ad0
      Steven Whitehouse authored
      This should solve the issue with the previous attempt at fixing this.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      88a19ad0
    • S
      Revert "GFS2: Fix use-after-free bug on umount" · fefc03bf
      Steven Whitehouse authored
      This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43.
      
      The original patch is causing problems with the order of
      operations at umount for jdata files. I need to fix
      this a different way.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fefc03bf
    • S
      GFS2: Streamline alloc calculations for writes · 7ed122e4
      Steven Whitehouse authored
      This patch removes some unused code, and makes the calculation
      of the number of blocks required conditional in order to reduce
      the number of times this (potentially expensive) calculation
      is done.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      7ed122e4
    • S
      GFS2: Send useful information with uevent messages · 9a776db7
      Steven Whitehouse authored
      In order to distinguish between two differing uevent messages
      and to avoid using the (racy) method of reading status from
      sysfs in future, this adds some status information to our
      uevent messages.
      
      Btw, before anybody says "sysfs isn't racy", I'm aware of that,
      but the way that GFS2 was using it (send an ambiguous uevent and
      then expect the receiver to read sysfs to find out the status
      of the reported operation) was.
      
      The additional benefit of using the new interface is that it
      should be possible for a node to recover multiple journals
      at the same time, since there is no longer any confusion as
      to which journal the status belongs to.
      
      At some future stage, when all the userland programs have been
      converted, I intend to remove the old interface.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9a776db7
    • S
      GFS2: Fix use-after-free bug on umount · 3af165ac
      Steven Whitehouse authored
      There was a use-after-free with the GFS2 super block during
      umount. This patch moves almost all of the umount code from
      ->put_super into ->kill_sb, the only bit that cannot be moved
      being the glock hash clearing which has to remain as ->put_super
      due to umount ordering requirements. As a result it's now obvious
      that the kfree is the final operation, whereas before it was
      hidden in ->put_super.
      
      Also gfs2_jindex_free is then only referenced from a single file
      so that's moved and marked static too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3af165ac
    • S
      GFS2: Remove ancient, unused code · 2e204703
      Steven Whitehouse authored
      Remove code that used to have something to do with initrd
      but has been unused for a long time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2e204703
    • S
      GFS2: Move four functions from super.c · 2bfb6449
      Steven Whitehouse authored
      The functions which are being moved can all be marked
      static in their new locations, since they only have
      a single caller each. Their new locations are more
      logical than before and some of the functions are
      small enough that the compiler might well inline them.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2bfb6449
    • S
      GFS2: Fix bug in gfs2_lock_fs_check_clean() · b5289681
      Steven Whitehouse authored
      gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold()
      since it doesn't work like rindex hold, despite the comment. That
      allows gfs2_jindex_hold() to be moved into ops_fstype.c where it
      can be made static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b5289681
    • S
      GFS2: Send some sensible sysfs stuff · fdd1062e
      Steven Whitehouse authored
      We ought to inform the user of the locktable and lockproto for each
      uevent we generate.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fdd1062e
    • S
      GFS2: Kill two daemons with one patch · 97cc1025
      Steven Whitehouse authored
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time-based criteria for shrinking glocks; they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      97cc1025
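
The demand-driven behaviour described in this commit (no timers, demote only as many glocks as the VM asks for) can be modelled in userspace as below. This is a simplified sketch, not the kernel shrinker API and not the GFS2 implementation: `struct glock`, `struct glock_lru`, `lru_add` and `lru_shrink` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified glock. */
struct glock {
    struct glock *lru_next;
    bool in_use;        /* still referenced: must not be demoted */
    bool demoted;
};

/* Singly linked LRU list, least recently used entry at the head. */
struct glock_lru {
    struct glock *head;
    struct glock *tail;
    size_t count;
};

/* Append a glock at the most recently used end. */
static void lru_add(struct glock_lru *lru, struct glock *gl)
{
    gl->lru_next = NULL;
    if (lru->tail)
        lru->tail->lru_next = gl;
    else
        lru->head = gl;
    lru->tail = gl;
    lru->count++;
}

/* Called when memory is tight: demote up to nr_wanted idle glocks, oldest
 * first. Each entry is examined at most once per call, and busy glocks are
 * rotated to the tail. Returns the number of glocks still on the list. */
static size_t lru_shrink(struct glock_lru *lru, size_t nr_wanted)
{
    size_t scanned = 0;
    size_t initial = lru->count;

    while (nr_wanted > 0 && scanned < initial && lru->head) {
        struct glock *gl = lru->head;

        lru->head = gl->lru_next;       /* pop the oldest entry */
        if (!lru->head)
            lru->tail = NULL;
        lru->count--;
        scanned++;

        if (gl->in_use) {
            lru_add(lru, gl);           /* busy: keep it cached */
            continue;
        }
        gl->demoted = true;             /* idle: give it up */
        nr_wanted--;
    }
    return lru->count;
}
```

Nothing in this model runs on a timer: glocks stay cached until `lru_shrink` is called, which mirrors the commit's point that demotion happens only in response to memory pressure.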
    • S
      GFS2: Move gfs2_recoverd into recovery.c · 9ac1b4d9
      Steven Whitehouse authored
      By moving gfs2_recoverd, we can make an additional function static
      and it also leaves only (the already scheduled for removal) gfs2_glockd
      in daemon.c.
      
      At the same time the declaration of gfs2_quotad is moved to quota.h
      to reflect the new location of gfs2_quotad in a previous patch. Also
      the recovery.h and quota.h headers are cleaned up.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9ac1b4d9
    • S
      GFS2: Fix "truncate in progress" hang · 813e0c46
      Steven Whitehouse authored
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      Also, as a result of this change, it opens the way for a future patch to
      make the reading of the inode's information asynchronous with respect to
      the glock workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      813e0c46
    • S
      GFS2: Clean up & move gfs2_quotad · 37b2c837
      Steven Whitehouse authored
      This patch is a clean up of gfs2_quotad prior to giving it an
      extra job to do in addition to the current portfolio of updating
      the quota and statfs information from time to time.
      
      As a result it has been moved into quota.c allowing one of the
      functions it calls to be made static. Also the clean up allows
      the two existing functions to have separate timeouts and also
      to coexist with its future role of dealing with the "truncate in
      progress" inode flag.
      
      The (pointless) setting of gfs2_quotad_secs is removed since we
      arrange to only wake up quotad when one of the two timers expires.
      
      In addition the struct gfs2_quota_data is moved into a slab cache,
      mainly for easier debugging. It should also be possible to use
      a shrinker in the future, rather than the current scheme of scanning
      the quota data entries from time to time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      37b2c837
    • S
      GFS2: Add more detail to debugfs glock dumps · fa75cedc
      Steven Whitehouse authored
      Although the glock dumps print quite a lot of information about
      the glocks themselves, there are more things which can be
      usefully added to the dump relating to the objects themselves.
      
      This patch adds a few more fields to the inode and resource
      group lines, which should be useful for debugging.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fa75cedc
    • S
      GFS2: Banish struct gfs2_rgrpd_host · 73f74948
      Steven Whitehouse authored
      This patch moves the final field so that we can get rid
      of struct gfs2_rgrpd_host, as promised some time ago. Also
      by rearranging the fields slightly, we are able to reduce
      the size of the gfs2_rgrpd structure at the same time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      73f74948
    • S
      GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd · cfc8b549
      Steven Whitehouse authored
      The second of three fields which need to move, in order
      to remove the struct gfs2_rgrpd_host.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      cfc8b549
    • S
      GFS2: Move rg_igeneration into struct gfs2_rgrpd · d8b71f73
      Steven Whitehouse authored
      This moves one of the fields of struct gfs2_rgrpd_host into
      the struct gfs2_rgrpd with the eventual aim of removing
      the struct gfs2_rgrpd_host completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d8b71f73
    • S
      GFS2: Banish struct gfs2_dinode_host · 383f01fb
      Steven Whitehouse authored
      The final field in gfs2_dinode_host was the i_flags field. That's
      renamed to i_diskflags in order to avoid confusion with the existing
      inode flags, and moved into the inode proper at a suitable location
      to avoid creating a "hole".
      
      At that point struct gfs2_dinode_host is no longer needed and as
      promised (quite some time ago!) it can now be removed completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      383f01fb
    • S
      GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse authored
      This patch moves the i_size field from the gfs2_dinode_host and,
      following the ext3 convention, renames it i_disksize.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c9e98886
    • S
      GFS2: Move di_eattr into "proper" inode · 3767ac21
      Steven Whitehouse authored
      This moves the di_eattr field out of gfs2_dinode_host and
      into the inode proper.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3767ac21
    • S
      GFS2: Move "entries" into "proper" inode · ad6203f2
      Steven Whitehouse authored
      This moves the directory entry count into the proper inode.
      Potentially we could get this to share the space used by
      something else in the future, but this is one more step
      on the way to removing the gfs2_dinode_host structure.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      ad6203f2
    • S
      GFS2: Move generation number into "proper" part of inode · bcf0b5b3
      Steven Whitehouse authored
      This moves the generation number from the gfs2_dinode_host
      into the gfs2_inode structure. Eventually the plan is to get
      rid of the gfs2_dinode_host structure completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      bcf0b5b3
    • H
      GFS2: sparse annotation of gl->gl_spin · 55ba474d
      Harvey Harrison authored
      fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:308:5:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:529:2:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
      fs/gfs2/glock.c:925:3:    context '*gl+28': wanted >= 1, got 0
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      55ba474d
    • S
      GFS2: Fix up jdata writepage/delete_inode · 1bb7322f
      Steven Whitehouse authored
      There is a bug in writepage and delete_inode which allows jdata files to
      invalidate pages from the address space without being in a transaction at
      the time. This causes problems in case the pages are in the journal. This
      patch fixes that case and prevents the resulting oops.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1bb7322f
    • S
      GFS2: Rationalise header files · b2760583
      Steven Whitehouse authored
      Move the contents of some headers which contained very
      little into more sensible places, and remove the original
      header files. This should make it easier to find things.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2760583
    • S
      GFS2: Support for FIEMAP ioctl · e9079cce
      Steven Whitehouse authored
      This patch implements the FIEMAP ioctl for GFS2. We can use the generic
      code (aside from a lock order issue, solved as per Ted Tso's suggestion)
      for which I've introduced a new variant of the generic function. We also
      have one exception to deal with, namely stuffed files, so we do that
      "by hand", setting all the required flags.
      
      This has been tested with a modified version (I could only find an old one) of
      Eric's test program, and appears to work correctly.
      
      This patch does not currently support FIEMAP of xattrs, but the plan is to add
      that feature at some future point.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Eric Sandeen <sandeen@redhat.com>
      e9079cce
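
For readers unfamiliar with FIEMAP, the following userspace sketch shows the kind of query that this ioctl answers. It is not Eric's test program and says nothing about how GFS2 fills in the result; it only uses the generic `FS_IOC_FIEMAP` interface from `<linux/fs.h>` and `<linux/fiemap.h>`. The function name `count_extents` is hypothetical.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */

/* Ask the filesystem how many extents map the file open on fd.
 * Returns the extent count, or -1 if the ioctl fails (for example on a
 * filesystem that does not implement FIEMAP). */
static long count_extents(int fd)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;   /* cover the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;     /* flush dirty data before mapping */
    fm.fm_extent_count = 0;             /* 0: report the count only */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -1;
    return (long)fm.fm_mapped_extents;
}
```

A caller that wants the extents themselves would allocate room for `fm_mapped_extents` entries of `struct fiemap_extent` after the header and repeat the call with `fm_extent_count` set accordingly.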
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin authored
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54566b2c
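
The effect of the new flag can be sketched as a small mask computation: when the caller passes the "no filesystem re-entry" flag, the filesystem bit is stripped from the allocation mask before the page cache page is allocated. The constants below are stand-ins with made-up values, not the kernel's `AOP_FLAG_NOFS` or `__GFP_FS` definitions, and `write_begin_gfp` is a hypothetical helper rather than the body of `grab_cache_page_write_begin`.

```c
/* Stand-in bit values for illustration only; the kernel's real
 * constants have different names and values. */
#define SKETCH_GFP_WAIT      0x1u
#define SKETCH_GFP_IO        0x2u
#define SKETCH_GFP_FS        0x4u   /* allocation may re-enter the fs */
#define SKETCH_GFP_KERNEL    (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)

#define SKETCH_AOP_FLAG_NOFS 0x1u   /* caller must not recurse into the fs */

/* Compute the allocation mask to use for a write_begin page allocation,
 * given the mapping's default mask and the AOP flags passed by the caller. */
static unsigned int write_begin_gfp(unsigned int mapping_gfp,
                                    unsigned int aop_flags)
{
    unsigned int gfp_notmask = 0;

    if (aop_flags & SKETCH_AOP_FLAG_NOFS)
        gfp_notmask = SKETCH_GFP_FS;    /* forbid reclaim from entering the fs */

    return mapping_gfp & ~gfp_notmask;
}
```

Callers such as a symlink write path would pass the flag, while ordinary writes pass no flag and keep the mapping's full mask.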