1. 05 Jan, 2009 3 commits
    • GFS2: Rationalise header files · b2760583
      Steven Whitehouse committed
      Move the contents of some headers which contained very
      little into more sensible places, and remove the original
      header files. This should make it easier to find things.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2760583
    • GFS2: Support for FIEMAP ioctl · e9079cce
      Steven Whitehouse committed
      This patch implements the FIEMAP ioctl for GFS2. We can use the generic
      code (aside from a lock order issue, solved as per Ted Tso's suggestion)
      for which I've introduced a new variant of the generic function. We also
      have one exception to deal with, namely stuffed files, so we do that
      "by hand", setting all the required flags.
      
      This has been tested with a modified version of Eric's test program (I could
      only find an old version), and appears to work correctly.
      
      This patch does not currently support FIEMAP of xattrs, but the plan is to add
      that feature at some future point.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Eric Sandeen <sandeen@redhat.com>
      e9079cce
    • fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin committed
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54566b2c
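The flag handling described in this commit can be sketched in plain C. This is a minimal illustration, assuming made-up flag values and a hypothetical helper name (`sketch_write_begin_gfp`); it is not the kernel's `grab_cache_page_write_begin()` and the real constants differ.

```c
/* Illustrative values only; the real kernel constants differ. */
#define SKETCH_GFP_WAIT       0x1u
#define SKETCH_GFP_IO         0x2u
#define SKETCH_GFP_FS         0x4u
#define SKETCH_GFP_KERNEL     (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_AOP_FLAG_NOFS  0x1u

/*
 * Hypothetical helper: derive the allocation mask for a write_begin
 * call from the AOP flags the caller passed in. When the NOFS flag is
 * set, the "may re-enter the filesystem" bit is cleared, so reclaim
 * triggered by this allocation cannot recurse into the filesystem.
 */
unsigned int sketch_write_begin_gfp(unsigned int aop_flags)
{
    unsigned int gfp = SKETCH_GFP_KERNEL;

    if (aop_flags & SKETCH_AOP_FLAG_NOFS)
        gfp &= ~SKETCH_GFP_FS;
    return gfp;
}
```

Callers that do not set the flag keep the full mask, which matches the commit's point that common-case allocations can stay with GFP_KERNEL.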
  2. 14 Nov, 2008 1 commit
  3. 23 Oct, 2008 2 commits
  4. 14 Oct, 2008 1 commit
  5. 26 Sep, 2008 1 commit
    • GFS2: Support for I/O barriers · 254db57f
      Steven Whitehouse committed
      This patch adds barrier support to GFS2. There is not a lot of change
      really... we just add the barrier flag when we write journal header
      blocks. If the underlying device refuses to support them, we fall back
      to the previous way of doing things (wait for the I/O and hope) since
      there is nothing else we can do. There is no user configuration,
      barriers will always be on unless the device refuses to support them.
      This seems a reasonable solution to me since this is a correctness
      issue.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      254db57f
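The fallback behaviour described in this commit can be sketched as follows. This is a userspace illustration only: the device model, the `sketch_` names, and the choice to remember that barriers were refused are all assumptions, not the GFS2 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device. */
struct sketch_dev {
    bool supports_barriers;  /* does the device accept barrier writes? */
    bool barriers_disabled;  /* set after the first refusal (assumption) */
    int barrier_writes;      /* counters so the behaviour can be observed */
    int plain_writes;
};

/* Pretend to submit one journal header write. */
static int sketch_submit(struct sketch_dev *dev, bool barrier)
{
    if (barrier) {
        if (!dev->supports_barriers)
            return -EOPNOTSUPP;
        dev->barrier_writes++;
        return 0;
    }
    dev->plain_writes++;
    return 0;
}

/*
 * Write a journal header: try a barrier write first, and if the device
 * refuses, fall back to an ordinary write ("wait for the I/O and hope").
 */
int sketch_write_log_header(struct sketch_dev *dev)
{
    if (!dev->barriers_disabled) {
        int err = sketch_submit(dev, true);

        if (err != -EOPNOTSUPP)
            return err;
        dev->barriers_disabled = true;
    }
    return sketch_submit(dev, false);
}
```

There is no configuration knob in the sketch, mirroring the commit's statement that barriers are always used unless the device refuses them.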
  6. 18 Sep, 2008 2 commits
    • GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse committed
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options as any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      719ee344
    • GFS2: The war on bloat · 37ec89e8
      Steven Whitehouse committed
      The following patch shrinks the gfs2_args structure which is embedded in
      every GFS2 superblock. It cuts down the size of the options to a single
      unsigned int (the 13 bits of bitfields will be rounded up to that size
      by the compiler) from the current 11 unsigned ints. So on x86 that's 44
      bytes shrinking to 4 bytes, in each and every GFS2 superblock.
      Signed-off-by: Steven Whitehouse <swhitho@redhat.com>
      37ec89e8
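The size arithmetic in this commit can be checked with a small C sketch. The field names below are hypothetical placeholders (the real gfs2_args members are not listed in the message), and the 4-byte result relies on how mainstream compilers such as GCC and Clang pack `unsigned int` bitfields; the C standard leaves the exact layout to the implementation.

```c
#include <stddef.h>

/* Before: one full unsigned int per option, 11 options in total. */
struct sketch_args_old {
    unsigned int opt[11];
};

/*
 * After: 13 bits of bitfields. Nine 1-bit flags plus two 2-bit modes
 * is just one way to reach 13 bits; the split is an assumption.
 */
struct sketch_args_new {
    unsigned int flag0:1;
    unsigned int flag1:1;
    unsigned int flag2:1;
    unsigned int flag3:1;
    unsigned int flag4:1;
    unsigned int flag5:1;
    unsigned int flag6:1;
    unsigned int flag7:1;
    unsigned int flag8:1;
    unsigned int mode_a:2;
    unsigned int mode_b:2;
};

/* Bytes saved per embedded copy of the structure. */
size_t sketch_bytes_saved(void)
{
    return sizeof(struct sketch_args_old) - sizeof(struct sketch_args_new);
}
```

With a 4-byte `unsigned int`, the old layout is 11 * 4 = 44 bytes and the new one is rounded up to a single 4-byte unit, which is the saving the message quotes for x86.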
  7. 15 Sep, 2008 2 commits
  8. 05 Sep, 2008 2 commits
    • GFS2: Use an IS_ERR test rather than a NULL test · bd1eb881
      Julien Brunel committed
      In case of error, the function gfs2_inode_lookup returns an
      ERR pointer, but never returns a NULL pointer. So a NULL test that
      necessarily comes after an IS_ERR test should be deleted, and a NULL
      test that may come after a call to this function should be
      strengthened by an IS_ERR test.
      
      The semantic match that finds this problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @match_bad_null_test@
      expression x, E;
      statement S1,S2;
      @@
      x = gfs2_inode_lookup(...)
      ... when != x = E
      * if (x != NULL)
      S1 else S2
      // </smpl>
      Signed-off-by: Julien Brunel <brunel@diku.dk>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      bd1eb881
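Why a NULL test misses the failure case can be shown with a userspace imitation of the kernel's error-pointer convention. The `sketch_` helpers below are simplified stand-ins for ERR_PTR, PTR_ERR and IS_ERR, and `sketch_lookup` is a hypothetical function that behaves as the message says gfs2_inode_lookup does: it returns an error pointer on failure and never NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if p lies in the top 4095 addresses, i.e. encodes an errno. */
static inline int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int sketch_inode;  /* stands in for a real inode object */

/* Hypothetical lookup: error pointer on failure, never NULL. */
void *sketch_lookup(int fail)
{
    if (fail)
        return sketch_err_ptr(-ENOENT);
    return &sketch_inode;
}

/* Correct caller: test with the IS_ERR equivalent, not against NULL. */
int sketch_use_lookup(int fail)
{
    void *inode = sketch_lookup(fail);

    if (sketch_is_err(inode))
        return (int)sketch_ptr_err(inode);
    return 0;
}
```

A caller that only checked `inode != NULL` would treat the failed lookup as success, because the error pointer is non-NULL.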
    • GFS2: Fix race relating to glock min-hold time · dff52574
      Steven Whitehouse committed
      In the case that a request for a glock arrives right after the
      grant reply has arrived, it sometimes means that the gl_tstamp
      field hasn't been updated recently enough. The net result is that
      the min-hold time for the glock is ignored. If this happens
      often enough, it leads to poor performance.
      
      This patch adds an additional test, so that if the reply pending
      bit is set on a glock, then it will select the maximum length of
      time for the min-hold time, rather than looking at gl_tstamp.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      dff52574
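The additional test described in this commit can be sketched as a delay calculation. The structure and field names here are assumptions loosely modelled on the message (only gl_tstamp is named in the source), and time is a plain counter rather than jiffies.

```c
#include <stdbool.h>

/* Hypothetical, cut-down glock state. */
struct sketch_glock {
    unsigned long tstamp;    /* when the lock was last granted */
    unsigned long min_hold;  /* minimum hold time, same units as tstamp */
    bool reply_pending;      /* a grant reply has arrived but is unprocessed */
};

/*
 * How long to delay a demote request that arrives at time `now`.
 * If a reply is pending, tstamp is stale, so use the full min-hold
 * time instead of trusting it.
 */
unsigned long sketch_demote_delay(const struct sketch_glock *gl,
                                  unsigned long now)
{
    unsigned long holdtime;

    if (gl->reply_pending)
        return gl->min_hold;

    holdtime = gl->tstamp + gl->min_hold;
    if ((long)(holdtime - now) > 0)  /* wrap-safe "now is before holdtime" */
        return holdtime - now;
    return 0;
}
```

Without the `reply_pending` branch, a stale timestamp makes the hold time look already expired and the delay collapses to zero, which is the race the message describes.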
  9. 29 Aug, 2008 1 commit
    • dlm: allow multiple lockspace creates · 0f8e0d9a
      David Teigland committed
      Add a count for lockspace create and release so that create can
      be called multiple times to use the lockspace from different places.
      Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
      the previous behavior of returning -EEXIST if the lockspace already
      exists.
      Signed-off-by: David Teigland <teigland@redhat.com>
      0f8e0d9a
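The create/release counting described in this commit can be sketched with a single hypothetical lockspace object. The flag value and the `sketch_` names are placeholders; only the behaviour (counted creates, -EEXIST under the exclusive flag) comes from the message.

```c
#include <errno.h>

#define SKETCH_LSFL_NEWEXCL 0x1u  /* placeholder value */

/* Hypothetical lockspace state. */
struct sketch_lockspace {
    int exists;        /* has the lockspace been created? */
    int create_count;  /* number of outstanding creates */
};

/*
 * Create the lockspace, or take another reference if it already
 * exists. With NEWEXCL the old behaviour is kept: fail with -EEXIST.
 */
int sketch_new_lockspace(struct sketch_lockspace *ls, unsigned int flags)
{
    if (ls->exists) {
        if (flags & SKETCH_LSFL_NEWEXCL)
            return -EEXIST;
        ls->create_count++;
        return 0;
    }
    ls->exists = 1;
    ls->create_count = 1;
    return 0;
}

/* Drop one reference; tear the lockspace down on the last release. */
int sketch_release_lockspace(struct sketch_lockspace *ls)
{
    if (!ls->exists)
        return -EINVAL;
    if (--ls->create_count > 0)
        return 0;
    ls->exists = 0;
    return 0;
}
```

Two users that each create and later release the lockspace leave it alive until the second release.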
  10. 27 Aug, 2008 1 commit
    • GFS2: Fix & clean up GFS2 rename · 0188d6c5
      Steven Whitehouse committed
      This patch fixes a locking issue in the rename code by ensuring that we hold
      the per sb rename lock over both directory and "other" renames which involve
      different parent directories.
      
      At the same time, this moves the (only called from one place) function
      gfs2_ok_to_move into the file that it's called from, so we can mark it
      static. This should make the code a bit easier to follow.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Peter Staubach <staubach@redhat.com>
      0188d6c5
  11. 13 Aug, 2008 3 commits
    • GFS2: rm on multiple nodes causes panic · 72dbf479
      Bob Peterson committed
      This patch fixes a problem whereby simultaneous unlink, rmdir,
      rename and link operations (e.g. rm -fR *) from multiple nodes
      on the same GFS2 file system can cause kernel panics, hangs,
      and/or memory corruption.  It also gets rid of all the non-rgrp
      calls to gfs2_glock_nq_m.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      72dbf479
    • GFS2: Fix metafs mounts · 9b8df98f
      Steven Whitehouse committed
      This patch is intended to fix the issues reported in bz #457798. Instead
      of having the metafs as a separate filesystem, it becomes a second root
      of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
      is still possible (for backwards compatibility purposes) to mount it as
      type gfs2meta. A new mount flag "meta" is introduced so that it's possible
      to tell the two cases apart in /proc/mounts.
      
      As a result it becomes possible to mount type gfs2 with -o meta and
      get the same result as mounting type gfs2meta. So it is possible to
      mount just the metafs on its own. Currently if you do this, it's then
      impossible to mount the "normal" root of the gfs2 filesystem without
      first unmounting the metafs root. I'm not sure if that's a feature or
      a bug :-)
      
      Either way, this is a great improvement on the previous scheme and I've
      verified that it works ok with bind mounts on both the "normal" root
      and the metafs root in various combinations.
      
      There were also a bunch of functions in super.c which didn't belong there,
      so this moves them into ops_fstype.c where they can be static. Hopefully
      the mount/umount sequence is now more obvious as a result.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
      9b8df98f
    • GFS2: Fix debugfs glock file iterator · c1e817d0
      Steven Whitehouse committed
      Due to an incorrect iterator, some glocks were being missed from the
      glock dumps obtained via debugfs. This patch fixes the problem and
      ensures that we don't miss any glocks in future.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c1e817d0
  12. 27 Jul, 2008 3 commits
  13. 11 Jul, 2008 1 commit
  14. 10 Jul, 2008 3 commits
  15. 07 Jul, 2008 2 commits
    • [GFS2] Allow local DF locks when holding a cached EX glock · 209806ab
      Steven Whitehouse committed
      We already allow local SH locks while we hold a cached EX glock, so here
      we allow DF locks as well. This works only because we rely on the VFS's
      invalidation for locally cached data, and because if we hold an EX lock,
      then we know that no other node can be caching data relating to this
      file.
      
      It dramatically speeds up initial writes to O_DIRECT files since we fall
      back to buffered I/O for this and would otherwise bounce between DF and
      EX modes on each and every write call. The lessons to be learned from
      that are to ensure that (for the time being anyway) O_DIRECT files are
      preallocated and that they are written to using reasonably large I/O
      sizes. Even so this change fixes that corner case nicely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      209806ab
    • [GFS2] Fix delayed demote race · 265d529c
      Steven Whitehouse committed
      There is a race in the delayed demote code where it does the wrong thing
      if a demotion to UN has occurred for other reasons before the delay has
      expired. This patch adds an assert to catch that condition as well as
      fixing the root cause by adding an additional check for the UN state.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      265d529c
  16. 03 Jul, 2008 2 commits
    • [GFS2] don't call permission() · f58ba889
      Miklos Szeredi committed
      GFS2 calls permission() to verify permissions after locks on the files
      have been taken.
      
      For this it's sufficient to call gfs2_permission() instead.  This
      results in the following changes:
      
        - IS_RDONLY() check is not performed
        - IS_IMMUTABLE() check is not performed
        - devcgroup_inode_permission() is not called
        - security_inode_permission() is not called
      
      IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
      flag should provide protection against read-only remounts during
      operations.  do_gfs2_set_flags() has been fixed to perform
      mnt_want_write()/mnt_drop_write() to protect against remounting
      read-only.
      
      IS_IMMUTABLE has been added to gfs2_permission()
      
      Repeating the security checks seems to be pointless, as they don't
      normally change, and if they do, it's independent of the filesystem
      state.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f58ba889
    • Remove BKL from remote_llseek v2 · 9465efc9
      Andi Kleen committed
      - Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
      failures in all users)
      - Change all users to either use generic_file_llseek_unlocked directly or
      take the BKL around it. I changed the file systems that don't use the BKL
      for anything (CIFS, GFS) to call it directly. NCPFS, SMBFS and NFS
      take the BKL, but explicitly in their own source now.
      
      I moved them all over in a single patch to avoid unbisectable sections.
      
      Open problem: 32bit kernels can corrupt fpos because its modification
      is not atomic, but they can do that anyway because there are other paths that
      modify it without the BKL.
      
      Do we need a special lock for the pos/f_version = 0 checks?
      
      Trond says the NFS BKL is likely not needed, but keep it for now
      until his full audit.
      
      v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
          and factor duplicated code (suggested by hch)
      
      Cc: Trond.Myklebust@netapp.com
      Cc: swhiteho@redhat.com
      Cc: sfrench@samba.org
      Cc: vandrove@vc.cvut.cz
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      9465efc9
  17. 27 Jun, 2008 10 commits
    • [GFS2] Fix module building · f17172e0
      Steven Whitehouse committed
      Two lines missed from the previous patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f17172e0
    • [GFS2] Remove all_list from lock_dlm · 31fcba00
      Steven Whitehouse committed
      I discovered that we had a list onto which every lock_dlm
      lock was being put. Its only function was to discover whether
      we'd got any locks left after umount. Since there was already
      a counter for that purpose as well, I removed the list. The
      saving is sizeof(struct list_head) per glock - well worth
      having.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      31fcba00
    • [GFS2] Remove obsolete conversion deadlock avoidance code · b2cad26c
      Steven Whitehouse committed
      This is only used by GFS1 so can be removed.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2cad26c
    • [GFS2] Remove remote lock dropping code · 1bdad606
      Steven Whitehouse committed
      There are several reasons why this is undesirable:
      
       1. It never happens during normal operation anyway
       2. If it does happen it causes performance to be very, very poor
       3. It isn't likely to solve the original problem (memory shortage
          on remote DLM node) it was supposed to solve
       4. It uses a bunch of arbitrary constants which are unlikely to be
          correct for any particular situation and for which the tuning seems
          to be a black art.
       5. In an N node cluster, only 1/N of the dropped locks will actually
          contribute to solving the problem on average.
      
      So all in all we are better off without it. This also makes merging
      the lock_dlm module into GFS2 a bit easier.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1bdad606
    • [GFS2] kernel panic mounting volume · 9171f5a9
      Bob Peterson committed
      This patch fixes Red Hat bugzilla bug 450156.
      
      This started with a not-too-improbable mount failure because the
      locking protocol was never set back to its proper "lock_dlm" after the
      system was rebooted in the middle of a gfs2_fsck.  That left a
      (purposely) invalid locking protocol in the superblock, which caused an
      error when the file system was mounted the next time.
      
      When there's an error mounting, vfs calls DQUOT_OFF, which calls
      vfs_quota_off which calls gfs2_sync_fs.  Next, gfs2_sync_fs calls
      gfs2_log_flush passing s_fs_info.  But due to the error, s_fs_info
      had been previously set to NULL, and so we have the kernel oops.
      
      My solution in this patch is to test for the NULL value before passing
      it.  I tested this patch and it fixes the problem.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9171f5a9
    • [GFS2] Revise readpage locking · 01b7c7ae
      Steven Whitehouse committed
      The previous attempt to fix the locking in readpage failed due
      to the use of a "try lock" which resulted in occasional high
      cpu usage during testing (due to repeated tries) and also it
      did not resolve all the ordering problems wrt the transaction
      lock (although it did solve all the inode lock ordering problems).
      
      This patch avoids the problem by unlocking the page and getting the
      locks in the correct order. This means that we have to retest the
      page to ensure that it hasn't changed when we relock the page.
      
      This now passes the tests which were previously failing.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      01b7c7ae
    • [GFS2] Fix ordering of args for list_add · 80274737
      Steven Whitehouse committed
      The patch to remove lock_nolock managed to get the arguments
      of this list_add backwards. This fixes it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      80274737
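The argument order at issue can be illustrated with a minimal circular doubly linked list in the style of the kernel's list API. This is a userspace sketch with `sketch_` names, not the kernel header; it follows the same convention of new entry first, list head second.

```c
/* Minimal circular doubly linked list node. */
struct sketch_list_head {
    struct sketch_list_head *next;
    struct sketch_list_head *prev;
};

/* An empty list is a head that points at itself. */
void sketch_list_init(struct sketch_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_entry immediately after head.
 * First argument: the entry being added. Second: the list to add it to.
 */
void sketch_list_add(struct sketch_list_head *new_entry,
                     struct sketch_list_head *head)
{
    new_entry->next = head->next;
    new_entry->prev = head;
    head->next->prev = new_entry;
    head->next = new_entry;
}
```

Both parameters have the same type, so swapping them still compiles. With the arguments reversed, the head's own pointers are overwritten using the new entry's links, and any entries already on the list are no longer reachable from the head.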
    • [GFS2] trivial sparse lock annotations · 2d81afb8
      Harvey Harrison committed
      Annotate the &sdp->sd_log_lock.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2d81afb8
    • [GFS2] No lock_nolock · 048bca22
      Steven Whitehouse committed
      This patch merges the lock_nolock module into GFS2 itself. As well as removing
      some of the overhead of the module, it also means that it's now impossible to
      build GFS2 without a lock module (which would be a pointless thing to do
      anyway).
      
      We also plan to merge lock_dlm into GFS2 in the future, but that is a more
      tricky task, and will therefore be a separate patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
      048bca22
    • [GFS2] Fix ordering bug in lock_dlm · f3c9d38a
      Steven Whitehouse committed
      This looks like a lot of change, but in fact it's not. Mostly it's
      things moving from one file to another. The change is just that
      instead of queuing lock completions and callbacks from the DLM
      we now pass them directly to GFS2.
      
      This gives us a net loss of two list heads per glock (a fair
      saving in memory) plus a reduction in the latency of delivering
      the messages to GFS2, plus we now have one thread fewer as well.
      There was a bug where callbacks and completions could be delivered
      in the wrong order due to this unnecessary queuing which is fixed
      by this patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      f3c9d38a