1. 05 Jan 2009: 2 commits
    • GFS2: Fix "truncate in progress" hang · 813e0c46
      Committed by Steven Whitehouse
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      Also, as a result of this change, it opens the way for a future patch to
      make the reading of the inode's information asynchronous with respect to
      the glock workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
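The hand-off described above can be sketched in plain C. This is a minimal, single-threaded illustration, not GFS2 code: `struct demo_inode`, `queue_truncate()` and `quotad_process_truncates()` are hypothetical names, and the spinlock and wake-up a kernel version would need are left out. The point is that the workqueue side only links the inode onto a list, while the work that needs glocks runs later in the long-lived daemon.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with a "truncate in progress" flag. */
struct demo_inode {
    int id;
    bool trunc_in_prog;
    struct demo_inode *next_trunc;   /* link on the daemon's pending list */
};

/* Pending list owned by the long-lived daemon (gfs2_quotad in the commit).
 * A kernel version would protect it with a lock and wake the daemon;
 * both are omitted in this single-threaded sketch. */
static struct demo_inode *trunc_list;

/* Glock workqueue side: no allocation, no sleeping, no glocks.
 * It only links the inode onto the list for later processing. */
static void queue_truncate(struct demo_inode *ip)
{
    ip->next_trunc = trunc_list;
    trunc_list = ip;
}

/* Stand-in for finishing the truncate: the step that needs glocks,
 * which is why it must not run on the glock workqueue. */
static void finish_truncate(struct demo_inode *ip)
{
    ip->trunc_in_prog = false;
}

/* Daemon side: drain the list, returning how many inodes were handled. */
static int quotad_process_truncates(void)
{
    int n = 0;

    while (trunc_list != NULL) {
        struct demo_inode *ip = trunc_list;
        trunc_list = ip->next_trunc;
        ip->next_trunc = NULL;
        finish_truncate(ip);
        n++;
    }
    return n;
}
```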
    • GFS2: sparse annotation of gl->gl_spin · 55ba474d
      Committed by Harvey Harrison
      fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:308:5:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:529:2:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
      fs/gfs2/glock.c:925:3:    context '*gl+28': wanted >= 1, got 0
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
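The warnings above come from functions that are entered holding `gl->gl_spin`, drop it, and retake it before returning. A minimal sketch of how such a function can be annotated, assuming macro definitions modelled on the kernel's `__acquires()`/`__releases()` (empty for a normal compile, a sparse `context` attribute when `__CHECKER__` is defined). The glock type and the integer standing in for the spinlock are hypothetical.

```c
/* Modelled on the kernel's annotations: only sparse (which defines
 * __CHECKER__) sees the context attributes; gcc sees nothing. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

struct demo_glock {
    int gl_spin;   /* 1 while "held"; hypothetical stand-in for spinlock_t */
    int drops;     /* counts how often the lock was dropped */
};

static void demo_spin_lock(int *lock)   { *lock = 1; }
static void demo_spin_unlock(int *lock) { *lock = 0; }

/* Called and returns with gl_spin held, but drops it in the middle.
 * The paired annotation tells sparse this is intentional. */
static void demo_do_promote(struct demo_glock *gl)
__releases(&gl->gl_spin)
__acquires(&gl->gl_spin)
{
    demo_spin_unlock(&gl->gl_spin);
    gl->drops++;                    /* work done without the lock */
    demo_spin_lock(&gl->gl_spin);
}
```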
  2. 18 Sep 2008: 1 commit
    • GFS2: high time to take some time over atime · 719ee344
      Committed by Steven Whitehouse
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
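As a rough illustration of the per-mount choice GFS2 now inherits from the VFS, here is a sketch of deciding whether a read should write a new atime. It assumes the relatime rule as it stood around this commit (update only if atime is not newer than mtime or ctime; later kernels also refresh an atime that is more than a day old). The types and names are hypothetical, not the VFS's own.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-mount atime modes. */
enum demo_atime_mode { DEMO_STRICTATIME, DEMO_RELATIME, DEMO_NOATIME };

/* Simplified inode timestamps, whole seconds only. */
struct demo_times {
    time_t atime, mtime, ctime;
};

/* Should a read on a mount with this mode write a new atime? */
static bool demo_atime_needs_update(enum demo_atime_mode mode,
                                    const struct demo_times *t)
{
    switch (mode) {
    case DEMO_NOATIME:
        return false;
    case DEMO_RELATIME:
        /* Only if the file changed since it was last read. */
        return t->mtime >= t->atime || t->ctime >= t->atime;
    case DEMO_STRICTATIME:
    default:
        return true;
    }
}
```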
  3. 05 Sep 2008: 1 commit
    • GFS2: Fix race relating to glock min-hold time · dff52574
      Committed by Steven Whitehouse
      In the case that a request for a glock arrives right after the
      grant reply has arrived, it sometimes means that the gl_tstamp
      field hasn't been updated recently enough. The net result is that
      the min-hold time for the glock is ignored. If this happens
      often enough, it leads to poor performance.
      
      This patch adds an additional test, so that if the reply pending
      bit is set on a glock, then it will select the maximum length of
      time for the min-hold time, rather than looking at gl_tstamp.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
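A sketch of the min-hold calculation with the extra test, under stated assumptions: times are plain counters rather than jiffies, and `struct demo_glock`, `reply_pending` and `demote_delay()` are hypothetical stand-ins for the glock fields and logic the commit describes. When the reply is still pending, `gl_tstamp` cannot be trusted, so the full min-hold time is used.

```c
#include <stdbool.h>

/* Hypothetical, simplified glock fields for this sketch. */
struct demo_glock {
    unsigned long gl_tstamp;   /* time of the last state change */
    bool reply_pending;        /* grant reply arrived, not yet processed */
};

/* How long to delay a remote demote request so this node keeps the
 * lock for at least min_hold after it was granted. */
static unsigned long demote_delay(const struct demo_glock *gl,
                                  unsigned long now, unsigned long min_hold)
{
    unsigned long held;

    if (gl->reply_pending)
        return min_hold;        /* gl_tstamp may be stale: assume just granted */
    held = now - gl->gl_tstamp;
    return held >= min_hold ? 0 : min_hold - held;
}
```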
  4. 13 Aug 2008: 1 commit
  5. 07 Jul 2008: 2 commits
    • [GFS2] Allow local DF locks when holding a cached EX glock · 209806ab
      Committed by Steven Whitehouse
      We already allow local SH locks while we hold a cached EX glock, so here
      we allow DF locks as well. This works only because we rely on the VFS's
      invalidation for locally cached data, and because if we hold an EX lock,
      then we know that no other node can be caching data relating to this
      file.
      
      It dramatically speeds up initial writes to O_DIRECT files since we fall
      back to buffered I/O for this and would otherwise bounce between DF and
      EX modes on each and every write call. The lessons to be learned from
      that are to ensure that (for the time being anyway) O_DIRECT files are
      preallocated and that they are written to using reasonably large I/O
      sizes. Even so, this change fixes that corner case nicely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
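The rule this commit extends can be sketched as a compatibility check. This is a simplification under assumed names (`demo_state`, `demo_grant_locally()`); the real grant logic handles more cases than this.

```c
#include <stdbool.h>

/* Glock states named as in the text; the numeric values are arbitrary. */
enum demo_state { DEMO_UN, DEMO_SH, DEMO_DF, DEMO_EX };

/* Can a local request be served from the cached glock without asking
 * the lock manager? A cached EX means no other node caches this file's
 * data, so SH (already allowed) and DF (added by this commit) are safe. */
static bool demo_grant_locally(enum demo_state cached, enum demo_state want)
{
    if (cached == want)
        return true;
    if (cached == DEMO_EX && (want == DEMO_SH || want == DEMO_DF))
        return true;
    return false;
}
```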
    • [GFS2] Fix delayed demote race · 265d529c
      Committed by Steven Whitehouse
      There is a race in the delayed demote code where it does the wrong thing
      if a demotion to UN has occurred for other reasons before the delay has
      expired. This patch adds an assert to catch that condition as well as
      fixing the root cause by adding an additional check for the UN state.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
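A sketch of the additional UN check, assuming a hypothetical simplified glock where the delayed demote runs as a single function once the delay has expired. If the glock reached UN for other reasons in the meantime, the stale request is discarded instead of acted on.

```c
#include <stdbool.h>

enum demo_state { DEMO_UN, DEMO_SH, DEMO_DF, DEMO_EX };

/* Hypothetical, simplified glock for this sketch. */
struct demo_glock {
    enum demo_state state;          /* current state */
    enum demo_state demote_state;   /* target of the delayed demote */
    bool demote_pending;
};

/* Runs when the min-hold delay expires. Returns true if a demote was
 * actually performed. */
static bool demo_delayed_demote(struct demo_glock *gl)
{
    if (!gl->demote_pending)
        return false;
    gl->demote_pending = false;
    if (gl->state == DEMO_UN)
        return false;               /* already unlocked: request is stale */
    gl->state = gl->demote_state;
    return true;
}
```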
  6. 27 Jun 2008: 3 commits
    • [GFS2] Remove remote lock dropping code · 1bdad606
      Committed by Steven Whitehouse
      There are several reasons why this is undesirable:
      
       1. It never happens during normal operation anyway
       2. If it does happen it causes performance to be very, very poor
       3. It isn't likely to solve the original problem (memory shortage
          on remote DLM node) it was supposed to solve
       4. It uses a bunch of arbitrary constants which are unlikely to be
          correct for any particular situation and for which the tuning seems
          to be a black art.
       5. In an N node cluster, only 1/N of the dropped locks will actually
          contribute to solving the problem on average.
      
      So all in all we are better off without it. This also makes merging
      the lock_dlm module into GFS2 a bit easier.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
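Point 5 is an expected-value argument. As a rough sketch, assume the memory a lock pins on the DLM side sits on that lock's master node, and that each of the D dropped locks is mastered on any of the N nodes with equal probability (both are assumptions, not statements from the commit). The number of drops that relieve the one node m that is short of memory is then, on average:

```latex
\mathbb{E}[\text{useful drops}]
  = \sum_{i=1}^{D} \Pr[\text{lock } i \text{ is mastered on node } m]
  = \sum_{i=1}^{D} \frac{1}{N}
  = \frac{D}{N}
```

For example, with N = 4 and D = 1000, only about 250 of the dropped locks would help node m on average.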
    • [GFS2] No lock_nolock · 048bca22
      Committed by Steven Whitehouse
      This patch merges the lock_nolock module into GFS2 itself. As well as removing
      some of the overhead of the module, it also means that it's now impossible to
      build GFS2 without a lock module (which would be a pointless thing to do
      anyway).
      
      We also plan to merge lock_dlm into GFS2 in the future, but that is a more
      tricky task, and will therefore be a separate patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
    • [GFS2] Clean up the glock core · 6802e340
      Committed by Steven Whitehouse
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shorter, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
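The bullet about `->lm_lock()` means a lock module needs only one entry point for every state change. A minimal sketch with hypothetical names, where `DEMO_UN` plays the role of `LM_ST_UNLOCKED` and the lock manager is synchronous for simplicity (a real one completes requests asynchronously).

```c
enum demo_state { DEMO_UN, DEMO_SH, DEMO_DF, DEMO_EX };

/* Hypothetical stand-in for a lock module's state. */
struct demo_lockmod {
    enum demo_state state;
    int calls;      /* number of requests made through the one entry point */
};

/* Single entry point: every request names a target state. */
static void demo_lm_lock(struct demo_lockmod *lm, enum demo_state target)
{
    lm->calls++;
    lm->state = target;
}

/* "Unlock" is just a lock request whose target is the unlocked state. */
static void demo_lm_unlock(struct demo_lockmod *lm)
{
    demo_lm_lock(lm, DEMO_UN);
}
```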
  7. 31 Mar 2008: 8 commits
  8. 08 Feb 2008: 2 commits
  9. 25 Jan 2008: 7 commits
    • [GFS2] Reorganize function gfs2_glmutex_lock · 398bbe68
      Committed by Bob Peterson
      This patch optimizes the function gfs2_glmutex_lock.
      The basic theory is: Why bother initializing a holder, setting up
      wait bits and then waiting on them, if you know the glock can be
      yours. So the holder setup is moved inside the if statement that checks
      whether the glock is locked. This one needs careful scrutiny because changing
      anything to do with locking should strike terror into one's heart.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
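The idea of the optimization (take the lock outright if it is free, and only set up a holder when it is not) can be sketched with C11 atomics. The names are hypothetical, and the slow path spins instead of sleeping, purely to keep the sketch self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_glock {
    atomic_flag locked;   /* hypothetical stand-in for the glmutex bit */
    int slow_waits;       /* how often the slow path was needed */
};

static void demo_glock_init(struct demo_glock *gl)
{
    atomic_flag_clear(&gl->locked);
    gl->slow_waits = 0;
}

/* test_and_set returns the previous value: false means we now own it. */
static bool demo_glmutex_trylock(struct demo_glock *gl)
{
    return !atomic_flag_test_and_set(&gl->locked);
}

static void demo_glmutex_lock(struct demo_glock *gl)
{
    if (demo_glmutex_trylock(gl))
        return;                 /* fast path: no holder, no wait bits */
    gl->slow_waits++;
    /* Slow path: the real code initialises a holder and waits on it. */
    while (!demo_glmutex_trylock(gl))
        ;
}

static void demo_glmutex_unlock(struct demo_glock *gl)
{
    atomic_flag_clear(&gl->locked);
}
```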
    • [GFS2] Fix runtime issue with UP kernels · 1a2781cf
      Committed by Fabio Massimo Di Nitto
      The issue is indeed UP vs SMP and it is totally random.
      
      spin_is_locked() is a bad assertion because there is no correct answer on UP.
      On UP, spin_is_locked() has to return either one value or the other, always.
      
      This means that in my setup I am lucky enough to trigger the issue and you
      are lucky enough not to.
      
      The attached patch removes the bogus calls to BUG_ON() and, according to David
      (in CC, and thanks for the long explanation of the problem), we can rely upon
      things like lockdep to find the problems these calls might be trying to catch.
      Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
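Why the assertion is bogus on UP can be shown with a simulated uniprocessor lock. This is a hypothetical model, not the kernel's UP spinlock implementation: the lock carries no state, so the "is it locked?" query can only return a constant. With a constant of false, an assertion like `BUG_ON(!spin_is_locked(...))` fires even in correct code (and with a constant of true it could never catch anything).

```c
#include <stdbool.h>

/* Simulated UP lock: with no other CPU to exclude, it has no state. */
struct demo_up_spinlock {
    char unused;
};

static void demo_up_lock(struct demo_up_spinlock *l)   { (void)l; }
static void demo_up_unlock(struct demo_up_spinlock *l) { (void)l; }

/* Can only give a fixed answer, whether or not the caller "holds" it. */
static bool demo_up_is_locked(const struct demo_up_spinlock *l)
{
    (void)l;
    return false;
}

/* The kind of check the patch removes: true means BUG_ON() would fire. */
static bool demo_assert_would_fire(const struct demo_up_spinlock *l)
{
    return !demo_up_is_locked(l);
}
```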
    • [GFS2] Don't add glocks to the journal · 2bcd610d
      Committed by Steven Whitehouse
      The only reason for adding glocks to the journal was to keep track
      of which locks required a log flush prior to release. We add a
      flag to the glock to allow this check to be made in a simpler way.
      
      This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
      and means that we can avoid extra work during the journal flush.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
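A minimal sketch of replacing list membership with a flag bit. `DEMO_GLF_LFLUSH`, the counter and both functions are hypothetical, and a real log flush is reduced to incrementing a counter.

```c
/* Hypothetical flag: "the log must be flushed before this glock goes". */
#define DEMO_GLF_LFLUSH (1u << 0)

struct demo_glock {
    unsigned flags;
};

static int demo_log_flushes;    /* stand-in for real log flushes */

/* Metadata covered by this glock was added to the journal. */
static void demo_mark_logged(struct demo_glock *gl)
{
    gl->flags |= DEMO_GLF_LFLUSH;
}

/* On release, flush only if the flag says it is needed. */
static void demo_glock_release(struct demo_glock *gl)
{
    if (gl->flags & DEMO_GLF_LFLUSH) {
        demo_log_flushes++;
        gl->flags &= ~DEMO_GLF_LFLUSH;
    }
}
```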
    • [GFS2] Remove flags no longer required · e589665e
      Committed by Steven Whitehouse
      The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
      depending upon which of the two waiters lists they were going to
      be queued upon. They were then tested when the holders were taken
      off the lists to ensure that the right type of holder was being
      dequeued.
      
      Since we are already using separate lists, there doesn't seem a
      lot of point having these flags as well, and since setting them
      and testing them is in the fast path for locking and unlocking
      glocks, this patch removes them.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Reorder writeback for glock sync · 3042a2cc
      Committed by Steven Whitehouse
      Previously we were doing (write data, wait for data, write metadata, wait
      for metadata). After this patch we do (write metadata, write data, wait for
      data, wait for metadata) which should be more efficient.
      
      Also I noticed that the drop_bh and xmote_bh functions were almost
      identical. In fact the only difference was a single test, and that
      test is such that in the drop_bh case, it would always evaluate to
      the correct result. As such we can use the xmote_bh functions in
      all the places where we were using the drop_bh function and remove
      the drop_bh functions.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
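The two orderings can be compared with a sketch that only records the sequence of steps; the step names and functions are hypothetical, and no I/O is performed. Issuing both writes before waiting on either presumably lets data and metadata I/O be in flight at the same time, which is where the expected efficiency gain comes from.

```c
#include <string.h>

/* Records the sequence of steps so the orderings can be compared. */
static char demo_trace[128];

static void demo_step(const char *what)
{
    strcat(demo_trace, what);
    strcat(demo_trace, ";");
}

/* Before: each write is waited on before the next one starts. */
static void demo_sync_before(void)
{
    demo_trace[0] = '\0';
    demo_step("write_data");
    demo_step("wait_data");
    demo_step("write_meta");
    demo_step("wait_meta");
}

/* After: start both writes, then wait for both. */
static void demo_sync_after(void)
{
    demo_trace[0] = '\0';
    demo_step("write_meta");
    demo_step("write_data");
    demo_step("wait_data");
    demo_step("wait_meta");
}
```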
    • [GFS2] Remove "reclaim limit" · c2932e03
      Committed by Steven Whitehouse
      This call to reclaim glocks is not needed, and in particular we don't want it
      in the fast path for locking glocks. The limit was entirely arbitrary anyway
      and we can't expect users to adjust things like this; the remaining code will
      do the right thing on its own.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Handle multiple glock demote requests · cc7e79b1
      Committed by Wendy Cheng
      Fix a race condition where multiple glock demote requests are sent to
      a node back-to-back. This patch does a check inside handle_callback()
      to see whether a demote request is in progress. If true, it sets a flag
      to make sure run_queue() will loop again to handle the new request,
      instead of erroneously setting gl_demote_state to a different state.
      Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
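A simplified, single-threaded sketch of the fix. The field and function names are hypothetical, and the hook passed to `demo_run_queue()` simulates a second callback arriving while a demote is in flight. Instead of overwriting the demote target mid-demote, the callback sets a flag so the loop runs again; keeping the new target in a separate field is this sketch's own choice.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_UN, DEMO_SH, DEMO_DF, DEMO_EX };

/* Hypothetical, simplified glock for this sketch. */
struct demo_glock {
    enum demo_state state;          /* current state */
    enum demo_state demote_state;   /* target of the demote being handled */
    enum demo_state next_demote;    /* request that arrived mid-demote */
    bool demote_pending;
    bool demote_in_progress;
    bool demote_again;              /* tells the queue runner to loop again */
};

/* Remote demote request. While a demote is in flight, leave demote_state
 * alone: record the new request and flag another pass instead. */
static void demo_handle_callback(struct demo_glock *gl, enum demo_state want)
{
    if (gl->demote_in_progress) {
        gl->next_demote = want;
        gl->demote_again = true;
        return;
    }
    gl->demote_state = want;
    gl->demote_pending = true;
}

/* Simulates a second request, for UN, arriving during a demote. */
static void demo_second_request_un(struct demo_glock *gl)
{
    demo_handle_callback(gl, DEMO_UN);
}

/* Queue runner. If mid_demote is non-NULL it is called once while the
 * first demote is "in flight", to model the race. */
static void demo_run_queue(struct demo_glock *gl,
                           void (*mid_demote)(struct demo_glock *))
{
    while (gl->demote_pending) {
        gl->demote_in_progress = true;
        if (mid_demote != NULL) {
            mid_demote(gl);
            mid_demote = NULL;
        }
        gl->state = gl->demote_state;      /* demote completes */
        gl->demote_in_progress = false;
        gl->demote_pending = false;
        if (gl->demote_again) {            /* loop again for the new request */
            gl->demote_again = false;
            gl->demote_state = gl->next_demote;
            gl->demote_pending = true;
        }
    }
}
```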
  10. 10 Oct 2007: 10 commits
  11. 09 Jul 2007: 3 commits
    • [GFS2] Simplify multiple glock aquisition · eaf5bd3c
      Committed by Steven Whitehouse
      There is a bug in the code which acquires multiple glocks where if the
      initial out-of-order attempt fails part way through we can land up trying
      to acquire the wrong number of glocks. This is part of the fix for Red
      Hat bz #239737. The other part of the bz doesn't apply to upstream
      kernels since it was fixed by:
      
      http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6
      
      Since the out-of-order code doesn't appear to add anything to the
      performance of GFS2, this patch just removes it rather than trying to
      fix it. It should be much easier to see what's going on here now. In
      addition, we don't allocate any memory unless we are using a lot of
      glocks (which is a relatively uncommon case).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
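With the out-of-order attempt gone, what remains can be pictured as the usual deadlock-avoidance scheme of taking locks in one global order. A sketch under assumptions: holders are sorted by a hypothetical `lock_number`, "acquiring" just records the order, and a small on-stack array avoids allocation for the common case of a few glocks, with a threshold of 4 chosen arbitrarily here.

```c
#include <stdlib.h>

#define DEMO_STACK_HOLDERS 4   /* hypothetical threshold, chosen arbitrarily */

struct demo_holder {
    unsigned long long lock_number;   /* hypothetical global lock id */
    int acquired_order;               /* 1-based position it was taken in */
};

/* qsort comparator for an array of holder pointers, ascending lock_number. */
static int demo_cmp_holder(const void *a, const void *b)
{
    const struct demo_holder *x = *(const struct demo_holder *const *)a;
    const struct demo_holder *y = *(const struct demo_holder *const *)b;
    return (x->lock_number > y->lock_number) - (x->lock_number < y->lock_number);
}

/* Take every lock in ascending lock_number order, so two callers needing
 * overlapping sets can never each hold a lock the other is waiting for.
 * Memory is only allocated when more than DEMO_STACK_HOLDERS are needed.
 * Returns 0 on success or -1 if that allocation fails. */
static int demo_lock_many(struct demo_holder *hs, size_t n)
{
    struct demo_holder *stack[DEMO_STACK_HOLDERS];
    struct demo_holder **p = stack;
    size_t i;

    if (n > DEMO_STACK_HOLDERS) {
        p = malloc(n * sizeof(*p));
        if (p == NULL)
            return -1;
    }
    for (i = 0; i < n; i++)
        p[i] = &hs[i];
    qsort(p, n, sizeof(*p), demo_cmp_holder);
    for (i = 0; i < n; i++)
        p[i]->acquired_order = (int)(i + 1);   /* stand-in for a blocking lock */
    if (p != stack)
        free(p);
    return 0;
}
```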
    • [GFS2] Fix deallocation issues · d93cfa98
      Committed by Abhijith Das
      There were two issues during deallocation of unlinked inodes. The
      first was relating to the use of a "try" lock which in the case of
      the inode lock wasn't trying hard enough to deallocate in all
      circumstances (now changed to a normal glock) and in the case of
      the iopen lock didn't wait for the demotion of the shared lock before
      attempting to get the exclusive lock, and thereby sometimes (timing dependent)
      not completing the deallocation when it should have done.
      
      The second issue related to the lack of a way to invalidate dcache entries
      on remote nodes (now fixed by this patch) which meant that unlinks were
      taking a long time to return disk space to the fs. By adding some code to
      invalidate the dcache entries across the cluster for unlinked inodes, that
      is now fixed.
      
      This patch was written jointly by Abhijith Das and Steven Whitehouse.
      Signed-off-by: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up inode number handling · dbb7cae2
      Committed by Steven Whitehouse
      This patch cleans up the inode number handling code. The main difference
      is that instead of looking up the inodes using a struct gfs2_inum_host
      we now use just the no_addr member of this structure. The tests relating
      to no_formal_ino can then be done by the calling code. This has
      advantages in that we want to do different things in different code
      paths if the no_formal_ino doesn't match. In the NFS path we want to
      return -ESTALE, but in the ->lookup() path, it's a bug in the fs if the
      no_formal_ino doesn't match and thus we can withdraw in this case.
      
      In order to later fix bz #201012, we need to be able to look up an inode
      without knowing no_formal_ino, as the only information that is known to
      us is the on-disk location of the inode in question.
      
      This patch will also help us to fix bz #236099 at a later date by
      cleaning up a lot of the code in that area.
      
      There are no user visible changes as a result of this patch and there
      are no changes to the on-disk format either.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
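A sketch of the split the commit describes: lookup uses only the on-disk address, and each caller decides what a `no_formal_ino` mismatch means. The table, the types, and the use of `-EIO` in place of a filesystem withdraw are hypothetical simplifications.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified inode identity. */
struct demo_inode {
    uint64_t no_addr;         /* on-disk location */
    uint64_t no_formal_ino;   /* formal inode number */
};

/* A tiny table stands in for the inode cache and disk. */
static struct demo_inode demo_table[] = { {100, 7}, {200, 9} };

/* Lookup by on-disk address only; no_formal_ino is not needed. */
static struct demo_inode *demo_lookup_by_addr(uint64_t no_addr)
{
    for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
        if (demo_table[i].no_addr == no_addr)
            return &demo_table[i];
    return NULL;
}

/* NFS-style caller: a mismatch just means a stale file handle. */
static int demo_nfs_get_inode(uint64_t no_addr, uint64_t no_formal_ino,
                              struct demo_inode **out)
{
    struct demo_inode *ip = demo_lookup_by_addr(no_addr);

    if (ip == NULL || ip->no_formal_ino != no_formal_ino)
        return -ESTALE;
    *out = ip;
    return 0;
}

/* ->lookup()-style caller: a mismatch means filesystem corruption;
 * -EIO stands in for withdrawing the filesystem. */
static int demo_dir_lookup(uint64_t no_addr, uint64_t no_formal_ino,
                           struct demo_inode **out)
{
    struct demo_inode *ip = demo_lookup_by_addr(no_addr);

    if (ip == NULL || ip->no_formal_ino != no_formal_ino)
        return -EIO;
    *out = ip;
    return 0;
}
```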