1. 02 8月, 2010 1 次提交
  2. 29 7月, 2010 2 次提交
    • S
      Revert "GFS2: recovery stuck on transaction lock" · 7cdee5db
      Steven Whitehouse 提交于
      This reverts commit b7dc2df5.
      
      The initial patch didn't quite work since it doesn't cover all
      the possible routes by which the GLF_FROZEN flag might be set.
      A revised fix is coming up in the next patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7cdee5db
    • S
      GFS2: Make "try" lock not try quite so hard · d5341a92
      Steven Whitehouse 提交于
      This looks like a big change, but in reality its only a single line of actual
      code change, the rest is just moving a function to before its new caller.
      The "try" flag for glocks is a rather subtle and delicate setting since it
      requires that the state machine tries just hard enough to ensure that it has
      a good chance of getting the requested lock, but no so hard that the
      request can land up blocked behind another.
      
      The patch adds in an additional check which will fail any queued try
      locks if there is another request blocking the try lock request which
      is not granted and compatible, nor in progress already. The check is made
      only after all pending locks which may be granted have been granted.
      
      I've checked this with the reproducer for the reported flock bug which
      this is intended to fix, and it now passes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d5341a92
  3. 19 7月, 2010 1 次提交
    • D
      mm: add context argument to shrinker callback · 7f8275d0
      Dave Chinner 提交于
      The current shrinker implementation requires the registered callback
      to have global state to work from. This makes it difficult to shrink
      caches that are not global (e.g. per-filesystem caches). Pass the shrinker
      structure to the callback so that users can embed the shrinker structure
      in the context the shrinker needs to operate on and get back to it in the
      callback via container_of().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      7f8275d0
  4. 15 7月, 2010 1 次提交
    • B
      GFS2: recovery stuck on transaction lock · b7dc2df5
      Bob Peterson 提交于
      This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
      transaction lock.  We set the frozen flag on the glock when we receive
      a completion that cannot be delivered due to blocked locks. At that
      point we check to see whether the first waiting holder has the noexp
      flag set. If the noexp lock is queued later, then we need to unfreeze
      the glock at that point in time, namely, in the glock work function.
      
      This patch was originally written by Steve Whitehouse, but since
      he's on holiday, I'm submitting it.  It's been well tested with a
      complex recovery test called revolver.
      Signed-off-by: NSteve Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      b7dc2df5
  5. 14 4月, 2010 1 次提交
    • B
      GFS2: glock livelock · 1a0eae88
      Bob Peterson 提交于
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      1a0eae88
  6. 01 3月, 2010 3 次提交
    • B
      GFS2: print glock numbers in hex · 4818972e
      Bob Peterson 提交于
      This patch changes glock numbers from printing in decimal to hex.
      Since DLM prints corresponding resource IDs in hex, it makes debugging
      easier.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4818972e
    • S
      GFS2: Remove loopy umount code · c1184f8a
      Steven Whitehouse 提交于
      As a consequence of the previous patch, we can now remove the
      loop which used to be required due to the circular dependency
      between the inodes and glocks. Instead we can just invalidate
      the inodes, and then clear up any glocks which are left.
      
      Also we no longer need the rwsem since there is no longer any
      danger of the inode invalidation calling back into the glock
      code (and from there back into the inode code).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c1184f8a
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518
  7. 03 2月, 2010 1 次提交
    • S
      GFS2: Extend umount wait coverage to full glock lifetime · 8f05228e
      Steven Whitehouse 提交于
      Although all glocks are, by the time of the umount glock wait,
      scheduled for demotion, some of them haven't made it far
      enough through the process for the original set of waiting
      code to wait for them.
      
      This extends the ref count to the whole glock lifetime in order
      to ensure that the waiting does catch all glocks. It does make
      it a bit more invasive, but it seems the only sensible solution
      at the moment.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8f05228e
  8. 03 12月, 2009 2 次提交
  9. 30 7月, 2009 4 次提交
  10. 12 6月, 2009 1 次提交
    • S
      GFS2: Add tracepoints · 63997775
      Steven Whitehouse 提交于
      This patch adds the ability to trace various aspects of the GFS2
      filesystem. The trace points are divided into three groups,
      glocks, logging and bmap. These points have been chosen because
      they allow inspection of the major internal functions of GFS2
      and they are also generic enough that they are unlikely to need
      any major changes as the filesystem evolves.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      63997775
  11. 19 5月, 2009 1 次提交
    • S
      GFS2: Umount recovery race fix · fe64d517
      Steven Whitehouse 提交于
      This patch fixes a race condition where we can receive recovery
      requests part way through processing a umount. This was causing
      problems since the recovery thread had already gone away.
      
      Looking in more detail at the recovery code, it was really trying
      to implement a slight variation on a work queue, and that happens to
      align nicely with the recently introduced slow-work subsystem. As a
      result I've updated the code to use slow-work, rather than its own home
      grown variety of work queue.
      
      When using the wait_on_bit() function, I noticed that the wait function
      that was supplied as an argument was appearing in the WCHAN field, so
      I've updated the function names in order to produce more meaningful
      output.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fe64d517
  12. 09 5月, 2009 1 次提交
  13. 15 4月, 2009 1 次提交
  14. 24 3月, 2009 4 次提交
    • S
      GFS2: Add a "demote a glock" interface to sysfs · 64d576ba
      Steven Whitehouse 提交于
      This adds a sysfs file called demote_rq to GFS2's
      per filesystem directory. Its possible to use this
      file to demote arbitrary glocks in exactly the same
      way as if a request had come in from a remote node.
      
      This is intended for testing issues relating to caching
      of data under glocks. Despite that, the interface is
      generic enough to send requests to any type of glock,
      but be careful as its not always safe to send an
      arbitrary message to an arbitrary glock. For that reason
      and to prevent DoS, this interface is restricted to root
      only.
      
      The messages look like this:
      
      <type>:<glocknumber> <mode>
      
      Example:
      
      echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq
      
      Which means "please demote inode glock (type 2) number 13324 so that
      I can get an EX (exclusive) lock". The lock modes are those which
      would normally be sent by a remote node in its callback so if you
      want to unlock a glock, you use EX, to demote to shared, use SH or PR
      (depending on whether you like GFS2 or DLM lock modes better!).
      
      If the glock doesn't exist, you'll get -ENOENT returned. If the
      arguments don't make sense, you'll get -EINVAL returned.
      
      The plan is that this interface will be used in combination with
      the blktrace patch which I recently posted for comments although
      it is, of course, still useful in its own right.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      64d576ba
    • S
      GFS2: Fix deadlock on journal flush · d8348de0
      Steven Whitehouse 提交于
      This patch fixes a deadlock when the journal is flushed and there
      are dirty inodes other than the one which caused the journal flush.
      Originally the journal flushing code was trying to obtain the
      transaction glock while running the flush code for an inode glock.
      We no longer require the transaction glock at this point in time
      since we know that any attempt to get the transaction glock from
      another node will result in a journal flush. So if we are flushing
      the journal, we can be sure that the transaction lock is still
      cached from when the transaction was started.
      
      By inlining a version of gfs2_trans_begin() (minus the bit which
      gets the transaction glock) we can avoid the deadlock problems
      caused if there is a demote request queued up on the transaction
      glock.
      
      In addition I've also moved the umount rwsem so that it covers
      the glock workqueue, since it all demotions are done by this
      workqueue now. That fixes a bug on umount which I came across
      while fixing the original problem.
      Reported-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d8348de0
    • S
      GFS2: Remove unused field from glock · ac2425e7
      Steven Whitehouse 提交于
      The time stamp field is unused in the glock now that we are
      using a shrinker, so that we can remove it and save sizeof(unsigned long)
      bytes in each glock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ac2425e7
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  15. 05 1月, 2009 7 次提交
    • J
      GFS2: Use DEFINE_SPINLOCK · eb8374e7
      Julia Lawall 提交于
      SPIN_LOCK_UNLOCKED is deprecated.  The following makes the change suggested
      in Documentation/spinlocks.txt
      
      The semantic patch that makes this change is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @@
      declarer name DEFINE_SPINLOCK;
      identifier xxx_lock;
      @@
      
      - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
      + DEFINE_SPINLOCK(xxx_lock);
      // </smpl>
      Signed-off-by: NJulia Lawall <julia@diku.dk>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      eb8374e7
    • S
      Revert "GFS2: Fix use-after-free bug on umount" · fefc03bf
      Steven Whitehouse 提交于
      This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43.
      
      The original patch is causing problems in relation to order of
      operations at umount in relation to jdata files. I need to fix
      this a different way.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fefc03bf
    • S
      GFS2: Fix use-after-free bug on umount · 3af165ac
      Steven Whitehouse 提交于
      There was a use-after-free with the GFS2 super block during
      umount. This patch moves almost all of the umount code from
      ->put_super into ->kill_sb, the only bit that cannot be moved
      being the glock hash clearing which has to remain as ->put_super
      due to umount ordering requirements. As a result its now obvious
      that the kfree is the final operation, whereas before it was
      hidden in ->put_super.
      
      Also gfs2_jindex_free is then only referenced from a single file
      so thats moved and marked static too.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3af165ac
    • S
      GFS2: Move four functions from super.c · 2bfb6449
      Steven Whitehouse 提交于
      The functions which are being moved can all be marked
      static in their new locations, since they only have
      a single caller each. Their new locations are more
      logical than before and some of the functions are
      small enough that the compiler might well inline them.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2bfb6449
    • S
      GFS2: Kill two daemons with one patch · 97cc1025
      Steven Whitehouse 提交于
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time based criteria for shrinking glocks, they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      97cc1025
    • S
      GFS2: Fix "truncate in progress" hang · 813e0c46
      Steven Whitehouse 提交于
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      Also, as a result of this change, it opens the way for a future patch to
      make the reading of the inode's information asynchronous with respect to
      the glock workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      813e0c46
    • H
      GFS2: sparse annotation of gl->gl_spin · 55ba474d
      Harvey Harrison 提交于
      fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:308:5:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:529:2:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
      fs/gfs2/glock.c:925:3:    context '*gl+28': wanted >= 1, got 0
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      55ba474d
  16. 18 9月, 2008 1 次提交
    • S
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse 提交于
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      719ee344
  17. 05 9月, 2008 1 次提交
    • S
      GFS2: Fix race relating to glock min-hold time · dff52574
      Steven Whitehouse 提交于
      In the case that a request for a glock arrives right after the
      grant reply has arrived, it sometimes means that the gl_tstamp
      field hasn't been updated recently enough. The net result is that
      the min-hold time for the glock is ignored. If this happens
      often enough, it leads to poor performance.
      
      This patch adds an additional test, so that if the reply pending
      bit is set on a glock, then it will select the maximum length of
      time for the min-hold time, rather than looking at gl_tstamp.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      dff52574
  18. 13 8月, 2008 1 次提交
  19. 07 7月, 2008 2 次提交
    • S
      [GFS2] Allow local DF locks when holding a cached EX glock · 209806ab
      Steven Whitehouse 提交于
      We already allow local SH locks while we hold a cached EX glock, so here
      we allow DF locks as well. This works only because we rely on the VFS's
      invalidation for locally cached data, and because if we hold an EX lock,
      then we know that no other node can be caching data relating to this
      file.
      
      It dramatically speeds up initial writes to O_DIRECT files since we fall
      back to buffered I/O for this and would otherwise bounce between DF and
      EX modes on each and every write call. The lessons to be learned from
      that are to ensure that (for the time being anyway) O_DIRECT files are
      preallocated and that they are written to using reasonably large I/O
      sizes. Even so this change fixes that corner case nicely
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      209806ab
    • S
      [GFS2] Fix delayed demote race · 265d529c
      Steven Whitehouse 提交于
      There is a race in the delayed demote code where it does the wrong thing
      if a demotion to UN has occurred for other reasons before the delay has
      expired. This patch adds an assert to catch that condition as well as
      fixing the root cause by adding an additional check for the UN state.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      265d529c
  20. 27 6月, 2008 3 次提交
    • S
      [GFS2] Remove remote lock dropping code · 1bdad606
      Steven Whitehouse 提交于
      There are several reasons why this is undesirable:
      
       1. It never happens during normal operation anyway
       2. If it does happen it causes performance to be very, very poor
       3. It isn't likely to solve the original problem (memory shortage
          on remote DLM node) it was supposed to solve
       4. It uses a bunch of arbitrary constants which are unlikely to be
          correct for any particular situation and for which the tuning seems
          to be a black art.
       5. In an N node cluster, only 1/N of the dropped locked will actually
          contribute to solving the problem on average.
      
      So all in all we are better off without it. This also makes merging
      the lock_dlm module into GFS2 a bit easier.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1bdad606
    • S
      [GFS2] No lock_nolock · 048bca22
      Steven Whitehouse 提交于
      This patch merges the lock_nolock module into GFS2 itself. As well as removing
      some of the overhead of the module, it also means that its now impossible to
      build GFS2 without a lock module (which would be a pointless thing to do
      anyway).
      
      We also plan to merge lock_dlm into GFS2 in the future, but that is a more
      tricky task, and will therefore be a separate patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
      048bca22
    • S
      [GFS2] Clean up the glock core · 6802e340
      Steven Whitehouse 提交于
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shortly, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      6802e340
  21. 31 3月, 2008 1 次提交
    • B
      [GFS2] Invalidate cache at correct point · 58e9fee1
      Benjamin Marzinski 提交于
      GFS2 wasn't invalidating its cache before it called into the lock manager
      with a request that could potentially drop a lock.  This was leaving a
      window where the lock could be actually be held by another node, but the
      file's page cache would still appear valid, causing coherency problems.
      This patch moves the cache invalidation to before the lock manager call
      when dropping a lock. It also adds the option to the lock_dlm lock
      manager to not use conversion mode deadlock avoidance, which, on a
      conversion from shared to exclusive, could internally drop the lock, and
      then reacquire in. GFS2 now asks lock_dlm to not do this.  Instead, GFS2
      manually drops the lock and reacquires it.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      58e9fee1