1. 26 4月, 2011 1 次提交
  2. 31 3月, 2011 1 次提交
  3. 15 3月, 2011 1 次提交
  4. 11 3月, 2011 1 次提交
  5. 09 3月, 2011 1 次提交
    • S
      GFS2: Fix glock deallocation race · fc0e38da
      Steven Whitehouse 提交于
      This patch fixes a race in deallocating glocks which was introduced
      in the RCU glock patch. We need to ensure that the glock count is
      kept correct even in the case that there is a race to add a new
      glock into the hash table. Also, to avoid having to wait for an
      RCU grace period, the glock counter can be decremented before
      call_rcu() is called.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fc0e38da
  6. 17 2月, 2011 1 次提交
  7. 31 1月, 2011 1 次提交
  8. 21 1月, 2011 1 次提交
    • S
      GFS2: Use RCU for glock hash table · bc015cb8
      Steven Whitehouse 提交于
      This has a number of advantages:
      
       - Reduces contention on the hash table lock
       - Makes the code smaller and simpler
       - Should speed up glock dumps when under load
       - Removes ref count changing in examine_bucket
       - No longer need hash chain lock in glock_put() in common case
      
      There are some further changes which this enables and which
      we may do in the future. One is to look at using SLAB_RCU,
      and another is to look at using a per-cpu counter for the
      per-sb glock counter, since that is touched twice in the
      lifetime of each glock (but only used at umount time).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      bc015cb8
  9. 30 11月, 2010 5 次提交
  10. 15 11月, 2010 1 次提交
    • S
      GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse 提交于
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released) which should block
      against the pending deallocations, so we should always get the space
      back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      044b9414
  11. 29 9月, 2010 1 次提交
  12. 20 9月, 2010 2 次提交
    • S
      GFS2: Use new workqueue scheme · 9fa0ea9f
      Steven Whitehouse 提交于
      The recovery workqueue can be freezable since
      we want it to finish what it is doing if the system is to
      be frozen (although why you'd want to freeze a cluster node
      is beyond me since it will result in it being ejected from
      the cluster). It does still make sense for single node
      GFS2 filesystems though.
      
      The glock workqueue will benefit from being able to run more
      work items concurrently. A test running postmark shows
      improved performance and multi-threaded workloads are likely
      to benefit even more. It needs to be high priority because
      the latency directly affects the latency of filesystem glock
      operations.
      
      The delete workqueue is similar to the recovery workqueue in
      that it must not get blocked by memory allocations, and may
      run for a long time.
      
      Potentially other GFS2 threads might also be converted to
      workqueues, but I'll leave that for a later patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      9fa0ea9f
    • S
      GFS2: Don't enforce min hold time when two demotes occur in rapid succession · 7b5e3d5f
      Steven Whitehouse 提交于
      Due to the design of the VFS, it is quite usual for operations on GFS2
      to consist of a lookup (requiring a shared lock) followed by an
      operation requiring an exclusive lock. If a remote node has cached an
      exclusive lock, then it will receive two demote events in rapid succession
      firstly for a shared lock and then to unlocked. The existing min hold time
      code was triggering in this case, even if the node was otherwise idle
      since the state change time was being updated by the initial demote.
      
      This patch introduces logic to skip the min hold timer in the case that
      a "double demote" of this kind has occurred. The min hold timer will
      still be used in all other cases.
      
      A new glock flag is introduced which is used to keep track of whether
      there have been any newly queued holders since the last glock state
      change. The min hold time is only applied if the flag is set.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-by: NAbhijith Das <adas@redhat.com>
      7b5e3d5f
  13. 02 8月, 2010 1 次提交
  14. 29 7月, 2010 2 次提交
    • S
      Revert "GFS2: recovery stuck on transaction lock" · 7cdee5db
      Steven Whitehouse 提交于
      This reverts commit b7dc2df5.
      
      The initial patch didn't quite work since it doesn't cover all
      the possible routes by which the GLF_FROZEN flag might be set.
      A revised fix is coming up in the next patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7cdee5db
    • S
      GFS2: Make "try" lock not try quite so hard · d5341a92
      Steven Whitehouse 提交于
      This looks like a big change, but in reality its only a single line of actual
      code change, the rest is just moving a function to before its new caller.
      The "try" flag for glocks is a rather subtle and delicate setting since it
      requires that the state machine tries just hard enough to ensure that it has
      a good chance of getting the requested lock, but no so hard that the
      request can land up blocked behind another.
      
      The patch adds in an additional check which will fail any queued try
      locks if there is another request blocking the try lock request which
      is not granted and compatible, nor in progress already. The check is made
      only after all pending locks which may be granted have been granted.
      
      I've checked this with the reproducer for the reported flock bug which
      this is intended to fix, and it now passes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d5341a92
  15. 19 7月, 2010 1 次提交
    • D
      mm: add context argument to shrinker callback · 7f8275d0
      Dave Chinner 提交于
      The current shrinker implementation requires the registered callback
      to have global state to work from. This makes it difficult to shrink
      caches that are not global (e.g. per-filesystem caches). Pass the shrinker
      structure to the callback so that users can embed the shrinker structure
      in the context the shrinker needs to operate on and get back to it in the
      callback via container_of().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      7f8275d0
  16. 15 7月, 2010 1 次提交
    • B
      GFS2: recovery stuck on transaction lock · b7dc2df5
      Bob Peterson 提交于
      This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
      transaction lock.  We set the frozen flag on the glock when we receive
      a completion that cannot be delivered due to blocked locks. At that
      point we check to see whether the first waiting holder has the noexp
      flag set. If the noexp lock is queued later, then we need to unfreeze
      the glock at that point in time, namely, in the glock work function.
      
      This patch was originally written by Steve Whitehouse, but since
      he's on holiday, I'm submitting it.  It's been well tested with a
      complex recovery test called revolver.
      Signed-off-by: NSteve Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      b7dc2df5
  17. 14 4月, 2010 1 次提交
    • B
      GFS2: glock livelock · 1a0eae88
      Bob Peterson 提交于
      This patch fixes a couple gfs2 problems with the reclaiming of
      unlinked dinodes.  First, there were a couple of livelocks where
      everything would come to a halt waiting for a glock that was
      seemingly held by a process that no longer existed.  In fact, the
      process did exist, it just had the wrong pid number in the holder
      information.  Second, there was a lock ordering problem between
      inode locking and glock locking.  Third, glock/inode contention
      could sometimes cause inodes to be improperly marked invalid by
      iget_failed.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      1a0eae88
  18. 01 3月, 2010 3 次提交
    • B
      GFS2: print glock numbers in hex · 4818972e
      Bob Peterson 提交于
      This patch changes glock numbers from printing in decimal to hex.
      Since DLM prints corresponding resource IDs in hex, it makes debugging
      easier.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4818972e
    • S
      GFS2: Remove loopy umount code · c1184f8a
      Steven Whitehouse 提交于
      As a consequence of the previous patch, we can now remove the
      loop which used to be required due to the circular dependency
      between the inodes and glocks. Instead we can just invalidate
      the inodes, and then clear up any glocks which are left.
      
      Also we no longer need the rwsem since there is no longer any
      danger of the inode invalidation calling back into the glock
      code (and from there back into the inode code).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c1184f8a
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518
  19. 03 2月, 2010 1 次提交
    • S
      GFS2: Extend umount wait coverage to full glock lifetime · 8f05228e
      Steven Whitehouse 提交于
      Although all glocks are, by the time of the umount glock wait,
      scheduled for demotion, some of them haven't made it far
      enough through the process for the original set of waiting
      code to wait for them.
      
      This extends the ref count to the whole glock lifetime in order
      to ensure that the waiting does catch all glocks. It does make
      it a bit more invasive, but it seems the only sensible solution
      at the moment.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8f05228e
  20. 03 12月, 2009 2 次提交
  21. 30 7月, 2009 4 次提交
  22. 12 6月, 2009 1 次提交
    • S
      GFS2: Add tracepoints · 63997775
      Steven Whitehouse 提交于
      This patch adds the ability to trace various aspects of the GFS2
      filesystem. The trace points are divided into three groups,
      glocks, logging and bmap. These points have been chosen because
      they allow inspection of the major internal functions of GFS2
      and they are also generic enough that they are unlikely to need
      any major changes as the filesystem evolves.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      63997775
  23. 19 5月, 2009 1 次提交
    • S
      GFS2: Umount recovery race fix · fe64d517
      Steven Whitehouse 提交于
      This patch fixes a race condition where we can receive recovery
      requests part way through processing a umount. This was causing
      problems since the recovery thread had already gone away.
      
      Looking in more detail at the recovery code, it was really trying
      to implement a slight variation on a work queue, and that happens to
      align nicely with the recently introduced slow-work subsystem. As a
      result I've updated the code to use slow-work, rather than its own home
      grown variety of work queue.
      
      When using the wait_on_bit() function, I noticed that the wait function
      that was supplied as an argument was appearing in the WCHAN field, so
      I've updated the function names in order to produce more meaningful
      output.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      fe64d517
  24. 09 5月, 2009 1 次提交
  25. 15 4月, 2009 1 次提交
  26. 24 3月, 2009 3 次提交
    • S
      GFS2: Add a "demote a glock" interface to sysfs · 64d576ba
      Steven Whitehouse 提交于
      This adds a sysfs file called demote_rq to GFS2's
      per filesystem directory. Its possible to use this
      file to demote arbitrary glocks in exactly the same
      way as if a request had come in from a remote node.
      
      This is intended for testing issues relating to caching
      of data under glocks. Despite that, the interface is
      generic enough to send requests to any type of glock,
      but be careful as its not always safe to send an
      arbitrary message to an arbitrary glock. For that reason
      and to prevent DoS, this interface is restricted to root
      only.
      
      The messages look like this:
      
      <type>:<glocknumber> <mode>
      
      Example:
      
      echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq
      
      Which means "please demote inode glock (type 2) number 13324 so that
      I can get an EX (exclusive) lock". The lock modes are those which
      would normally be sent by a remote node in its callback so if you
      want to unlock a glock, you use EX, to demote to shared, use SH or PR
      (depending on whether you like GFS2 or DLM lock modes better!).
      
      If the glock doesn't exist, you'll get -ENOENT returned. If the
      arguments don't make sense, you'll get -EINVAL returned.
      
      The plan is that this interface will be used in combination with
      the blktrace patch which I recently posted for comments although
      it is, of course, still useful in its own right.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      64d576ba
    • S
      GFS2: Fix deadlock on journal flush · d8348de0
      Steven Whitehouse 提交于
      This patch fixes a deadlock when the journal is flushed and there
      are dirty inodes other than the one which caused the journal flush.
      Originally the journal flushing code was trying to obtain the
      transaction glock while running the flush code for an inode glock.
      We no longer require the transaction glock at this point in time
      since we know that any attempt to get the transaction glock from
      another node will result in a journal flush. So if we are flushing
      the journal, we can be sure that the transaction lock is still
      cached from when the transaction was started.
      
      By inlining a version of gfs2_trans_begin() (minus the bit which
      gets the transaction glock) we can avoid the deadlock problems
      caused if there is a demote request queued up on the transaction
      glock.
      
      In addition I've also moved the umount rwsem so that it covers
      the glock workqueue, since it all demotions are done by this
      workqueue now. That fixes a bug on umount which I came across
      while fixing the original problem.
      Reported-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d8348de0
    • S
      GFS2: Remove unused field from glock · ac2425e7
      Steven Whitehouse 提交于
      The time stamp field is unused in the glock now that we are
      using a shrinker, so that we can remove it and save sizeof(unsigned long)
      bytes in each glock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ac2425e7