1. 15 7月, 2011 3 次提交
  2. 14 7月, 2011 1 次提交
    • S
      GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Steven Whitehouse 提交于
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock thats added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up calling
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Tested-by: NAbhijith Das <adas@redhat.com>
      Reported-by: NBarry J. Marson <bmarson@redhat.com>
      Reported-by: NDavid Teigland <teigland@redhat.com>
      380f7c65
  3. 12 7月, 2011 2 次提交
    • S
      GFS2: Fix race during filesystem mount · 3942ae53
      Steven Whitehouse 提交于
      There is a potential race during filesystem mounting which has recently
      been reported. It occurs when the userland gfs_controld is able to
      process requests fast enough that it tries to use the sysfs interface
      before the lock module is properly initialised. This is a pretty
      unusual case as normally the lock module initialisation is very quick
      compared with gfs_controld.
      
      This patch adds an interruptible completion which is used to ensure that
      userland will wait for the initialisation of the lock module to
      complete.
      
      There are other potential solutions to this problem, but this is the
      quickest at this stage and has been tested both with and without
      mount.gfs2 present in the system.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reported-by: NDavid Booher <dbooher@adams.net>
      3942ae53
    • B
      GFS2: force a log flush when invalidating the rindex glock · 1ce53368
      Benjamin Marzinski 提交于
      Right now, there is nothing that forces the log to get flushed when a node
      drops its rindex glock so that another node can grow the filesystem. If the
      log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
      following way.
      
      A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
      dropped so the other node can grow the filesystem. When the node reacquires the
      rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
      removed from the list by gfs2_log_flush().
      
      This code simply forces a log flush when the rindex glock is invalidated,
      solving the problem.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1ce53368
  4. 26 5月, 2011 1 次提交
    • M
      gfs2: Drop __TIME__ usage · 8d2c50e3
      Michal Marek 提交于
      The kernel already prints its build timestamp during boot, no need to
      repeat it in random drivers and produce different object files each
      time.
      
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: cluster-devel@redhat.com
      Signed-off-by: NMichal Marek <mmarek@suse.cz>
      8d2c50e3
  5. 25 5月, 2011 2 次提交
    • Y
      vmscan: change shrinker API by passing shrink_control struct · 1495f230
      Ying Han 提交于
      Change each shrinker's API by consolidating the existing parameters into
      shrink_control struct.  This will simplify any further features added w/o
      touching each file of shrinker.
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix warning]
      [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
      [akpm@linux-foundation.org: fix xfs warning]
      [akpm@linux-foundation.org: update gfs2]
      Signed-off-by: NYing Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1495f230
    • B
      GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b
      Bob Peterson 提交于
      This patch fixes a race in the GFS2 glock state machine that may
      result in lockups.  The symptom is that all nodes but one will
      hang, waiting for a particular glock.  All the holder records
      will have the "W" (Waiting) bit set.  The other node will
      typically have the glock stuck in Exclusive mode (EX) with no
      holder records, but the dinode will be cached.  In other words,
      an entry with "I:" will appear in the glock dump for that glock,
      but nothing else.
      
      The race has to do with the glock "Pending Demote" bit, which
      can be set, then immediately reset, thus losing the fact that
      another node needs the glock.  The sequence of events is:
      
      1. Something schedules the glock workqueue (e.g. glock request from fs)
      2. The glock workqueue gets to the point between the test of the reply pending
      bit and the spin lock:
      
              if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                      finish_xmote(gl, gl->gl_reply);
                      drop_ref = 1;
              }
              down_read(&gfs2_umount_flush_sem);         <---- i.e. here
              spin_lock(&gl->gl_spin);
      
      3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
                  (b) the demote request which sets GLF_PENDING_DEMOTE
      
      4. The following test is executed:
      
              if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
                  gl->gl_state != LM_ST_UNLOCKED &&
                  gl->gl_demote_state != LM_ST_EXCLUSIVE) {
      
      This resets the pending demote flag, and gl->gl_demote_state is not equal to
      exclusive, however because the reply from the dlm arrived after we checked for
      the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
      although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
      GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE_FLAG.
      
      The patch closes the timing window by only transitioning the
      "Pending demote" bit to the "demote" flag once we know the
      other conditions (not unlocked and not exclusive) are met.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f90e5b5b
  6. 22 5月, 2011 1 次提交
  7. 21 5月, 2011 1 次提交
    • S
      GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Steven Whitehouse 提交于
      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files have their data blocks in a different
      address space to directories, and thus what would have been normal data
      blocks in a regular file (the hash table in a GFS2 directory) were
      deallocated correctly. However, a reference to these blocks was left in the
      journal (assuming of course that some previous activity had resulted in
      those blocks being in the journal or ail list).
      
      This patch uses the i_depth as a test of whether the inode is an
      exhash directory (we cannot test the inode type as that has already
      been changed to a regular file at this stage in deallocation)
      
      The original issue was reported by Chris Hertel as an issue he encountered
      running bonnie++
      Reported-by: NChristopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6d3117b4
  8. 13 5月, 2011 3 次提交
  9. 10 5月, 2011 3 次提交
  10. 09 5月, 2011 7 次提交
  11. 05 5月, 2011 2 次提交
    • S
      GFS2: Don't use a try lock when promoting to a higher mode · 588da3b3
      Steven Whitehouse 提交于
      Previously we marked all locks being promoted to a higher mode
      with the try flag to avoid any potential deadlocks issues. The
      DLM is able to detect these and report them in way that GFS2 can
      deal with them correctly. So we can just request the required mode
      and wait for a response without needing to perform this check.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      588da3b3
    • S
      GFS2: Double check link count under glock · d192a8e5
      Steven Whitehouse 提交于
      To avoid any possible races relating to the link count, we need to
      recheck it under the inode's glock in all cases where it matters.
      Also to ensure we never get any nasty surprises, this patch also
      ensures that once the link count has hit zero it can never be
      elevated by rereading in data from disk.
      
      The only place we cannot provide a proper solution is in rename
      in the case where we are removing a target inode and we discover
      that the target inode has been already unlinked on another node.
      The race window is very small, and we return EAGAIN in this case
      to indicate what has happened. The proper solution would be to move
      the lookup parts of rename from the vfs into library calls which
      the fs could call directly, but that is potentially a very big job
      and this fix should cover most cases for now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d192a8e5
  12. 03 5月, 2011 3 次提交
  13. 26 4月, 2011 1 次提交
  14. 20 4月, 2011 10 次提交
    • S
      GFS2: Add an AIL writeback tracepoint · c83ae9ca
      Steven Whitehouse 提交于
      Add a tracepoint for monitoring writeback of the AIL.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c83ae9ca
    • S
      GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse 提交于
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4667a0ec
    • S
      GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse 提交于
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f42ab085
    • S
      GFS2: Improve tracing support (adds two flags) · 627c10b7
      Steven Whitehouse 提交于
      This adds support for two new flags. One keeps track of whether
      the glock is on the LRU list or not. The other isn't really a
      flag as such, but an indication of whether the glock has an
      attached object or not. This indication is reported without
      any locking, which is ok since we do not dereference the object
      pointer but merely report whether it is NULL or not.
      
      Also, this fixes one place where a tracepoint was missing, which
      was at the point we remove deallocated blocks from the journal.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      627c10b7
    • S
      GFS2: Clean up fsync() · dba898b0
      Steven Whitehouse 提交于
      This patch is designed to clean up GFS2's fsync
      implementation and ensure that it really does get everything on
      disk. Since ->write_inode() has been updated, we can call that
      via the vfs library function sync_inode_metadata() and the only
      remaining thing that has to be done is to ensure that we get
      any revoke records in the log after the inode has been written back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      dba898b0
    • S
      GFS2: Remove unused macro · efc1a9c2
      Steven Whitehouse 提交于
      The buffer_in_io() macro has been unused for some time,
      so remove it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      efc1a9c2
    • S
      GFS2: Alter point of entry to glock lru list for glocks with an address_space · 29687a2a
      Steven Whitehouse 提交于
      Rather than allowing the glocks to be scheduled for possible
      reclaim as soon as they have exited the journal, this patch
      delays their entry to the list until the glocks in question
      are no longer in use.
      
      This means that we will rely on the vm for writeback of all
      dirty data and metadata from now on. When glocks are added
      to the lru list they should be freeable much faster since all
      the I/O required to free them should have already been completed.
      
      This should lead to much better I/O patterns under low memory
      conditions.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      29687a2a
    • S
      GFS2: Use filemap_fdatawrite() to write back the AIL · 5ac048bb
      Steven Whitehouse 提交于
      In order to ensure that the mapping stats (and thus the bdi) are correctly
      updated, this patch changes the AIL writeback to use the filemap_datawrite
      function. This helps prevent stalls in balance_dirty_pages() due to
      large amounts of dirty metadata when there is little or no dirty data
      around.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5ac048bb
    • S
      GFS2: Make ->write_inode() really write · 1027efaa
      Steven Whitehouse 提交于
      The GFS2 ->write_inode function should be more aggressive at writing
      back to the filesystem. This adopts the XFS system of returning
      -EAGAIN when the writeback has not been completely done. Also, we
      now kick off in-place writeback when called with WB_SYNC_NONE,
      but we only wait for it and flush the log when WB_SYNC_ALL is
      requested.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1027efaa
    • B
      GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc · 556bb179
      Bob Peterson 提交于
      The previous patches made function gfs2_dir_exhash_dealloc do nothing
      but call function foreach_leaf.  This patch simplifies the code by
      moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      556bb179