  1. 14 July 2011, 1 commit
    • GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Authored by Steven Whitehouse
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock that's added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up taking
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
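
      As an illustrative sketch only (not the literal hunk from this patch),
      such a guard could take the form of an early return when the current
      task already holds the inode's glock. gfs2_write_inode(), GFS2_I() and
      gfs2_glock_is_locked_by_me() are existing GFS2 symbols, but the exact
      placement and condition below are assumptions:

              static int gfs2_write_inode(struct inode *inode,
                                          struct writeback_control *wbc)
              {
                      struct gfs2_inode *ip = GFS2_I(inode);

                      /* Sketch: if we got here from gfs2_evict_inode() while
                       * already holding ip->i_gl, don't take it again. */
                      if (gfs2_glock_is_locked_by_me(ip->i_gl))
                              return 0;

                      /* ... normal sync-to-disk path continues here ... */
                      return 0;
              }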
      
      The third issue concerns inodes which have a zero link count, but
      which cannot yet be deallocated because they are still in use by
      other nodes. We either need to avoid truncating the metadata pages
      of such inodes, or we need to ensure that those pages have all made
      it through the journal and ail lists first. This patch takes the
      former approach, but the latter has also been tested and there is
      nothing to choose between them performance-wise, so that decision
      could be revised in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
      Reported-by: Barry J. Marson <bmarson@redhat.com>
      Reported-by: David Teigland <teigland@redhat.com>
  2. 12 July 2011, 2 commits
    • GFS2: Fix race during filesystem mount · 3942ae53
      Authored by Steven Whitehouse
      There is a potential race during filesystem mounting which has recently
      been reported. It occurs when the userland gfs_controld is able to
      process requests fast enough that it tries to use the sysfs interface
      before the lock module is properly initialised. This is a pretty
      unusual case as normally the lock module initialisation is very quick
      compared with gfs_controld.
      
      This patch adds an interruptible completion which is used to ensure that
      userland will wait for the initialisation of the lock module to
      complete.
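
      As a rough sketch of the mechanism (the field name sd_locking_init and
      the placement below are illustrative assumptions rather than the exact
      code from the patch), the standard completion API is all that is
      needed:

              #include <linux/completion.h>

              /* On the per-superblock structure, set up during mount before
               * the sysfs files become visible to gfs_controld: */
              init_completion(&sdp->sd_locking_init);

              /* Once the lock module has finished initialising: */
              complete(&sdp->sd_locking_init);

              /* In the sysfs store path used by gfs_controld: wait for the
               * lock module, but allow the wait to be interrupted. */
              int ret = wait_for_completion_interruptible(&sdp->sd_locking_init);
              if (ret)
                      return ret;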
      
      There are other potential solutions to this problem, but this is the
      quickest at this stage and has been tested both with and without
      mount.gfs2 present in the system.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Reported-by: David Booher <dbooher@adams.net>
    • GFS2: force a log flush when invalidating the rindex glock · 1ce53368
      Authored by Benjamin Marzinski
      Right now, there is nothing that forces the log to get flushed when a node
      drops its rindex glock so that another node can grow the filesystem. If the
      log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
      following way.
      
      A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
      dropped so the other node can grow the filesystem. When the node reacquires the
      rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
      removed from the list by gfs2_log_flush().
      
      This code simply forces a log flush when the rindex glock is invalidated,
      solving the problem.
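
      The shape of such a fix, sketched against the glock invalidate
      callback (the callback name and exact condition here are illustrative
      assumptions; gfs2_log_flush(), GFS2_I() and sd_rindex are existing
      GFS2 symbols):

              static void rindex_go_inval_sketch(struct gfs2_glock *gl, int flags)
              {
                      struct gfs2_sbd *sdp = gl->gl_sbd;
                      struct gfs2_inode *ip = gl->gl_object;

                      /* Flush the log before the rindex glock's cached state
                       * is thrown away, so that no rgd is left behind on the
                       * sd_log_le_rg list. */
                      if (ip && ip == GFS2_I(sdp->sd_rindex))
                              gfs2_log_flush(sdp, NULL);

                      /* ... existing invalidation of metadata/data pages ... */
              }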
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  3. 26 May 2011, 1 commit
    • gfs2: Drop __TIME__ usage · 8d2c50e3
      Authored by Michal Marek
      The kernel already prints its build timestamp during boot, so there
      is no need to repeat it in random drivers and produce different
      object files each time.
      
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: cluster-devel@redhat.com
      Signed-off-by: Michal Marek <mmarek@suse.cz>
  4. 25 May 2011, 2 commits
    • vmscan: change shrinker API by passing shrink_control struct · 1495f230
      Authored by Ying Han
      Change each shrinker's API by consolidating the existing parameters
      into a shrink_control struct.  This makes it simpler to add further
      features later without having to touch every file that implements a
      shrinker.
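
      For reference, the shape of the change for a single callback looks
      roughly like this (GFS2's glock shrinker is used as the example, and
      the surrounding details are approximate rather than the exact hunk):

              /* Old style: parameters passed individually:
               *   static int gfs2_shrink_glock_memory(struct shrinker *shrink,
               *                                       int nr_to_scan,
               *                                       gfp_t gfp_mask);
               *
               * New style: the same information arrives in shrink_control. */
              static int gfs2_shrink_glock_memory(struct shrinker *shrink,
                                                  struct shrink_control *sc)
              {
                      unsigned long nr_to_scan = sc->nr_to_scan;
                      gfp_t gfp_mask = sc->gfp_mask;

                      /* ... reclaim up to nr_to_scan glocks, honouring
                       * gfp_mask, and return the count of objects left ... */
                      return 0;
              }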
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix warning]
      [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
      [akpm@linux-foundation.org: fix xfs warning]
      [akpm@linux-foundation.org: update gfs2]
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b
      Authored by Bob Peterson
      This patch fixes a race in the GFS2 glock state machine that may
      result in lockups.  The symptom is that all nodes but one will
      hang, waiting for a particular glock.  All the holder records
      will have the "W" (Waiting) bit set.  The other node will
      typically have the glock stuck in Exclusive mode (EX) with no
      holder records, but the dinode will be cached.  In other words,
      an entry with "I:" will appear in the glock dump for that glock,
      but nothing else.
      
      The race has to do with the glock "Pending Demote" bit, which
      can be set, then immediately reset, thus losing the fact that
      another node needs the glock.  The sequence of events is:
      
      1. Something schedules the glock workqueue (e.g. glock request from fs)
      2. The glock workqueue gets to the point between the test of the reply pending
      bit and the spin lock:
      
              if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                      finish_xmote(gl, gl->gl_reply);
                      drop_ref = 1;
              }
              down_read(&gfs2_umount_flush_sem);         <---- i.e. here
              spin_lock(&gl->gl_spin);
      
      3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
                  (b) the demote request which sets GLF_PENDING_DEMOTE
      
      4. The following test is executed:
      
              if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
                  gl->gl_state != LM_ST_UNLOCKED &&
                  gl->gl_demote_state != LM_ST_EXCLUSIVE) {
      
      This clears the pending demote flag, and gl->gl_demote_state is not
      equal to exclusive. However, because the reply from the dlm arrived
      after we checked the GLF_REPLY_PENDING flag, gl->gl_state is still
      equal to unlocked. So although we cleared the GLF_PENDING_DEMOTE
      flag, we did not then set the GLF_DEMOTE flag or reinstate
      GLF_PENDING_DEMOTE.
      
      The patch closes the timing window by only transitioning the
      "Pending demote" bit to the "demote" flag once we know the
      other conditions (not unlocked and not exclusive) are met.
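
      In code terms, the closed window looks roughly like the following
      (a sketch of the reordering, not the literal diff):

              /* Only convert the pending demote into a real demote once we
               * know the glock is not unlocked and the requested demote
               * state is not exclusive. */
              if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
                  gl->gl_state != LM_ST_UNLOCKED &&
                  gl->gl_demote_state != LM_ST_EXCLUSIVE) {
                      clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
                      set_bit(GLF_DEMOTE, &gl->gl_flags);
              }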
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 22 May 2011, 1 commit
  6. 21 May 2011, 1 commit
    • GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Authored by Steven Whitehouse
      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files have their data blocks in a different
      address space to directories, and thus what would have been normal data
      blocks in a regular file (the hash table in a GFS2 directory) were
      deallocated correctly. However, a reference to these blocks was left in the
      journal (assuming of course that some previous activity had resulted in
      those blocks being in the journal or ail list).
      
      This patch uses i_depth as the test of whether the inode is an exhash
      directory (we cannot test the inode type, as that has already been
      changed to a regular file at this stage of deallocation).
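
      A sketch of the kind of test this implies (the surrounding
      deallocation code and the wipe itself are elided here):

              /* The dinode already looks like a regular file at this stage,
               * so the type cannot be used; a non-zero hash table depth
               * still identifies a former exhash directory. */
              if (ip->i_depth) {
                      /* ... wipe the hash table blocks from the journal and
                       * ail lists before they are freed and reused ... */
              }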
      
      The original issue was reported by Chris Hertel, who encountered it
      while running bonnie++.
      Reported-by: Christopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 13 May 2011, 3 commits
  8. 10 May 2011, 3 commits
  9. 09 May 2011, 7 commits
  10. 05 May 2011, 2 commits
    • GFS2: Don't use a try lock when promoting to a higher mode · 588da3b3
      Authored by Steven Whitehouse
      Previously we marked all locks being promoted to a higher mode with
      the try flag to avoid any potential deadlock issues. The DLM is able
      to detect such deadlocks and report them in a way that GFS2 can deal
      with correctly, so we can simply request the required mode and wait
      for a response without needing to perform this check.
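
      Schematically (holder setup shown for illustration only; the exact
      call site changed by the patch may differ):

              /* Before: promotion requests carried a "try" flag, e.g.
               *   gfs2_holder_init(gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, &gh);
               * Now the required mode is simply requested and we wait: */
              gfs2_holder_init(gl, LM_ST_EXCLUSIVE, 0, &gh);
              error = gfs2_glock_nq(&gh);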
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Double check link count under glock · d192a8e5
      Authored by Steven Whitehouse
      To avoid any possible races relating to the link count, we need to
      recheck it under the inode's glock in all cases where it matters.
      Also, to ensure we never get any nasty surprises, this patch ensures
      that once the link count has hit zero it can never be elevated by
      rereading data from disk.
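
      Conceptually, the recheck amounts to something like this once the
      inode's glock has been acquired (a hedged sketch, not a specific hunk
      from the patch):

              /* Another node may have unlinked the inode while we were
               * waiting for the glock, so re-check under the lock. */
              if (inode->i_nlink == 0)
                      return -ENOENT; /* or -EAGAIN in the rename case below */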
      
      The only place we cannot provide a proper solution is in rename
      in the case where we are removing a target inode and we discover
      that the target inode has already been unlinked on another node.
      The race window is very small, and we return EAGAIN in this case
      to indicate what has happened. The proper solution would be to move
      the lookup parts of rename from the vfs into library calls which
      the fs could call directly, but that is potentially a very big job
      and this fix should cover most cases for now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  11. 03 May 2011, 3 commits
  12. 26 April 2011, 1 commit
  13. 20 April 2011, 13 commits