1. 21 Oct, 2011 7 commits
    • GFS2: Use ->dirty_inode() · ab9bbda0
      Steven Whitehouse committed
      The aim of this patch is to use the newly enhanced ->dirty_inode()
      super block operation to deal with atime updates, rather than
      piggy backing that code into ->write_inode() as is currently
      done.
      
      The net result is a simplification of the code in various places
      and a reduction of the number of gfs2_dinode_out() calls since
      this is now implied by ->dirty_inode().
      
      Some of the mark_inode_dirty() calls have been moved under glocks
      in order to take advantage of then being able to avoid locking in
      ->dirty_inode() when we already have suitable locks.
      
      One consequence is that generic_write_end() now correctly deals
      with file size updates, so that we do not need a separate check
      for that afterwards. This also, indirectly, means that fdatasync
      should work correctly on GFS2 - the current code always syncs the
      metadata whether it needs to or not.
      
      Has survived testing with postmark (with and without atime) and
      also fsx.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix bug trap and journaled data fsync · f1818529
      Steven Whitehouse committed
      Journaled data requires that a complete flush of all dirty data for
      the file is done, in order that the ail flush which comes after
      will succeed.
      
      Also the recently enhanced bug trap can trigger falsely in case
      an ail flush from fsync races with a page read. This updates the
      bug trap such that it will ignore buffers which are locked and
      only trigger on dirty and/or pinned buffers when the ail flush
      is run from fsync. The original bug trap is retained when ail
      flush is run from ->go_sync()
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix inode allocation error path · 40ac218f
      Steven Whitehouse committed
      If we have got far enough through the inode allocation code
      path that an inode has already been allocated, then we must
      call iput to dispose of it, if an error occurs during a
      later part of the process. This will always be the final iput
      since there will be no other references to the inode.
      
      Unlike when the inode has been unlinked, its block state will
      be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
      to skip the test in ->evict_inode() for this one case in order
      to ensure that it will be deallocated correctly. This patch adds
      a new flag in order to ensure that this will happen correctly.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Make atime checks more efficient · 1d4ec642
      Steven Whitehouse committed
      We do not need to start a transaction unless the atime
      check has proved positive. Also if we are going to flush
      the complete ail list anyway, we might as well skip the
      writeback for this specific inode's metadata, since that
      will be done as part of the ail writeback process in an
      order offering potentially more efficient I/O.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix bug-trap in ail flush code · 75549186
      Steven Whitehouse committed
      The assert was being tested under the wrong lock, a
      legacy of the original code. Also, if it does trigger,
      the resulting information was not always a lot of help.
      
      This moves the assert under the correct lock and also
      prints out more useful information for tracking down the
      source of the problem.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Split data write & wait in fsync · 2f0264d5
      Steven Whitehouse committed
      Now that the data writing is part of fsync proper, we can split
      the waiting part out and do it later on. This reduces the
      number of waits that we do during fsync on average.
      
      There is also no need to take the i_mutex unless we are flushing
      metadata to disk, so we can move that to within the metadata
      flushing code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up dir hash table reading · 4c28d338
      Steven Whitehouse committed
      Since there is now only a single caller to gfs2_dir_read_data()
      and it has a number of constant arguments, we can factor
      those out. Also some tests relating to the inode size were
      being done twice.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
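The flag described in the inode allocation error path fix (40ac218f) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_inode`, `toy_blkst`, `alloc_failed` and `toy_evict()` are hypothetical names invented for the example, not the kernel's structures or the flag the patch actually adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the on-disk block states named in the commit. */
enum toy_blkst { TOY_BLKST_FREE, TOY_BLKST_INODE, TOY_BLKST_UNLINKED };

struct toy_inode {
    enum toy_blkst blk_state;
    bool alloc_failed;   /* hypothetical flag: set when creation fails part-way */
};

/* Eviction normally deallocates only inodes whose block is marked unlinked.
 * The flag lets a half-created inode, whose block is still in the "inode"
 * state, bypass that test. Returns true if the block was deallocated. */
static bool toy_evict(struct toy_inode *ip)
{
    assert(ip);
    if (ip->blk_state != TOY_BLKST_UNLINKED && !ip->alloc_failed)
        return false;               /* still a live inode: leave it alone */
    ip->blk_state = TOY_BLKST_FREE;
    return true;
}
```

In this model an inode in the "inode" state is left alone unless the flag is set, which mirrors the single special case the commit message describes.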
  2. 23 Aug, 2011 2 commits
  3. 01 Aug, 2011 2 commits
  4. 27 Jul, 2011 1 commit
  5. 26 Jul, 2011 5 commits
  6. 21 Jul, 2011 3 commits
  7. 20 Jul, 2011 5 commits
  8. 15 Jul, 2011 4 commits
  9. 14 Jul, 2011 1 commit
    • GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Steven Whitehouse committed
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock that's added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up calling
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
      Reported-by: Barry J. Marson <bmarson@redhat.com>
      Reported-by: David Teigland <teigland@redhat.com>
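The second issue in commit 380f7c65, a write-back helper re-taking a lock that its caller already holds, can be sketched in user space. All names here (`toy_glock`, `toy_write_inode()`, `toy_evict_inode()`) are hypothetical, and the guard shown is a generic "already held by me" check, which may differ from what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock that records whether the caller already holds it. */
struct toy_glock {
    bool held_by_me;
    int acquisitions;
};

static void toy_lock(struct toy_glock *gl)
{
    assert(!gl->held_by_me);   /* re-acquiring here would be the recursion bug */
    gl->held_by_me = true;
    gl->acquisitions++;
}

static void toy_unlock(struct toy_glock *gl)
{
    gl->held_by_me = false;
}

/* Write-back helper: takes the lock only if the caller does not hold it. */
static void toy_write_inode(struct toy_glock *gl)
{
    bool took_lock = false;

    if (!gl->held_by_me) {
        toy_lock(gl);
        took_lock = true;
    }
    /* ... write the inode's metadata here ... */
    if (took_lock)
        toy_unlock(gl);
}

/* Eviction path: holds the lock while it syncs, then calls the helper. */
static void toy_evict_inode(struct toy_glock *gl)
{
    toy_lock(gl);
    toy_write_inode(gl);   /* the guard prevents a second acquisition */
    toy_unlock(gl);
}
```

Without the guard, the call from the eviction path would trip the assertion in `toy_lock()`, which stands in for the recursive glock acquisition.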
  10. 12 Jul, 2011 2 commits
    • GFS2: Fix race during filesystem mount · 3942ae53
      Steven Whitehouse committed
      There is a potential race during filesystem mounting which has recently
      been reported. It occurs when the userland gfs_controld is able to
      process requests fast enough that it tries to use the sysfs interface
      before the lock module is properly initialised. This is a pretty
      unusual case as normally the lock module initialisation is very quick
      compared with gfs_controld.
      
      This patch adds an interruptible completion which is used to ensure that
      userland will wait for the initialisation of the lock module to
      complete.
      
      There are other potential solutions to this problem, but this is the
      quickest at this stage and has been tested both with and without
      mount.gfs2 present in the system.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Reported-by: David Booher <dbooher@adams.net>
    • GFS2: force a log flush when invalidating the rindex glock · 1ce53368
      Benjamin Marzinski committed
      Right now, there is nothing that forces the log to get flushed when a node
      drops its rindex glock so that another node can grow the filesystem. If the
      log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
      following way.
      
      A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
      dropped so the other node can grow the filesystem. When the node reacquires the
      rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
      removed from the list by gfs2_log_flush().
      
      This code simply forces a log flush when the rindex glock is invalidated,
      solving the problem.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
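The list corruption described in commit 1ce53368 comes down to tearing down an object that is still linked on a pending list. The following is a minimal user-space sketch of that ordering problem, assuming a simplified singly linked list; `toy_rgd`, `toy_log` and the helper functions are hypothetical and only loosely mirror the rgd entries on sd_log_le_rg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor that may sit on a pending-log list. */
struct toy_rgd {
    struct toy_rgd *next;
    bool on_log_list;
};

struct toy_log {
    struct toy_rgd *head;
    size_t count;
};

/* Queue a descriptor for the next flush (no-op if already queued). */
static void toy_log_add(struct toy_log *log, struct toy_rgd *rgd)
{
    assert(log && rgd);
    if (rgd->on_log_list)
        return;
    rgd->next = log->head;
    log->head = rgd;
    rgd->on_log_list = true;
    log->count++;
}

/* A flush unlinks every queued descriptor. */
static void toy_log_flush(struct toy_log *log)
{
    while (log->head) {
        struct toy_rgd *rgd = log->head;

        log->head = rgd->next;
        rgd->next = NULL;
        rgd->on_log_list = false;
        log->count--;
    }
}

/* Invalidation step: returns true if the descriptor is safe to tear down,
 * that is, no list still points at it. Flushing first guarantees this. */
static bool toy_invalidate(struct toy_log *log, struct toy_rgd *rgd,
                           bool flush_first)
{
    if (flush_first)
        toy_log_flush(log);
    return !rgd->on_log_list;
}
```

With `flush_first` false the model reports that the descriptor is still queued, which corresponds to the window in which the real rgd could be deleted while still on the list.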
  11. 26 May, 2011 1 commit
    • gfs2: Drop __TIME__ usage · 8d2c50e3
      Michal Marek committed
      The kernel already prints its build timestamp during boot, no need to
      repeat it in random drivers and produce different object files each
      time.
      
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: cluster-devel@redhat.com
      Signed-off-by: Michal Marek <mmarek@suse.cz>
  12. 25 May, 2011 2 commits
    • vmscan: change shrinker API by passing shrink_control struct · 1495f230
      Ying Han committed
      Change each shrinker's API by consolidating the existing parameters into
      shrink_control struct.  This will simplify any further features added w/o
      touching each file of shrinker.
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix warning]
      [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
      [akpm@linux-foundation.org: fix xfs warning]
      [akpm@linux-foundation.org: update gfs2]
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b
      Bob Peterson committed
      This patch fixes a race in the GFS2 glock state machine that may
      result in lockups.  The symptom is that all nodes but one will
      hang, waiting for a particular glock.  All the holder records
      will have the "W" (Waiting) bit set.  The other node will
      typically have the glock stuck in Exclusive mode (EX) with no
      holder records, but the dinode will be cached.  In other words,
      an entry with "I:" will appear in the glock dump for that glock,
      but nothing else.
      
      The race has to do with the glock "Pending Demote" bit, which
      can be set, then immediately reset, thus losing the fact that
      another node needs the glock.  The sequence of events is:
      
      1. Something schedules the glock workqueue (e.g. glock request from fs)
      2. The glock workqueue gets to the point between the test of the reply pending
      bit and the spin lock:
      
              if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                      finish_xmote(gl, gl->gl_reply);
                      drop_ref = 1;
              }
              down_read(&gfs2_umount_flush_sem);         <---- i.e. here
              spin_lock(&gl->gl_spin);
      
      3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
                  (b) the demote request which sets GLF_PENDING_DEMOTE
      
      4. The following test is executed:
      
              if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
                  gl->gl_state != LM_ST_UNLOCKED &&
                  gl->gl_demote_state != LM_ST_EXCLUSIVE) {
      
      This resets the pending demote flag, and gl->gl_demote_state is not equal to
      exclusive, however because the reply from the dlm arrived after we checked for
      the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
      although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
      GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.
      
      The patch closes the timing window by only transitioning the
      "Pending demote" bit to the "demote" flag once we know the
      other conditions (not unlocked and not exclusive) are met.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
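The lost "pending demote" request in commit f90e5b5b can be replayed in a single-threaded model that runs the two workqueue passes in the problematic order. This is a sketch, not the kernel code: `toy_glock`, the `ST_*` states, the `FLAG_*` bits and both helper functions are hypothetical simplifications, and plain bit operations stand in for the kernel's atomic bitops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the glock states and flag bits. */
enum lock_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };
enum { FLAG_PENDING_DEMOTE = 1u << 0, FLAG_DEMOTE = 1u << 1 };

struct toy_glock {
    unsigned flags;
    enum lock_state state;         /* state we currently hold */
    enum lock_state demote_state;  /* state another node wants us to drop to */
};

/* Old pattern: the pending bit is cleared before the other conditions are
 * known to hold, so a request seen while still UNLOCKED is forgotten. */
static void promote_demote_buggy(struct toy_glock *gl)
{
    bool was_pending = (gl->flags & FLAG_PENDING_DEMOTE) != 0;

    gl->flags &= ~FLAG_PENDING_DEMOTE;
    if (was_pending && gl->state != ST_UNLOCKED &&
        gl->demote_state != ST_EXCLUSIVE)
        gl->flags |= FLAG_DEMOTE;
}

/* Fixed pattern: move "pending demote" to "demote" only once every
 * condition is met; otherwise leave the pending bit set for a later pass. */
static void promote_demote_fixed(struct toy_glock *gl)
{
    if ((gl->flags & FLAG_PENDING_DEMOTE) && gl->state != ST_UNLOCKED &&
        gl->demote_state != ST_EXCLUSIVE) {
        gl->flags &= ~FLAG_PENDING_DEMOTE;
        gl->flags |= FLAG_DEMOTE;
    }
}

/* Replays the race: the demote request is noted while the lock still looks
 * UNLOCKED, and the EX reply is only processed on the following pass.
 * Returns true if the demote request is still remembered afterwards. */
static bool request_survives(void (*step)(struct toy_glock *))
{
    struct toy_glock gl = { FLAG_PENDING_DEMOTE, ST_UNLOCKED, ST_UNLOCKED };

    assert(step);
    step(&gl);                  /* pass 1: reply not yet processed */
    gl.state = ST_EXCLUSIVE;    /* reply processed before pass 2 */
    step(&gl);                  /* pass 2 */
    return (gl.flags & (FLAG_PENDING_DEMOTE | FLAG_DEMOTE)) != 0;
}
```

Calling `request_survives(promote_demote_buggy)` shows the request being dropped, while `request_survives(promote_demote_fixed)` keeps it until the lock state allows the demote to proceed.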
  13. 22 May, 2011 1 commit
  14. 21 May, 2011 1 commit
    • GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4
      Steven Whitehouse committed
      The deallocation code for directories in GFS2 is largely divided into
      two parts. The first part deallocates any directory leaf blocks and
      marks the directory as being a regular file when that is complete. The
      second stage was identical to deallocating regular files.
      
      Regular files have their data blocks in a different
      address space to directories, and thus what would have been normal data
      blocks in a regular file (the hash table in a GFS2 directory) were
      deallocated correctly. However, a reference to these blocks was left in the
      journal (assuming of course that some previous activity had resulted in
      those blocks being in the journal or ail list).
      
      This patch uses the i_depth as a test of whether the inode is an
      exhash directory (we cannot test the inode type as that has already
      been changed to a regular file at this stage in deallocation)
      
      The original issue was reported by Chris Hertel as an issue he encountered
      running bonnie++
      Reported-by: Christopher R. Hertel <crh@samba.org>
      Cc: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
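The test described in commit 6d3117b4, using the hash depth rather than the inode type to recognise a former exhash directory, can be sketched as follows. The model assumes that the depth field is nonzero only for exhash directories; `toy_inode`, `toy_dealloc_leaves()` and `toy_was_exhash_dir()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_type { TOY_REGULAR, TOY_DIRECTORY };

struct toy_inode {
    enum toy_type type;
    unsigned depth;   /* assumed nonzero only for exhash directories */
};

/* Stage one of directory deallocation: the leaf blocks are freed and the
 * inode is relabelled as a regular file. The depth is left untouched. */
static void toy_dealloc_leaves(struct toy_inode *ip)
{
    assert(ip->type == TOY_DIRECTORY);
    ip->type = TOY_REGULAR;
}

/* Stage two can no longer look at the type, so it checks the depth to
 * decide whether the "data" blocks are really hash table metadata. */
static bool toy_was_exhash_dir(const struct toy_inode *ip)
{
    return ip->depth != 0;
}
```

After stage one the type already reads as a regular file, so only the depth still distinguishes a former directory whose hash table blocks need to be wiped as metadata.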
  15. 13 May, 2011 3 commits