1. 13 2月, 2013 2 次提交
  2. 29 1月, 2013 3 次提交
    • S
      GFS2: Use ->writepages for ordered writes · 45138990
      Steven Whitehouse 提交于
      Instead of using a list of buffers to write ahead of the journal
      flush, this now uses a list of inodes and calls ->writepages
      via filemap_fdatawrite() in order to achieve the same thing. For
      most use cases this results in a shorter ordered write list,
      as well as much larger i/os being issued.
      
      The ordered write list is sorted by inode number before writing
      in order to retain the disk block ordering between inodes as
      per the previous code.
      
      The previous ordered write code used to conflict in its assumptions
      about how to write out the disk blocks with mpage_writepages()
      so that with this updated version we can also use mpage_writepages()
      for GFS2's ordered write, writepages implementation. So we will
      also send larger i/os from writeback too.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      45138990
    • S
      GFS2: Clean up freeze code · d564053f
      Steven Whitehouse 提交于
      The freeze code has not been looked at a lot recently. Upstream has
      moved on, and this is an attempt to catch us back up again. There
      is a vfs level interface for the freeze code which can be called
      from our (obsolete, but kept for backward compatibility purposes)
      sysfs freeze interface. This means freezing this way vs. doing it
      from the ioctl should now work in identical fashion.
      
      As a result of this, the freeze function is only called once
      and we can drop our own special purpose code for counting the
      number of freezes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d564053f
    • S
      GFS2: Split gfs2_trans_add_bh() into two · 350a9b0a
      Steven Whitehouse 提交于
      There is little common content in gfs2_trans_add_bh() between the data
      and meta classes by the time that the functions which it calls are
      taken into account. The intent here is to split this into two
      separate functions. Stage one is to introduce gfs2_trans_add_data()
      and gfs2_trans_add_meta() and update the callers accordingly.
      
      Later patches will then pull in the content of gfs2_trans_add_bh()
      and its dependent functions in order to clean up the code in this
      area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      350a9b0a
  3. 07 11月, 2012 1 次提交
    • B
      GFS2: Don't call file_accessed() with a shared glock · 3d162688
      Benjamin Marzinski 提交于
      file_accessed() was being called by gfs2_mmap() with a shared glock. If it
      needed to update the atime, it was crashing because it dirtied the inode in
      gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
      checked if the caller was already holding a glock, but it didn't make sure that
      the glock was in the exclusive state. Now, instead of calling file_accessed()
      while holding the shared lock in gfs2_mmap(), file_accessed() is called after
      grabbing and releasing the glock to update the inode.  If file_accessed() needs
      to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().
      
      gfs2_dirty_inode() now also checks to make sure that if the calling process has
      already locked the glock, it has an exclusive lock.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3d162688
  4. 24 9月, 2012 3 次提交
    • B
      GFS2: Write out dirty inode metadata in delayed deletes · 2216db70
      Benjamin Marzinski 提交于
      If a dirty GFS2 inode was being deleted but was in use by another node, its
      metadata was not getting written out before GFS2 checked for dirty buffers in
      gfs2_ail_flush().  GFS2 was relying on inode_go_sync() to write out the
      metadata when the other node tried to free the file, but it failed the error
      check before it got that far. This patch writes out the metadata before calling
      gfs2_ail_flush()
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2216db70
    • S
      GFS2: Fix ->show_options() for statfs slow · 2b9731e8
      Steven Whitehouse 提交于
      The ->show_options() function for GFS2 was not correctly displaying
      the value when statfs slow in in use.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reported-by: NMilos Jakubicek <xjakub@fi.muni.cz>
      2b9731e8
    • S
      GFS2: Add structure to contain rgrp, bitmap, offset tuple · 4a993fb1
      Steven Whitehouse 提交于
      This patch introduces a new structure, gfs2_rbm, which is a
      tuple of a resource group, a bitmap within the resource group
      and an offset within that bitmap. This is designed to make
      manipulating these sets of variables easier. There is also a
      new helper function which converts this representation back
      to a disk block address.
      
      In addition, the rbtree nodes which are used for the reservations
      were not being correctly initialised, which is now fixed. Also,
      the tracing was not passing through the inode where it should
      have been. That is mostly fixed aside from one corner case. This
      needs to be revisited since there can also be a NULL rgrp in
      some cases which results in the device being incorrect in the
      trace.
      
      This is intended to be the first step towards cleaning up some
      of the allocation code, and some further bug fixes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4a993fb1
  5. 21 8月, 2012 1 次提交
    • T
      workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo 提交于
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
  6. 23 7月, 2012 2 次提交
  7. 19 7月, 2012 1 次提交
    • B
      GFS2: Reduce file fragmentation · 8e2e0047
      Bob Peterson 提交于
      This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
      resulting improved on disk layout greatly speeds up operations in cases
      which would have resulted in interlaced allocation of blocks previously.
      A typical example of this is 10 parallel dd processes, each writing to a
      file in a common dirctory.
      
      The implementation uses an rbtree of reservations attached to each
      resource group (and each inode).
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8e2e0047
  8. 08 6月, 2012 1 次提交
    • B
      GFS2: Use lvbs for storing rgrp information with mount option · 90306c41
      Benjamin Marzinski 提交于
      Instead of reading in the resource groups when gfs2 is checking
      for free space to allocate from, gfs2 can store the necessary infromation
      in the resource group's lvb.  Also, instead of searching for unlinked
      inodes in every resource group that's checked for free space, gfs2 can
      store the number of unlinked but inodes in the lvb, and only check for
      unlinked inodes if it will find some.
      
      The first time a resource group is locked, the lvb must initialized.
      Since this involves counting the unlinked inodes in the resource group,
      this takes a little extra time.  But after that, if the resource group
      is locked with GL_SKIP, the buffer head won't be read in unless it's
      actually needed.
      
      Enabling the resource groups lvbs is done via the rgrplvb mount option.  If
      this option isn't set, the lvbs will still be set and updated, but they won't
      be verfied or used by the filesystem.  To safely turn on this option, all of
      the nodes mounting the filesystem must be running code with this patch, and
      the filesystem must have been completely unmounted since they were updated.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      90306c41
  9. 06 6月, 2012 2 次提交
  10. 06 5月, 2012 1 次提交
  11. 07 3月, 2012 1 次提交
  12. 29 2月, 2012 1 次提交
    • S
      GFS2: FITRIM ioctl support · 66fc061b
      Steven Whitehouse 提交于
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows a sweep of the
      filesystem on an occasional basis, and also to optionally avoid sending
      down discard requests for smaller regions.
      
      In GFS2 FITRIM will work at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      66fc061b
  13. 07 1月, 2012 1 次提交
  14. 04 1月, 2012 1 次提交
    • A
      vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05
      Al Viro 提交于
      Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
      it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
      the cost of taking it into inode_init_always() will be negligible for pipes
      and sockets and negative for everything else.  Not to mention the removal of
      boilerplate code from ->destroy_inode() instances...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b520e05
  15. 22 11月, 2011 1 次提交
  16. 21 10月, 2011 6 次提交
    • S
      GFS2: Fix AIL flush issue during fsync · b5b24d7a
      Steven Whitehouse 提交于
      Unfortunately, it is not enough to just ignore locked buffers during
      the AIL flush from fsync. We need to be able to ignore all buffers
      which are locked, dirty or pinned at this stage as they might have
      been added subsequent to the log flush earlier in the fsync function.
      
      In addition, this means that we no longer need to rely on i_mutex to
      keep out writes during fsync, so we can, as a side-effect, remove
      that protection too.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-By: NAbhijith Das <adas@redhat.com>
      b5b24d7a
    • S
      GFS2: Cache the most recently used resource group in the inode · 54335b1f
      Steven Whitehouse 提交于
      This means that after the initial allocation for any inode, the
      last used resource group is cached in the inode for future use.
      This drastically reduces the number of lookups of resource
      groups in the common case, and this the contention on that
      data structure.
      
      The allocation algorithm is the same as previously, except that we
      always check to see if the goal block is within the cached rgrp
      first before going to the rbtree to look one up.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      54335b1f
    • S
      GFS2: Make resource groups "append only" during life of fs · 8339ee54
      Steven Whitehouse 提交于
      Since we have ruled out supporting online filesystem shrink,
      it is possible to make the resource group list append only
      during the life of a super block. This gives several benefits:
      
      Firstly, we only need to read new rindex elements as they are added
      rather than needing to reread the whole rindex file each time one
      element is added.
      
      Secondly, the rindex glock can be held for much shorter periods of
      time, and is completely removed from the fast path for allocations.
      The lock is taken in shared mode only when updating the resource
      groups when the first allocation occurs, and after a grow has
      taken place.
      
      Thirdly, this results in a reduction in code size, and everything
      gets a lot simpler to understand in this area.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8339ee54
    • S
      GFS2: Use ->dirty_inode() · ab9bbda0
      Steven Whitehouse 提交于
      The aim of this patch is to use the newly enhanced ->dirty_inode()
      super block operation to deal with atime updates, rather than
      piggy backing that code into ->write_inode() as is currently
      done.
      
      The net result is a simplification of the code in various places
      and a reduction of the number of gfs2_dinode_out() calls since
      this is now implied by ->dirty_inode().
      
      Some of the mark_inode_dirty() calls have been moved under glocks
      in order to take advantage of then being able to avoid locking in
      ->dirty_inode() when we already have suitable locks.
      
      One consequence is that generic_write_end() now correctly deals
      with file size updates, so that we do not need a separate check
      for that afterwards. This also, indirectly, means that fdatasync
      should work correctly on GFS2 - the current code always syncs the
      metadata whether it needs to or not.
      
      Has survived testing with postmark (with and without atime) and
      also fsx.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ab9bbda0
    • S
      GFS2: Fix inode allocation error path · 40ac218f
      Steven Whitehouse 提交于
      If we have got far enough through the inode allocation code
      path that an inode has already been allocated, then we must
      call iput to dispose of it, if an error occurs during a
      later part of the process. This will always be the final iput
      since there will be no other references to the inode.
      
      Unlike when the inode has been unlinked, its block state will
      be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
      to skip the test in ->evict_inode() for this one case in order
      to ensure that it will be deallocated correctly. This patch adds
      a new flag in order to ensure that this will happen correctly.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      40ac218f
    • S
      GFS2: Make atime checks more efficient · 1d4ec642
      Steven Whitehouse 提交于
      We do not need to start a transaction unless the atime
      check has proved positive. Also if we are going to flush
      the complete ail list anyway, we might as well skip the
      writeback for this specific inode's metadata, since that
      will be done as part of the ail writeback process in an
      order offering potentially more efficient I/O.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1d4ec642
  17. 15 7月, 2011 1 次提交
    • S
      GFS2: Cache dir hash table in a contiguous buffer · 17d539f0
      Steven Whitehouse 提交于
      This patch adds a cache for the hash table to the directory code
      in order to help simplify the way in which the hash table is
      accessed. This is intended to be a first step towards introducing
      some performance improvements in the directory code.
      
      There are two follow ups that I'm hoping to see fairly shortly. One
      is to simplify the hash table reading code now that we always read the
      complete hash table, whether we want one entry or all of them. The
      other is to introduce readahead on the heads of the hash chains
      which are referred to from the table.
      
      The hash table is a maximum of 128k in size, so it is not worth trying
      to read it in small chunks.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      17d539f0
  18. 14 7月, 2011 1 次提交
    • S
      GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Steven Whitehouse 提交于
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock thats added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up calling
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Tested-by: NAbhijith Das <adas@redhat.com>
      Reported-by: NBarry J. Marson <bmarson@redhat.com>
      Reported-by: NDavid Teigland <teigland@redhat.com>
      380f7c65
  19. 09 5月, 2011 2 次提交
  20. 20 4月, 2011 4 次提交
    • S
      GFS2: Make writeback more responsive to system conditions · 4667a0ec
      Steven Whitehouse 提交于
      This patch adds writeback_control to writing back the AIL
      list. This means that we can then take advantage of the
      information we get in ->write_inode() in order to set off
      some pre-emptive writeback.
      
      In addition, the AIL code is cleaned up a bit to make it
      a bit simpler to understand.
      
      There is still more which can usefully be done in this area,
      but this is a good start at least.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      4667a0ec
    • S
      GFS2: Optimise glock lru and end of life inodes · f42ab085
      Steven Whitehouse 提交于
      The GLF_LRU flag introduced in the previous patch can be
      used to check if a glock is on the lru list when a new
      holder is queued and if so remove it, without having first
      to get the lru_lock.
      
      The main purpose of this patch however is to optimise the
      glocks left over when an inode at end of life is being
      evicted. Previously such glocks were left with the GLF_LFLUSH
      flag set, so that when reclaimed, each one required a log flush.
      This patch resets the GLF_LFLUSH flag when there is nothing
      left to flush thus preventing later log flushes as glocks are
      reused or demoted.
      
      In order to do this, we need to keep track of the number of
      revokes which are outstanding, and also to clear the GLF_LFLUSH
      bit after a log commit when only revokes have been processed.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f42ab085
    • S
      GFS2: Alter point of entry to glock lru list for glocks with an address_space · 29687a2a
      Steven Whitehouse 提交于
      Rather than allowing the glocks to be scheduled for possible
      reclaim as soon as they have exited the journal, this patch
      delays their entry to the list until the glocks in question
      are no longer in use.
      
      This means that we will rely on the vm for writeback of all
      dirty data and metadata from now on. When glocks are added
      to the lru list they should be freeable much faster since all
      the I/O required to free them should have already been completed.
      
      This should lead to much better I/O patterns under low memory
      conditions.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      29687a2a
    • S
      GFS2: Make ->write_inode() really write · 1027efaa
      Steven Whitehouse 提交于
      The GFS2 ->write_inode function should be more aggressive at writing
      back to the filesystem. This adopts the XFS system of returning
      -EAGAIN when the writeback has not been completely done. Also, we
      now kick off in-place writeback when called with WB_SYNC_NONE,
      but we only wait for it and flush the log when WB_SYNC_ALL is
      requested.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1027efaa
  21. 18 4月, 2011 2 次提交
  22. 31 3月, 2011 1 次提交
  23. 18 1月, 2011 1 次提交
    • B
      GFS2: remove iopen glocks from cache on failed deletes · 23c30108
      Benjamin Marzinski 提交于
      When a file gets deleted on GFS2, if a node can't get an exclusive lock on the
      file's iopen glock, it punts on actually freeing up the space, because another
      node is using the file.  When it does this, it needs to drop the iopen glock
      from its cache so that the other node can get an exclusive lock on it. Now,
      gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the
      iopen glock in preparation for grabbing it in the exclusive state.  Since the
      node needs the glock in the exclusive state, dropping the shared lock from the
      cache doesn't slow down the case where no other nodes are using the file.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      23c30108