1. 23 January 2018, 2 commits
  2. 22 January 2018, 1 commit
  3. 22 December 2017, 2 commits
    • gfs2: Trim the ordered write list in gfs2_ordered_write() · 1f23bc78
      Committed by Abhi Das
      We iterate through the entire ordered writes list in
      gfs2_ordered_write() to write out inodes. It's a good
      place to try and shrink the list by throwing out inodes
      that don't have any pages.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
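
      A minimal sketch of the idea, assuming the ordered-writes list is
      protected by a spinlock and linked through a per-inode list head
      (the names sd_ordered_lock, sd_log_ordered and i_ordered below are
      illustrative approximations, not necessarily the exact fields):

          static void gfs2_ordered_write(struct gfs2_sbd *sdp)
          {
                  struct gfs2_inode *ip, *tmp;

                  spin_lock(&sdp->sd_ordered_lock);
                  list_for_each_entry_safe(ip, tmp, &sdp->sd_log_ordered,
                                           i_ordered) {
                          /* An inode with no pages at all has nothing to
                             order: trim it now instead of revisiting it
                             on every log flush. */
                          if (ip->i_inode.i_mapping->nrpages == 0) {
                                  list_del_init(&ip->i_ordered);
                                  continue;
                          }
                          /* Otherwise write its dirty pages as before
                             (the real code drops the lock around the
                             actual filemap_fdatawrite() call). */
                  }
                  spin_unlock(&sdp->sd_ordered_lock);
          }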
    • GFS2: Reduce code redundancy writing log headers · 588bff95
      Committed by Bob Peterson
      Before this patch, there was a lot of code redundancy between functions
      log_write_header (which uses bio) and clean_journal (which uses
      buffer_head). This patch reduces the redundancy to simplify the code
      and make log header writing more consistent. We want more consistency
      and reduced redundancy because we plan to add a bunch of new fields:
      to improve performance (by eliminating the local statfs and quota files),
      to improve metadata integrity (by adding new crcs and such), and for better
      debugging (by adding new fields to track when and where metadata was
      pushed through the journals). We don't want to duplicate setting these
      new fields, nor allow for human error in the process.
      
      This reduction in code redundancy is accomplished by introducing a new
      helper function, gfs2_write_log_header which uses bio rather than bh.
      That lets the recovery function clean_journal() use the new helper
      and iomap rather than duplicated code and block_map (which we may
      eventually be able to remove). It also reduces our dependency on
      buffer_heads.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
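
      Roughly, the shape of the change (a sketch only: the real helper
      takes more arguments and will grow crc fields; the point is that
      both the flush path and recovery fill the header in one place):

          static void gfs2_write_log_header(struct gfs2_sbd *sdp,
                                            struct gfs2_log_header *lh,
                                            u64 seq, u32 tail,
                                            unsigned int lblock)
          {
                  lh->lh_header.mh_magic = cpu_to_be32(GFS2_MAGIC);
                  lh->lh_header.mh_type = cpu_to_be32(GFS2_METATYPE_LH);
                  lh->lh_sequence = cpu_to_be64(seq);
                  lh->lh_tail = cpu_to_be32(tail);
                  lh->lh_blkno = cpu_to_be32(lblock);
                  /* ...any future fields are set here, once, for both
                     callers, and the header is submitted via bio rather
                     than a buffer_head... */
          }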
  4. 25 August 2017, 1 commit
    • GFS2: Withdraw for IO errors writing to the journal or statfs · 942b0cdd
      Committed by Bob Peterson
      Before this patch, if GFS2 encountered IO errors while writing to
      the journal, it would not report the problem, so the errors would go
      unnoticed, sometimes for many hours. The problem was often noticed
      only later, when recovery attempted journal replay and failed due to
      invalid metadata at the blocks that had suffered IO errors.
      
      This patch makes GFS2's log daemon check for IO errors. If it
      encounters one, it withdraws from the file system and reports
      why in dmesg. A similar action is taken when IO errors occur when
      writing to the system statfs file.
      
      These errors are also reported back to any callers of fsync, since
      that requires the journal to be flushed. Therefore, any IO errors
      that would previously go unnoticed are now noticed and the file
      system is withdrawn as early as possible, thus preventing further
      file system damage.
      
      Also note that this reintroduces superblock variable sd_log_error,
      which Christoph removed with commit f729b66f.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
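
      The check itself is small; a sketch of the logd side (the message
      text and exact flag tests are illustrative, but sd_log_error is
      the reintroduced field):

          /* In gfs2_logd()'s main loop, after attempting a flush: */
          if (sdp->sd_log_error &&
              !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
                  gfs2_lm_withdraw(sdp,
                          "GFS2: fsid=%s: error %d: withdrawing the "
                          "file system to prevent further damage.\n",
                          sdp->sd_fsname, sdp->sd_log_error);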
  5. 10 August 2017, 1 commit
    • gfs2: forcibly flush ail to relieve memory pressure · b066a4ee
      Committed by Abhi Das
      On systems with low memory, it is possible for gfs2 to infinitely
      loop in balance_dirty_pages() under heavy IO (creating sparse files).
      
      balance_dirty_pages() attempts to write out the dirty pages via
      gfs2_writepages() but none are found because these dirty pages are
      being used by the journaling code in the ail. Normally, the journal
      has an upper threshold which when hit triggers an automatic flush
      of the ail. But this threshold can be higher than the number of
      allowable dirty pages and result in the ail never being flushed.
      
      This patch forces an ail flush when gfs2_writepages() fails to write
      anything. This is a good indication that the ail might be holding
      some dirty pages.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
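
      The core of the fix is a one-flag nudge from the writeback path to
      logd; a sketch (SDF_FORCE_AIL_FLUSH is the flag this patch adds,
      and logd is assumed to test it and flush the ail):

          static int gfs2_writepages(struct address_space *mapping,
                                     struct writeback_control *wbc)
          {
                  struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
                  int ret = mpage_writepages(mapping, wbc,
                                             gfs2_get_block_noalloc);

                  /* If we wrote nothing, the dirty pages are probably
                     sitting in the ail; ask logd to flush it so that
                     balance_dirty_pages() cannot spin forever. */
                  if (ret == 0)
                          set_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags);
                  return ret;
          }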
  6. 20 June 2017, 1 commit
  7. 24 May 2017, 1 commit
    • gfs2: Make flush bios explicitely sync · 0f0b9b63
      Committed by Jan Kara
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
      definitions.  generic_make_request_checks() however strips REQ_FUA and
      REQ_PREFLUSH flags from a bio when the storage doesn't report a volatile
      write cache, and thus the write effectively becomes asynchronous, which
      can lead to performance regressions.
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
      
      Fixes: b685d3d6
      CC: Steven Whitehouse <swhiteho@redhat.com>
      CC: cluster-devel@redhat.com
      CC: stable@vger.kernel.org
      Acked-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
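
      In practice this is a flag-selection fix wherever GFS2 builds its
      log flush bios; a simplified sketch of the header-write path (the
      surrounding names follow gfs2's log code, lightly abridged):

          int op_flags;

          if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags)) {
                  op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
          } else {
                  /* REQ_PREFLUSH/REQ_FUA may be stripped on devices
                     without a volatile cache, so REQ_SYNC must be set
                     explicitly to keep the write synchronous. */
                  op_flags = REQ_PREFLUSH | REQ_FUA | REQ_META | REQ_SYNC;
          }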
  8. 27 January 2017, 1 commit
  9. 07 January 2017, 1 commit
    • GFS2: Wake up io waiters whenever a flush is done · b63f5e84
      Committed by Bob Peterson
      Before this patch, if a process called function gfs2_log_reserve to
      reserve some journal blocks, but not enough journal blocks were
      free, it would call io_schedule. However, the log flush daemon woke
      up the waiters only if a gfs2_ail_flush was no longer
      required. This resulted in situations where processes would wait
      forever because the number of blocks required was so high that it
      pushed the journal into a perpetual state of flush being required.
      
      This patch changes the logd daemon so that it wakes up io waiters
      every time the log is actually flushed.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
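
      A sketch of the logd change (condensed; helper names follow the
      gfs2 log code of that era):

          bool did_flush = false;

          if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
                  gfs2_log_flush(sdp, NULL, NORMAL_FLUSH);
                  did_flush = true;
          }
          if (gfs2_ail_flush_reqd(sdp)) {
                  gfs2_ail1_start(sdp);
                  gfs2_ail1_wait(sdp);
                  gfs2_log_flush(sdp, NULL, NORMAL_FLUSH);
                  did_flush = true;
          }
          /* Wake reservation waiters after any flush, not only when
             no further ail flush is required. */
          if (!gfs2_ail_flush_reqd(sdp) || did_flush)
                  wake_up(&sdp->sd_log_waitq);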
  10. 06 January 2017, 1 commit
  11. 01 November 2016, 1 commit
  12. 08 June 2016, 1 commit
  13. 15 December 2015, 1 commit
    • gfs2: clear journal live bit in gfs2_log_flush · 400ac52e
      Committed by Benjamin Marzinski
      When gfs2 was unmounting filesystems or changing them to read-only it
      was clearing the SDF_JOURNAL_LIVE bit before the final log flush.  This
      caused a race.  If an inode glock got demoted in the gap between
      clearing the bit and the shutdown flush, it would be unable to reserve
      log space to clear out the active items list in inode_go_sync, causing an
      error in inode_go_inval because the glock was still dirty.
      
      To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
      shutdown log flush.  This means that, because of the locking on the log
      blocks, either inode_go_sync will be able to reserve space to clean the
      glock before the shutdown flush, or the shutdown flush will clean the
      glock itself, before inode_go_sync fails to reserve the space. Either
      way, the glock will be clean before inode_go_inval.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
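
      The fix moves one bit-clear under the flush lock; a sketch:

          void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
                              enum gfs2_flush_type type)
          {
                  down_write(&sdp->sd_log_flush_lock);

                  /* Clearing the live bit here closes the race: a
                     demoting glock either reserves log space before this
                     point, or this flush cleans it before inode_go_sync
                     can fail the reservation. */
                  if (type == SHUTDOWN_FLUSH)
                          clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);

                  /* ... the rest of the flush ... */
                  up_write(&sdp->sd_log_flush_lock);
          }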
  14. 17 November 2014, 1 commit
    • GFS2: update freeze code to use freeze/thaw_super on all nodes · 2e60d768
      Committed by Benjamin Marzinski
      The current gfs2 freezing code is considerably more complicated than it
      should be because it doesn't use the vfs freezing code on any node except
      the one that begins the freeze.  This is because it needs to acquire a
      cluster glock before calling the vfs code to prevent a deadlock, and
      without the new freeze_super and thaw_super hooks, that was impossible. To
      deal with the issue, gfs2 had to do some hacky locking tricks to make sure
      that a frozen node couldn't be holding on to a lock it needed to issue
      the unfreeze ioctl.
      
      This patch makes use of the new hooks to simplify the gfs2 locking code. Now,
      all the nodes in the cluster freeze and thaw in exactly the same way. Every
      node in the cluster caches the freeze glock in the shared state.  The new
      freeze_super hook allows the freezing node to grab this freeze glock in
      the exclusive state without first calling the vfs freeze_super function.
      All the nodes in the cluster see this lock change, and call the vfs
      freeze_super function. The vfs locking code guarantees that the nodes can't
      get stuck holding the glocks necessary to unfreeze the system.  To
      unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
      glock. Again, all the nodes notice this, reacquire the glock in shared mode
      and call the vfs thaw_super function.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
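
      A sketch of how the new hooks slot into gfs2's super_operations
      (handler names as in gfs2; the comments summarize the commit
      message rather than the full implementation):

          static const struct super_operations gfs2_super_ops = {
                  /* ... */
                  /* Freezing node: take the freeze glock exclusively,
                     which every node observes, then freeze through the
                     common vfs path like everyone else. */
                  .freeze_super = gfs2_freeze,
                  /* Unfreeze: drop the freeze glock; all nodes see the
                     state change and thaw through the vfs. */
                  .thaw_super   = gfs2_unfreeze,
                  /* ... */
          };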
  15. 14 May 2014, 1 commit
    • GFS2: remove transaction glock · 24972557
      Committed by Benjamin Marzinski
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In its place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work of flushing out
      and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesystem simply involves dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 12 March 2014, 1 commit
  17. 25 February 2014, 3 commits
    • GFS2: Remove extra "if" in gfs2_log_flush() · b1ab1e44
      Committed by Steven Whitehouse
      By reordering some of the assignments in gfs2_log_flush() it
      is possible to remove one of the "if" statements as it can be
      merged with one higher up the function.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Move log buffer accounting to transaction · 022ef4fe
      Committed by Steven Whitehouse
      Now we have a master transaction into which other transactions
      are merged, the accounting can be done using this master
      transaction. We no longer require the superblock fields which
      were being used for this function.
      
      In addition, this allows for a clean up in calc_reserved(),
      making it rather easier to understand. Also, by reducing the
      number of variables used to track the buffers being added
      and removed from the journal, a number of error checks are
      now no longer required.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Move log buffer lists into transaction · d69a3c65
      Committed by Steven Whitehouse
      Over time, we hope to be able to improve the concurrency available
      in the log code. This is one small step towards that, by moving
      the buffer lists from the super block, and into the transaction
      structure, so that each transaction builds its own buffer lists.
      
      At transaction commit time, the buffer lists are merged into
      the currently accumulating transaction. That transaction then
      is passed into the before and after commit functions at journal
      flush time. Thus there should be no change in overall behaviour
      yet.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
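
      The data-structure side of the change, roughly (tr_buf/tr_databuf
      are the per-transaction lists this patch introduces):

          struct gfs2_trans {
                  /* ... */
                  struct list_head tr_buf;      /* metadata buffers */
                  struct list_head tr_databuf;  /* journalled data */
          };

          /* At commit time, splice each small transaction's lists into
             the currently accumulating master transaction: */
          list_splice_tail_init(&tr->tr_buf, &sdp->sd_log_tr->tr_buf);
          list_splice_tail_init(&tr->tr_databuf,
                                &sdp->sd_log_tr->tr_databuf);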
  18. 03 February 2014, 1 commit
    • GFS2: Plug on AIL flush · 885bceca
      Committed by Steven Whitehouse
      When we do a flush of the AIL list, we are writing out what is
      likely to be a lot of small I/Os, which are possibly in an order
      which is not ideal performance-wise. Since this is done by calling
      filemap_fdatawrite for each individual inode's address space there
      is no overall plugging going on.
      
      In addition to that, we do not always wait for AIL i/o when we flush
      it, so it is possible for things to get left behind on the queue.
      By adding explicit plugging here, we reduce the chances of this
      being an issue. A quick test using the AIL flush tracepoint shows a
      small, but measurable improvement.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
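
      The plugging pattern, as a sketch (the loop body is simplified;
      the real code walks the ail via gfs2_ail1_start_one()):

          struct blk_plug plug;
          struct gfs2_trans *tr;

          blk_start_plug(&plug);   /* hold back the many small i/os */
          list_for_each_entry(tr, &sdp->sd_ail1_list, tr_list)
                  gfs2_ail1_start_one(sdp, wbc, tr);
          blk_finish_plug(&plug);  /* release them together so the block
                                      layer can merge and sort */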
  19. 14 December 2013, 1 commit
    • GFS2: Fix use-after-free race when calling gfs2_remove_from_ail · 9290a9a7
      Committed by Bob Peterson
      Function gfs2_remove_from_ail drops the reference on the bh via
      brelse. This patch fixes a race condition whereby the bh is dereferenced
      after the brelse when setting bd->bd_blkno = bh->b_blocknr;
      Under certain rare circumstances, bh might be gone or reused,
      and bd->bd_blkno is set to whatever that memory happens to be,
      which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails
      the test "bd->bd_blkno >= blkno" which causes it to never be freed.
      The end result is that the bd is never freed from the bufdata cache,
      which results in this error:
      slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
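
      The shape of the fix, before and after:

          /* Racy: gfs2_remove_from_ail() brelse's the bh, after which
             the bh may be freed or reused: */
          gfs2_remove_from_ail(bd);          /* drops the bh reference */
          bd->bd_blkno = bh->b_blocknr;      /* use after free */

          /* Fixed: capture the block number while the reference is
             still held: */
          bd->bd_blkno = bh->b_blocknr;
          gfs2_remove_from_ail(bd);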
  20. 19 June 2013, 1 commit
    • GFS2: aggressively issue revokes in gfs2_log_flush · 5d054964
      Committed by Benjamin Marzinski
      This patch looks at all the outstanding blocks in all the transactions
      on the log, and moves the completed ones to the ail2 list.  Then it
      issues revokes for these blocks.  This will hopefully speed things up
      in situations where there is a lot of contention for glocks, especially
      if they are acquired serially.
      
      revoke_lo_before_commit will issue at most one log block's worth of these
      preemptive revokes. The amount of reserved log space that
      gfs2_log_reserve() ignores has been incremented to allow for this extra
      block.
      
      This patch also consolidates the common revoke instructions into one
      function, gfs2_add_revoke().
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
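
      The consolidated helper looks roughly like this (field names as in
      gfs2's bufdata/glock structures of that era):

          void gfs2_add_revoke(struct gfs2_sbd *sdp,
                               struct gfs2_bufdata *bd)
          {
                  struct gfs2_glock *gl = bd->bd_gl;

                  bd->bd_blkno = bd->bd_bh->b_blocknr;
                  gfs2_remove_from_ail(bd);  /* drops the bh reference */
                  bd->bd_bh = NULL;
                  sdp->sd_log_num_revoke++;
                  atomic_inc(&gl->gl_revokes);
                  set_bit(GLF_LFLUSH, &gl->gl_flags);
                  list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
          }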
  21. 08 April 2013, 1 commit
    • GFS2: replace gfs2_ail structure with gfs2_trans · 16ca9412
      Committed by Benjamin Marzinski
      In order to allow transactions and log flushes to happen at the same
      time, gfs2 needs to move the transaction accounting and active items
      list code into the gfs2_trans structure.  As a first step toward this,
      this patch removes the gfs2_ail structure, and handles the active items
      list in the gfs2_trans structure.  This keeps gfs2 from allocating an ail
      structure on log flushes, and gives us a structure that can later be used
      to store the transaction accounting outside of the gfs2 superblock
      structure.
      
      With this patch, at the end of a transaction, gfs2 will add the
      gfs2_trans structure to the superblock if there is not one already.
      This structure now has the active items fields that were previously in
      gfs2_ail.  This is not necessary in the case where the transaction was
      simply used to add revokes, since these are never written outside of the
      journal, and thus, don't need an active items list.
      
      Also, in order to make sure that the transaction structure is not
      removed while it's still in use by gfs2_trans_end, unlocking the
      sd_log_flush_lock has to happen slightly later in ending the
      transaction.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  22. 29 January 2013, 1 commit
    • GFS2: Use ->writepages for ordered writes · 45138990
      Committed by Steven Whitehouse
      Instead of using a list of buffers to write ahead of the journal
      flush, this now uses a list of inodes and calls ->writepages
      via filemap_fdatawrite() in order to achieve the same thing. For
      most use cases this results in a shorter ordered write list,
      as well as much larger i/os being issued.
      
      The ordered write list is sorted by inode number before writing
      in order to retain the disk block ordering between inodes as
      per the previous code.
      
      The previous ordered write code conflicted with the assumptions that
      mpage_writepages() makes about how disk blocks are written out. With
      this updated version we can also use mpage_writepages() for GFS2's
      ordered-write writepages implementation, so we send larger i/os
      from writeback too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
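
      The sort-by-inode-number step can be expressed with list_sort();
      a sketch (i_ordered and sd_log_le_ordered are the assumed list
      linkage here):

          static int ip_cmp(void *priv, struct list_head *a,
                            struct list_head *b)
          {
                  struct gfs2_inode *ipa, *ipb;

                  ipa = list_entry(a, struct gfs2_inode, i_ordered);
                  ipb = list_entry(b, struct gfs2_inode, i_ordered);
                  if (ipa->i_no_addr < ipb->i_no_addr)
                          return -1;
                  if (ipa->i_no_addr > ipb->i_no_addr)
                          return 1;
                  return 0;
          }

          /* Sort, then push each inode's dirty pages through
             ->writepages via filemap_fdatawrite(): */
          list_sort(NULL, &sdp->sd_log_le_ordered, &ip_cmp);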
  23. 02 May 2012, 1 commit
  24. 24 April 2012, 4 commits
    • GFS2: Log code fixes · 144a4c2f
      Committed by Steven Whitehouse
      This patch removes a log lock from around an atomic operation where
      it is not needed, removes an unused variable, and also changes
      a void pointer used incorrectly to a struct page pointer.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove bd_list_tr · c50b91c4
      Committed by Steven Whitehouse
      This is another clean up in the logging code. This per-transaction
      list was largely unused. Its main function was to ensure that the
      number of buffers in a transaction was correct, however that counter
      was only used to check the number of buffers in the bd_list_tr, plus
      an assert at the end of each transaction. With the assert now changed
      to use the calculated buffer counts, we can remove both bd_list_tr and
      its associated counter.
      
      This should make the code easier to understand as well as shrinking
      a couple of structures.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up log write code path · e8c92ed7
      Committed by Steven Whitehouse
      Prior to this patch, we have two ways of sending i/o to the log.
      One of those is used when we need to allocate both the data
      to be written itself and also a buffer head to submit it. This
      is done via sb_getblk and friends. This is used mostly for writing
      log headers.
      
      The other method is used when writing blocks which have some
      in-place counterpart. This is the case for all the metadata
      blocks which are journalled, and when journalled data is in use,
      for unescaped journalled data blocks.
      
      This patch replaces both of those two methods, and about half
      a dozen separate i/o submission points with a single i/o
      submission function. We also go direct to bio rather than
      using buffer heads, since this allows us to build i/o
      requests of the maximum size for the block device in
      question. It also reduces the memory required for flushing
      the log, which can be very useful in low memory situations.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
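
      The accumulate-or-submit pattern at the heart of the new bio code,
      sketched with the 2012-era bio API (submit_bio(rw, bio)); the
      helper name gfs2_log_alloc_bio follows this patch:

          if (bio_add_page(bio, page, blocksize, 0) < blocksize) {
                  /* bio is full or not contiguous: send it and start
                     a fresh one for this block */
                  submit_bio(WRITE, bio);
                  bio = gfs2_log_alloc_bio(sdp, blkno);
                  bio_add_page(bio, page, blocksize, 0);
          }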
    • GFS2: Drop "pull" argument from log_write_header() · fdb76a42
      Committed by Steven Whitehouse
      The "pull" argument to log_write_header() is only used
      for debug purposes and it is not really needed any more. There
      are other tests for this particular problem, so I think we can
      dispose of it in order to simplify the code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  25. 09 March 2012, 1 commit
    • GFS2: Clean up log flush header writing · 34cc1781
      Committed by Steven Whitehouse
      We already send both a pre and post flush to the block device
      when writing a journal header. There is no need to wait for
      the previous I/O specifically when we do this, unless we've
      turned "barriers" off.
      
      As a side effect, this also cleans up the code path for flushing
      the journal and makes it more readable.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  26. 29 February 2012, 3 commits
  27. 22 November 2011, 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
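
      The resulting caller pattern, as a sketch (example_daemon is a
      hypothetical freezable kthread):

          #include <linux/freezer.h>
          #include <linux/kthread.h>

          static int example_daemon(void *data)
          {
                  set_freezable();
                  while (!kthread_should_stop()) {
                          /* ... periodic work ... */
                          try_to_freeze();  /* enters __refrigerator()
                                               only if freezing() */
                          schedule_timeout_interruptible(HZ);
                  }
                  return 0;
          }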
  28. 08 November 2011, 1 commit
    • GFS2: Fix up REQ flags · 20ed0535
      Committed by Steven Whitehouse
      Christoph has split up REQ_PRIO from REQ_META. That means that
      we can drop REQ_PRIO from places where is it not needed. I'm
      not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
      makes any kind of sense, anyway.
      
      In addition, I've added REQ_META to one place in the code where
      it was missing. REQ_PRIO has been left for read/writes triggered
      by glock acquisition and writeback only. We can adjust it again
      if required, but these are the most important points from a
      performance perspective.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
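
      Illustrative before/after of the flag usage (2011-era request flag
      macros):

          /* Glock- and writeback-driven i/o keeps the priority hint: */
          submit_bh(READ_SYNC | REQ_META | REQ_PRIO, bh);

          /* Log flushes keep REQ_META but no longer add REQ_PRIO: */
          submit_bio(WRITE_FLUSH_FUA | REQ_META, bio);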
  29. 23 August 2011, 1 commit
  30. 14 July 2011, 1 commit
    • GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Committed by Steven Whitehouse
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock that's added into gfs2_releasepage() may
      be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up taking
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Tested-by: Abhijith Das <adas@redhat.com>
      Reported-by: Barry J. Marson <bmarson@redhat.com>
      Reported-by: David Teigland <teigland@redhat.com>
  31. 22 May 2011, 1 commit