1. 19 8月, 2015 1 次提交
    • B
      xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      Brian Foster 提交于
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f0b2efad
  2. 04 6月, 2015 2 次提交
  3. 20 5月, 2014 1 次提交
    • D
      xfs: log vector rounding leaks log space · 110dc24a
      Dave Chinner 提交于
      The addition of direct formatting of log items into the CIL
      linear buffer added alignment restrictions that the start of each
      vector needed to be 64 bit aligned. Hence padding was added in
      xlog_finish_iovec() to round up the vector length to ensure the next
      vector started with the correct alignment.
      
      This adds a small number of bytes to the size of
      the linear buffer that is otherwise unused. The issue is that we
      then use the linear buffer size to determine the log space used by
      the log item, and this includes the unused space. Hence when we
      account for space used by the log item, it's more than is actually
      written into the iclogs, and hence we slowly leak this space.
      
      This results on log hangs when reserving space, with threads getting
      stuck with these stack traces:
      
      Call Trace:
      [<ffffffff81d15989>] schedule+0x29/0x70
      [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
      [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
      [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
      [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
      .....
      
      The 4 bytes is significant. Brain Foster did all the hard work in
      tracking down a reproducable leak to inode chunk allocation (it went
      away with the ikeep mount option). His rough numbers were that
      creating 50,000 inodes leaked 11 log blocks. This turns out to be
      roughly 800 inode chunks or 1600 inode cluster buffers. That
      works out at roughly 4 bytes per cluster buffer logged, and at that
      I started looking for a 4 byte leak in the buffer logging code.
      
      What I found was that a struct xfs_buf_log_format structure for an
      inode cluster buffer is 28 bytes in length. This gets rounded up to
      32 bytes, but the vector length remains 28 bytes. Hence the CIL
      ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
      for that vector rather than 28 bytes which are written into the log.
      
      The fix for this problem is to separately track the bytes used by
      the log vectors in the item and use that instead of the buffer
      length when accounting for the log space that will be used by the
      formatted log item.
      
      Again, thanks to Brian Foster for doing all the hard work and long
      hours to isolate this leak and make finding the bug relatively
      simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      110dc24a
  4. 07 2月, 2014 1 次提交
  5. 13 12月, 2013 2 次提交
  6. 24 10月, 2013 1 次提交
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
  7. 14 8月, 2013 1 次提交
  8. 13 8月, 2013 1 次提交
  9. 28 6月, 2013 2 次提交
    • D
      xfs: Inode create log items · 3ebe7d2d
      Dave Chinner 提交于
      Introduce the inode create log item type for logical inode create logging.
      Instead of logging the changes in buffers, pass the range to be
      initialised through the log by a new transaction type.  This reduces
      the amount of log space required to record initialisation during
      allocation from about 128 bytes per inode to a small fixed amount
      per inode extent to be initialised.
      
      This requires a new log item type to track it through the log
      and the AIL. This is a relatively simple item - most callbacks are
      noops as this item has the same life cycle as the transaction.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3ebe7d2d
    • D
      xfs: Introduce ordered log vector support · fd63875c
      Dave Chinner 提交于
      And "ordered log vector" is a log vector that is used for
      tracking a log item through the CIL and into the AIL as part of the
      log checkpointing. These ordered log vectors are special in that
      they are not written to to journal in any way, and are not accounted
      to the checkpoint being written.
      
      The reason for this behaviour is to allow operations to attach items
      to transactions and have them follow the normal transactional
      lifecycle without actually having to write them to the journal. This
      allows logging of items that track high level logical changes and
      writing them to the log, while the physical items being modified
      pass through into the AIL and pin the tail of the log (and therefore
      the logical item in the log) until all the modified items are
      physically written to disk.
      
      IOWs, it allows us to write metadata without physically logging
      every individual change but still maintain the full transactional
      integrity guarantees we currently have w.r.t. crash recovery.
      
      This change modifies some of the CIL item insertion loops, as
      ordered log vectors introduce some new constraints as they don't
      track any data. One advantage of this change is that it combines
      two log vector chain walks into a single pass, so there is less
      overhead in the transaction commit pass as well. It also kills some
      unused code in the log vector walk loop when committing the CIL.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fd63875c
  10. 18 10月, 2012 2 次提交
  11. 15 5月, 2012 1 次提交
  12. 23 2月, 2012 3 次提交
  13. 09 12月, 2011 2 次提交
  14. 09 11月, 2011 1 次提交
  15. 29 4月, 2011 1 次提交
    • C
      xfs: exact busy extent tracking · 97d3ac75
      Christoph Hellwig 提交于
      Update the extent tree in case we have to reuse a busy extent, so that it
      always is kept uptodate.  This is done by replacing the busy list searches
      with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
      in case of a reuse.  This allows us to allow reusing metadata extents
      unconditionally, and thus avoid log forces especially for allocation btree
      blocks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      97d3ac75
  16. 28 1月, 2011 1 次提交
  17. 27 7月, 2010 3 次提交
  18. 24 5月, 2010 3 次提交
    • D
      xfs: Ensure inode allocation buffers are fully replayed · ccf7c23f
      Dave Chinner 提交于
      With delayed logging, we can get inode allocation buffers in the
      same transaction inode unlink buffers. We don't currently mark inode
      allocation buffers in the log, so inode unlink buffers take
      precedence over allocation buffers.
      
      The result is that when they are combined into the same checkpoint,
      only the unlinked inode chain fields are replayed, resulting in
      uninitialised inode buffers being detected when the next inode
      modification is replayed.
      
      To fix this, we need to ensure that we do not set the inode buffer
      flag in the buffer log item format flags if the inode allocation has
      not already hit the log. To avoid requiring a change to log
      recovery, we really need to make this a modification that relies
      only on in-memory sate.
      
      We can do this by checking during buffer log formatting (while the
      CIL cannot be flushed) if we are still in the same sequence when we
      commit the unlink transaction as the inode allocation transaction.
      If we are, then we do not add the inode buffer flag to the buffer
      log format item flags. This means the entire buffer will be
      replayed, not just the unlinked fields. We do this while
      CIL flusheѕ are locked out to ensure that we don't race with the
      sequence numbers changing and hence fail to put the inode buffer
      flag in the buffer format flags when we really need to.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ccf7c23f
    • D
      xfs: Introduce delayed logging core code · 71e330b5
      Dave Chinner 提交于
      The delayed logging code only changes in-memory structures and as
      such can be enabled and disabled with a mount option. Add the mount
      option and emit a warning that this is an experimental feature that
      should not be used in production yet.
      
      We also need infrastructure to track committed items that have not
      yet been written to the log. This is what the Committed Item List
      (CIL) is for.
      
      The log item also needs to be extended to track the current log
      vector, the associated memory buffer and it's location in the Commit
      Item List. Extend the log item and log vector structures to enable
      this tracking.
      
      To maintain the current log format for transactions with delayed
      logging, we need to introduce a checkpoint transaction and a context
      for tracking each checkpoint from initiation to transaction
      completion.  This includes adding a log ticket for tracking space
      log required/used by the context checkpoint.
      
      To track all the changes we need an io vector array per log item,
      rather than a single array for the entire transaction. Using the new
      log vector structure for this requires two passes - the first to
      allocate the log vector structures and chain them together, and the
      second to fill them out.  This log vector chain can then be passed
      to the CIL for formatting, pinning and insertion into the CIL.
      
      Formatting of the log vector chain is relatively simple - it's just
      a loop over the iovecs on each log vector, but it is made slightly
      more complex because we re-write the iovec after the copy to point
      back at the memory buffer we just copied into.
      
      This code also needs to pin log items. If the log item is not
      already tracked in this checkpoint context, then it needs to be
      pinned. Otherwise it is already pinned and we don't need to pin it
      again.
      
      The only other complexity is calculating the amount of new log space
      the formatting has consumed. This needs to be accounted to the
      transaction in progress, and the accounting is made more complex
      becase we need also to steal space from it for log metadata in the
      checkpoint transaction. Calculate all this at insert time and update
      all the tickets, counters, etc correctly.
      
      Once we've formatted all the log items in the transaction, attach
      the busy extents to the checkpoint context so the busy extents live
      until checkpoint completion and can be processed at that point in
      time. Transactions can then be freed at this point in time.
      
      Now we need to issue checkpoints - we are tracking the amount of log space
      used by the items in the CIL, so we can trigger background checkpoints when the
      space usage gets to a certain threshold. Otherwise, checkpoints need ot be
      triggered when a log synchronisation point is reached - a log force event.
      
      Because the log write code already handles chained log vectors, writing the
      transaction is trivial, too. Construct a transaction header, add it
      to the head of the chain and write it into the log, then issue a
      commit record write. Then we can release the checkpoint log ticket
      and attach the context to the log buffer so it can be called during
      Io completion to complete the checkpoint.
      
      We also need to allow for synchronising multiple in-flight
      checkpoints. This is needed for two things - the first is to ensure
      that checkpoint commit records appear in the log in the correct
      sequence order (so they are replayed in the correct order). The
      second is so that xfs_log_force_lsn() operates correctly and only
      flushes and/or waits for the specific sequence it was provided with.
      
      To do this we need a wait variable and a list tracking the
      checkpoint commits in progress. We can walk this list and wait for
      the checkpoints to change state or complete easily, an this provides
      the necessary synchronisation for correct operation in both cases.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      71e330b5
    • D
      xfs: make the log ticket ID available outside the log infrastructure · 955833cf
      Dave Chinner 提交于
      The ticket ID is needed to uniquely identify transactions when doing busy
      extent matching. Delayed logging changes the lifecycle of busy extents with
      respect to the transaction structure lifecycle. Hence we can no longer use
      the transaction structure as a means of determining the owner of the busy
      extent as it may be freed and reused while the busy extent is still active.
      
      This commit provides the infrastructure to access the xlog_tid_t held in the
      ticket from a transaction handle. This avoids the need for callers to peek
      into the transaction and log structures to find this out.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      955833cf
  19. 19 5月, 2010 2 次提交
  20. 02 3月, 2010 1 次提交
  21. 22 1月, 2010 2 次提交
  22. 16 3月, 2009 1 次提交
  23. 17 11月, 2008 1 次提交
    • D
      [XFS] Fix double free of log tickets · cc09c0dc
      Dave Chinner 提交于
      When an I/O error occurs during an intermediate commit on a rolling
      transaction, xfs_trans_commit() will free the transaction structure
      and the related ticket. However, the duplicate transaction that
      gets used as the transaction continues still contains a pointer
      to the ticket. Hence when the duplicate transaction is cancelled
      and freed, we free the ticket a second time.
      
      Add reference counting to the ticket so that we hold an extra
      reference to the ticket over the transaction commit. We drop the
      extra reference once we have checked that the transaction commit
      did not return an error, thus avoiding a double free on commit
      error.
      
      Credit to Nick Piggin for tripping over the problem.
      
      SGI-PV: 989741
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      cc09c0dc
  24. 13 8月, 2008 1 次提交
  25. 18 4月, 2008 1 次提交
    • D
      [XFS] Sanitise xfs_log_force error checking. · b911ca04
      David Chinner 提交于
      xfs_log_force() is declared to return an error, but we almost never check
      it. We don't need to check it in most cases; if there's a log I/O error
      then we'll be shutting down the filesystem anyway and that means we'll
      catch the error somewhere else.
      
      However, on certain calls we should be returning an error - sync
      transactions, fsync, sync writes, etc. so this isn't a pure black and
      white distinction. Hence make xfs_log_force() a void function that issues
      a warning to the syslog on error, and call _xfs_log_force() in all the
      places where we actually care about the error status returned.
      
      SGI-PV: 980084
      SGI-Modid: xfs-linux-melb:xfs-kern:30832a
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NNiv Sardi <xaiki@sgi.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      b911ca04
  26. 07 2月, 2008 1 次提交
  27. 28 9月, 2006 1 次提交