1. 24 8月, 2010 1 次提交
    • D
      xfs: unlock items before allowing the CIL to commit · d17c701c
      Dave Chinner 提交于
      When we commit a transaction using delayed logging, we need to
      unlock the items in the transaciton before we unlock the CIL context
      and allow it to be checkpointed. If we unlock them after we release
      the CIl context lock, the CIL can checkpoint and complete before
      we free the log items. This breaks stale buffer item unlock and
      unpin processing as there is an implicit assumption that the unlock
      will occur before the unpin.
      
      Also, some log items need to store the LSN of the transaction commit
      in the item (inodes and EFIs) and so can race with other transaction
      completions if we don't prevent the CIL from checkpointing before
      the unlock occurs.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d17c701c
  2. 27 7月, 2010 3 次提交
  3. 24 5月, 2010 3 次提交
    • D
      xfs: Ensure inode allocation buffers are fully replayed · ccf7c23f
      Dave Chinner 提交于
      With delayed logging, we can get inode allocation buffers in the
      same transaction inode unlink buffers. We don't currently mark inode
      allocation buffers in the log, so inode unlink buffers take
      precedence over allocation buffers.
      
      The result is that when they are combined into the same checkpoint,
      only the unlinked inode chain fields are replayed, resulting in
      uninitialised inode buffers being detected when the next inode
      modification is replayed.
      
      To fix this, we need to ensure that we do not set the inode buffer
      flag in the buffer log item format flags if the inode allocation has
      not already hit the log. To avoid requiring a change to log
      recovery, we really need to make this a modification that relies
      only on in-memory sate.
      
      We can do this by checking during buffer log formatting (while the
      CIL cannot be flushed) if we are still in the same sequence when we
      commit the unlink transaction as the inode allocation transaction.
      If we are, then we do not add the inode buffer flag to the buffer
      log format item flags. This means the entire buffer will be
      replayed, not just the unlinked fields. We do this while
      CIL flusheѕ are locked out to ensure that we don't race with the
      sequence numbers changing and hence fail to put the inode buffer
      flag in the buffer format flags when we really need to.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ccf7c23f
    • D
      xfs: enable background pushing of the CIL · df806158
      Dave Chinner 提交于
      If we let the CIL grow without bound, it will grow large enough to violate
      recovery constraints (must be at least one complete transaction in the log at
      all times) or take forever to write out through the log buffers. Hence we need
      a check during asynchronous transactions as to whether the CIL needs to be
      pushed.
      
      We track the amount of log space the CIL consumes, so it is relatively simple
      to limit it on a pure size basis. Make the limit the minimum of just under half
      the log size (recovery constraint) or 8MB of log space (which is an awful lot
      of metadata).
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      df806158
    • D
      xfs: Introduce delayed logging core code · 71e330b5
      Dave Chinner 提交于
      The delayed logging code only changes in-memory structures and as
      such can be enabled and disabled with a mount option. Add the mount
      option and emit a warning that this is an experimental feature that
      should not be used in production yet.
      
      We also need infrastructure to track committed items that have not
      yet been written to the log. This is what the Committed Item List
      (CIL) is for.
      
      The log item also needs to be extended to track the current log
      vector, the associated memory buffer and it's location in the Commit
      Item List. Extend the log item and log vector structures to enable
      this tracking.
      
      To maintain the current log format for transactions with delayed
      logging, we need to introduce a checkpoint transaction and a context
      for tracking each checkpoint from initiation to transaction
      completion.  This includes adding a log ticket for tracking space
      log required/used by the context checkpoint.
      
      To track all the changes we need an io vector array per log item,
      rather than a single array for the entire transaction. Using the new
      log vector structure for this requires two passes - the first to
      allocate the log vector structures and chain them together, and the
      second to fill them out.  This log vector chain can then be passed
      to the CIL for formatting, pinning and insertion into the CIL.
      
      Formatting of the log vector chain is relatively simple - it's just
      a loop over the iovecs on each log vector, but it is made slightly
      more complex because we re-write the iovec after the copy to point
      back at the memory buffer we just copied into.
      
      This code also needs to pin log items. If the log item is not
      already tracked in this checkpoint context, then it needs to be
      pinned. Otherwise it is already pinned and we don't need to pin it
      again.
      
      The only other complexity is calculating the amount of new log space
      the formatting has consumed. This needs to be accounted to the
      transaction in progress, and the accounting is made more complex
      becase we need also to steal space from it for log metadata in the
      checkpoint transaction. Calculate all this at insert time and update
      all the tickets, counters, etc correctly.
      
      Once we've formatted all the log items in the transaction, attach
      the busy extents to the checkpoint context so the busy extents live
      until checkpoint completion and can be processed at that point in
      time. Transactions can then be freed at this point in time.
      
      Now we need to issue checkpoints - we are tracking the amount of log space
      used by the items in the CIL, so we can trigger background checkpoints when the
      space usage gets to a certain threshold. Otherwise, checkpoints need ot be
      triggered when a log synchronisation point is reached - a log force event.
      
      Because the log write code already handles chained log vectors, writing the
      transaction is trivial, too. Construct a transaction header, add it
      to the head of the chain and write it into the log, then issue a
      commit record write. Then we can release the checkpoint log ticket
      and attach the context to the log buffer so it can be called during
      Io completion to complete the checkpoint.
      
      We also need to allow for synchronising multiple in-flight
      checkpoints. This is needed for two things - the first is to ensure
      that checkpoint commit records appear in the log in the correct
      sequence order (so they are replayed in the correct order). The
      second is so that xfs_log_force_lsn() operates correctly and only
      flushes and/or waits for the specific sequence it was provided with.
      
      To do this we need a wait variable and a list tracking the
      checkpoint commits in progress. We can walk this list and wait for
      the checkpoints to change state or complete easily, an this provides
      the necessary synchronisation for correct operation in both cases.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      71e330b5