1. 12 5月, 2022 6 次提交
    • D
      xfs: clean up final attr removal in xfs_attr_set_iter · b11fa61b
      Dave Chinner 提交于
      Clean up the final leaf/node states in xfs_attr_set_iter() to
      further simplify the high level state machine and to set the
      completion state correctly. As we are adding a separate state
      for node format removal, we need to ensure that node formats
      are collapsed back to shortform or empty correctly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b11fa61b
    • D
      xfs: remote xattr removal in xfs_attr_set_iter() is conditional · 2e7ef218
      Dave Chinner 提交于
      We may not have a remote value for the old xattr we have to remove,
      so skip over the remote value removal states and go straight to
      the xattr name removal in the leaf/node block.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2e7ef218
    • D
      xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP · 411b434a
      Dave Chinner 提交于
      We can skip the REPLACE state when LARP is enabled, but that means
      the XFS_DAS_FLIP_LFLAG state is now poorly named - it indicates
      something that has been done rather than what the state is going to
      do. Rename it to "REMOVE_OLD" to indicate that we are now going to
      perform removal of the old attr.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      411b434a
    • D
      xfs: split remote attr setting out from replace path · 7d035336
      Dave Chinner 提交于
      When we set a new xattr, we have three exit paths:
      
      	1. nothing else to do
      	2. allocate and set the remote xattr value
      	3. perform the rest of a replace operation
      
      Currently we push both 2 and 3 into the same state, regardless of
      whether we just set a remote attribute or not. Once we've set the
      remote xattr, we have two exit states:
      
      	1. nothing else to do
      	2. perform the rest of a replace operation
      
      Hence we can split the remote xattr allocation and setting into
      their own states and factor it out of xfs_attr_set_iter() to further
      clean up the state machine and the implementation of the state
      machine.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      7d035336
    • D
      xfs: kill XFS_DAC_LEAF_ADDNAME_INIT · 2157d169
      Dave Chinner 提交于
      We re-enter the XFS_DAS_FOUND_LBLK state when we have to allocate
      multiple extents for a remote xattr. We currently have a flag
      called XFS_DAC_LEAF_ADDNAME_INIT to avoid running the remote attr
      hole finding code more than once.
      
      However, for the node format tree, we have a separate state for this
      so we never reenter the state machine at XFS_DAS_FOUND_NBLK and so
      it does not need a special flag to skip over the remote attr hold
      finding code.
      
      Convert the leaf block code to use the same state machine as the
      node blocks and kill the  XFS_DAC_LEAF_ADDNAME_INIT flag.
      
      This further points out that this "ALLOC" state is only traversed
      if we have remote xattrs or we are doing a rename operation. Rename
      both the leaf and node alloc states to _ALLOC_RMT to indicate they
      are iterating to do allocation of remote xattr blocks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2157d169
    • D
      xfs: separate out initial attr_set states · e0c41089
      Dave Chinner 提交于
      We current use XFS_DAS_UNINIT for several steps in the attr_set
      state machine. We use it for setting shortform xattrs, converting
      from shortform to leaf, leaf add, leaf-to-node and leaf add. All of
      these things are essentially known before we start the state machine
      iterating, so we really should separate them out:
      
      XFS_DAS_SF_ADD:
      	- tries to do a shortform add
      	- on success -> done
      	- on ENOSPC converts to leaf, -> XFS_DAS_LEAF_ADD
      	- on error, dies.
      
      XFS_DAS_LEAF_ADD:
      	- tries to do leaf add
      	- on success:
      		- inline attr -> done
      		- remote xattr || REPLACE -> XFS_DAS_FOUND_LBLK
      	- on ENOSPC converts to node, -> XFS_DAS_NODE_ADD
      	- on error, dies
      
      XFS_DAS_NODE_ADD:
      	- tries to do node add
      	- on success:
      		- inline attr -> done
      		- remote xattr || REPLACE -> XFS_DAS_FOUND_NBLK
      	- on error, dies
      
      This makes it easier to understand how the state machine starts
      up and sets us up on the path to further state machine
      simplifications.
      
      This also converts the DAS state tracepoints to use strings rather
      than numbers, as converting between enums and numbers requires
      manual counting rather than just reading the name.
      
      This also introduces a XFS_DAS_DONE state so that we can trace
      successful operation completions easily.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e0c41089
  2. 11 5月, 2022 2 次提交
  3. 04 5月, 2022 1 次提交
    • D
      xfs: intent item whiteouts · 0d227466
      Dave Chinner 提交于
      When we log modifications based on intents, we add both intent
      and intent done items to the modification being made. These get
      written to the log to ensure that the operation is re-run if the
      intent done is not found in the log.
      
      However, for operations that complete wholly within a single
      checkpoint, the change in the checkpoint is atomic and will never
      need replay. In this case, we don't need to actually write the
      intent and intent done items to the journal because log recovery
      will never need to manually restart this modification.
      
      Log recovery currently handles intent/intent done matching by
      inserting the intent into the AIL, then removing it when a matching
      intent done item is found. Hence for all the intent-based operations
      that complete within a checkpoint, we spend all that time parsing
      the intent/intent done items just to cancel them and do nothing with
      them.
      
      Hence it follows that the only time we actually need intents in the
      log is when the modification crosses checkpoint boundaries in the
      log and so may only be partially complete in the journal. Hence if
      we commit and intent done item to the CIL and the intent item is in
      the same checkpoint, we don't actually have to write them to the
      journal because log recovery will always cancel the intents.
      
      We've never really worried about the overhead of logging intents
      unnecessarily like this because the intents we log are generally
      very much smaller than the change being made. e.g. freeing an extent
      involves modifying at lease two freespace btree blocks and the AGF,
      so the EFI/EFD overhead is only a small increase in space and
      processing time compared to the overall cost of freeing an extent.
      
      However, delayed attributes change this cost equation dramatically,
      especially for inline attributes. In the case of adding an inline
      attribute, we only log the inode core and attribute fork at present.
      With delayed attributes, we now log the attr intent which includes
      the name and value, the inode core adn attr fork, and finally the
      attr intent done item. We increase the number of items we log from 1
      to 3, and the number of log vectors (regions) goes up from 3 to 7.
      Hence we tripple the number of objects that the CIL has to process,
      and more than double the number of log vectors that need to be
      written to the journal.
      
      At scale, this means delayed attributes cause a non-pipelined CIL to
      become CPU bound processing all the extra items, resulting in a > 40%
      performance degradation on 16-way file+xattr create worklaods.
      Pipelining the CIL (as per 5.15) reduces the performance degradation
      to 20%, but now the limitation is the rate at which the log items
      can be written to the iclogs and iclogs be dispatched for IO and
      completed.
      
      Even log IO completion is slowed down by these intents, because it
      now has to process 3x the number of items in the checkpoint.
      Processing completed intents is especially inefficient here, because
      we first insert the intent into the AIL, then remove it from the AIL
      when the intent done is processed. IOWs, we are also doing expensive
      operations in log IO completion we could completely avoid if we
      didn't log completed intent/intent done pairs.
      
      Enter log item whiteouts.
      
      When an intent done is committed, we can check to see if the
      associated intent is in the same checkpoint as we are currently
      committing the intent done to. If so, we can mark the intent log
      item with a whiteout and immediately free the intent done item
      rather than committing it to the CIL. We can basically skip the
      entire formatting and CIL insertion steps for the intent done item.
      
      However, we cannot remove the intent item from the CIL at this point
      because the unlocked per-cpu CIL item lists do not permit removal
      without holding the CIL context lock exclusively. Transaction commit
      only holds the context lock shared, hence the best we can do is mark
      the intent item with a whiteout so that the CIL push can release it
      rather than writing it to the log.
      
      This means we never write the intent to the log if the intent done
      has also been committed to the same checkpoint, but we'll always
      write the intent if the intent done has not been committed or has
      been committed to a different checkpoint. This will result in
      correct log recovery behaviour in all cases, without the overhead of
      logging unnecessary intents.
      
      This intent whiteout concept is generic - we can apply it to all
      intent/intent done pairs that have a direct 1:1 relationship. The
      way deferred ops iterate and relog intents mean that all intents
      currently have a 1:1 relationship with their done intent, and hence
      we can apply this cancellation to all existing intent/intent done
      implementations.
      
      For delayed attributes with a 16-way 64kB xattr create workload,
      whiteouts reduce the amount of journalled metadata from ~2.5GB/s
      down to ~600MB/s and improve the creation rate from 9000/s to
      14000/s.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0d227466
  4. 29 4月, 2022 3 次提交
  5. 28 4月, 2022 1 次提交
  6. 21 4月, 2022 2 次提交
  7. 11 4月, 2022 2 次提交
  8. 20 3月, 2022 1 次提交
  9. 15 3月, 2022 1 次提交
  10. 20 10月, 2021 1 次提交
  11. 20 8月, 2021 16 次提交
  12. 19 8月, 2021 2 次提交
  13. 17 8月, 2021 1 次提交
  14. 10 8月, 2021 1 次提交