1. 09 Jan 2018, 9 commits
    • xfs: refactor short form btree pointer verification · e1e55aaf
      Committed by Darrick J. Wong
      Now that we have xfs_verify_agbno, use it to verify short form btree
      pointers instead of open-coding them.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
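A minimal sketch of the idea in this commit: route every short-form btree pointer check through a single block-number verifier instead of open-coding the bounds test at each call site. The struct, field names, and the null sentinel below are illustrative assumptions, not the kernel's actual definitions of xfs_verify_agbno or its callers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-AG geometry (not the kernel's structures). */
struct ag_geom {
    uint32_t ag_blocks;        /* number of blocks in this allocation group */
    uint32_t first_data_block; /* first block past the AG header blocks */
};

/* Assumed sentinel meaning "no block", e.g. a missing sibling pointer. */
#define NULL_AGBLOCK ((uint32_t)-1)

/* One shared verifier: the block must lie past the headers and inside the AG. */
static bool verify_agbno(const struct ag_geom *g, uint32_t agbno)
{
    return agbno >= g->first_data_block && agbno < g->ag_blocks;
}

/* Short-form pointer check built on the shared verifier. */
static bool verify_short_ptr(const struct ag_geom *g, uint32_t ptr,
                             bool allow_null)
{
    if (ptr == NULL_AGBLOCK)
        return allow_null;
    return verify_agbno(g, ptr);
}
```

Sibling pointers would typically be checked with allow_null set, child pointers without it; that split is an assumption of this sketch.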
    • xfs: refactor long-format btree header verification routines · 8368a601
      Committed by Darrick J. Wong
      Create two helper functions to verify the headers of a long format
      btree block.  We'll use this later for the realtime rmapbt.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: remove XFS_FSB_SANITY_CHECK · 59f6fec3
      Committed by Darrick J. Wong
      We already have a function to verify fsb pointers, so get rid of the
      last users of the (less robust) macro.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: eliminate duplicate icreate tx reservation functions · c017cb5d
      Committed by Brian Foster
      The create transaction reservation calculation has two different
      branches of code depending on whether the filesystem is a v5 format
      fs or older. Each branch considers the max reservation between the
      allocation case (new chunk allocation + record insert) and the
      modify case (chunk exists, record modification) of inode allocation.
      
      The modify case is the same for both superblock versions with the
      exception of the finobt. The finobt helper checks the feature bit,
      however, and so the modify case already shares the same code.
      
      Now that inode chunk allocation has been refactored into a helper
      that checks the superblock version to calculate the appropriate
      reservation for the create transaction, the only remaining
      difference between the create and icreate branches is the call to
      the finobt helper. As noted above, the finobt helper is a no-op when
      the feature is not enabled. Therefore, these branches are
      effectively duplicate and can be condensed.
      
      Remove the xfs_calc_create_*() branch of functions and update the
      various callers to use the xfs_calc_icreate_*() variant. The latter
      creates the same reservation size for v4 create transactions as the
      removed branch. As such, this patch does not result in transaction
      reservation changes.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: refactor inode chunk alloc/free tx reservation · 57af33e4
      Committed by Brian Foster
      The reservation for the various forms of inode allocation is
      scattered across several different functions. This includes two
      variants of chunk allocation (v5 icreate transactions vs. older
      create transactions) and the inode free transaction.
      
      To clean up some of this code and clarify the purpose of specific
      allocfree reservations, continue the pattern of defining helper
      functions for smaller operational units of broader transactions.
      Refactor the reservation into an inode chunk alloc/free helper that
      considers the various conditions based on filesystem format.
      
      An inode chunk free involves an extent free and buffer
      invalidations. The latter requires reservation for log headers only.
      An inode chunk allocation modifies the free space btrees and logs
      the chunk on v4 supers. v5 supers initialize the inode chunk using
      ordered buffers and so do not log the chunk.
      
      As a side effect of this refactoring, add one more allocfree res to
      the ifree transaction. Technically this does not serve a specific
      purpose because inode chunks are freed via deferred operations and
      thus occur after a transaction roll. tr_ifree has a bit of a history
      of tx overruns caused by too many agfl fixups during sustained file
      deletion workloads, so add this extra reservation as a form of
      padding nonetheless.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
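The format-dependent rules described in this commit message can be sketched as one helper. This is only an illustration of the stated conditions (v4 logs the chunk on allocation, v5 does not, a free needs buffer log headers only); the struct, its fields, and all byte counts are hypothetical and do not mirror the kernel's reservation code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; none of these names come from the kernel. */
struct chunk_res_params {
    bool     v5_format;     /* v5 superblock: chunk set up via ordered buffers */
    uint32_t allocfree_res; /* bytes for one free space btree update */
    uint32_t chunk_bytes;   /* on-disk size of one inode chunk */
    uint32_t chunk_bufs;    /* number of buffers backing the chunk */
    uint32_t buf_hdr_bytes; /* log header overhead per buffer */
};

/* Reservation for allocating (alloc == true) or freeing an inode chunk. */
static uint32_t inode_chunk_res(const struct chunk_res_params *p, bool alloc)
{
    /* Both directions modify the free space btrees. */
    uint32_t res = p->allocfree_res;

    if (alloc) {
        /* v4 logs the chunk contents; v5 uses ordered buffers, so no logging. */
        if (!p->v5_format)
            res += p->chunk_bytes;
    } else {
        /* Buffer invalidation on free needs log headers only. */
        res += p->chunk_bufs * p->buf_hdr_bytes;
    }
    return res;
}
```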
    • xfs: include an allocfree res for inobt modifications · f03c78f3
      Committed by Brian Foster
      Analysis of recent reports of log reservation overruns and code
      inspection has uncovered that the reservations associated with inode
      operations may not cover the worst case scenarios. In particular,
      many cases only include one allocfree res. for a particular
      operation even though said operations may also entail AGFL fixups
      and inode btree block allocations in addition to the actual inode
      chunk allocation. This can easily turn into two or three block
      allocations (or frees) per operation.
      
      In theory, the only way to define the worst case reservation is to
      include an allocfree res for each individual allocation in a
      transaction. Since that is impractical (we can perform multiple agfl
      fixups per tx and not every allocation results in a full tree
      operation), we need to find a reasonable compromise that addresses
      the deficiency in practice without blowing out the size of the
      transactions.
      
      Since the inode btrees are not filled by the AGFL, record insertion
      and removal can directly result in block allocations and frees
      depending on the shape of the tree. These allocations and frees
      occur in the same transaction context as the inobt update itself,
      but are separate from the allocation/free that might be required for
      an inode chunk. Therefore, it makes sense to assume that an [f]inobt
      insert/remove can directly result in one or more block allocations
      on behalf of the tree.
      
      Refactor the inode transaction reservations to include one allocfree
      res. per inode btree modification to cover allocations required by
      the tree itself. This separates the reservation required to allocate
      the inode chunk from the reservation required for inobt record
      insertion/removal. Apply the same logic to the finobt. This results
      in killing off the finobt modify condition because we no longer
      assume that the broader transaction reservation will cover finobt
      block allocations and finobt shape changes can occur in either of
      the inobt allocation or modify situations.
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
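To make the arithmetic of "one allocfree res. per inode btree modification" concrete, here is a small sketch. It assumes each reservation component is a plain byte count; the names and numbers are invented for illustration and are not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte counts for the individual reservation components. */
struct inobt_res_params {
    bool     has_finobt;    /* free inode btree feature enabled */
    uint32_t allocfree_res; /* one block allocation or free */
    uint32_t inobt_res;     /* logging an inobt record insert/remove */
    uint32_t finobt_res;    /* logging a finobt record insert/remove */
};

/* A btree modification carries its own allocfree res for tree blocks. */
static uint32_t btree_mod_res(uint32_t tree_res, uint32_t allocfree_res)
{
    return tree_res + allocfree_res;
}

/* Inode allocation: chunk allocation plus one allocfree res per btree. */
static uint32_t inode_alloc_res(const struct inobt_res_params *p)
{
    uint32_t res = p->allocfree_res; /* the inode chunk itself */

    res += btree_mod_res(p->inobt_res, p->allocfree_res);
    if (p->has_finobt)
        res += btree_mod_res(p->finobt_res, p->allocfree_res);
    return res;
}
```

With an allocfree res of 1000 bytes, an inobt res of 300, and a finobt res of 200, the sketch yields 3500 bytes with the finobt enabled and 2300 without it, keeping the chunk reservation separate from each tree's own reservation.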
    • xfs: truncate transaction does not modify the inobt · a606ebdb
      Committed by Brian Foster
      The truncate transaction does not ever modify the inode btree, but
      includes an associated log reservation. Update
      xfs_calc_itruncate_reservation() to remove the reservation
      associated with inobt updates.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: fix up agi unlinked list reservations · e8341d9f
      Committed by Brian Foster
      The current AGI unlinked list addition and removal reservations do
      not reflect the worst case log usage. An unlinked list removal can
      log up to two on-disk inode clusters but only includes reservation
      for one. An unlinked list addition logs the on-disk cluster but
      includes reservation for an in-core inode.
      
      Update the AGI unlinked list reservation helpers to calculate the
      correct worst case reservation for the associated operations.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
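The worst cases named above (one on-disk inode cluster for an addition, up to two for a removal) can be expressed as a small calculation. The sketch assumes the AGI itself is logged as one sector and that every logged buffer carries a fixed header overhead; both assumptions, and all names, are illustrative rather than taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical sizes in bytes. */
struct iunlink_params {
    uint32_t sector_bytes;  /* assumed size of the logged AGI sector */
    uint32_t cluster_bytes; /* one on-disk inode cluster */
    uint32_t buf_hdr_bytes; /* assumed log header overhead per buffer */
};

/* AGI sector plus nclusters inode clusters, each with its buffer header. */
static uint32_t iunlink_res(const struct iunlink_params *p, uint32_t nclusters)
{
    uint32_t agi = p->sector_bytes + p->buf_hdr_bytes;
    uint32_t cluster = p->cluster_bytes + p->buf_hdr_bytes;

    return agi + nclusters * cluster;
}

/* Addition logs the on-disk cluster of the inode being added. */
static uint32_t iunlink_add_res(const struct iunlink_params *p)
{
    return iunlink_res(p, 1);
}

/* Removal can log up to two on-disk inode clusters. */
static uint32_t iunlink_remove_res(const struct iunlink_params *p)
{
    return iunlink_res(p, 2);
}
```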
    • xfs: include inobt buffers in ifree tx log reservation · a6f48590
      Committed by Brian Foster
      The tr_ifree transaction handles inode unlinks and inode chunk
      frees. The current transaction calculation does not accurately
      reflect worst case changes to the inode btree, however. The inobt
      portion of the current transaction reservation only covers
      modification of a single inobt buffer (for the particular inode
      record). This is a historical artifact from the days before XFS
      supported full inode chunk removal.
      
      When support for inode chunk removal was added in commit
      254f6311ed1b ("Implement deletion of inode clusters in XFS."), the
      additional log reservation required for chunk removal was not added
      correctly. The new reservation only considered the header overhead
      of associated buffers rather than the full contents of the btrees
      and AGF and AGFL buffers affected by the transaction. The
      reservation for the free space btrees was subsequently fixed up in
      commit 5fe6abb82f76 ("Add space for inode and allocation btrees to
      ITRUNCATE log reservation"), but the res. for full inobt joins has
      never been added.
      
      Further review of the ifree reservation uncovered a couple more
      problems:
      
      - The undocumented +2 blocks are intended for the AGF and AGFL, but
        are also not sized correctly and should be logged as full sectors
        (not FSBs).
      - The additional single block header is undocumented and serves no
        apparent purpose.
      
      Update xfs_calc_ifree_reservation() to include a full inobt join in
      the reservation calculation. Refactor the undocumented blocks
      appropriately and fix up the comments to reflect the current
      calculation.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 22 Dec 2017, 3 commits
    • xfs: only skip rmap owner checks for unknown-owner rmap removal · 68c58e9b
      Committed by Darrick J. Wong
      For rmap removal, refactor the rmap owner checks into a separate
      function, then skip the checks if we are performing an unknown-owner
      removal.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: always honor OWN_UNKNOWN rmap removal requests · 33df3a9c
      Committed by Darrick J. Wong
      Calling xfs_rmap_free with an unknown owner is supposed to remove any
      rmaps covering that range regardless of owner.  This is used by the EFI
      recovery code to say "we're freeing this, it mustn't be owned by
      anything anymore", but for whatever reason xfs_free_ag_extent filters
      them out.
      
      Therefore, remove the filter and make xfs_rmap_unmap actually treat it
      as a wildcard owner -- free anything that's already there, and if
      there's no owner at all then that's fine too.
      
      There are two existing callers of bmap_add_free that take care of the rmap
      deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
      cleanup; convert these to use OWN_NULL (via helpers), and now we really
      require that an RUI (if any) gets added to the defer ops before any EFI.
      
      Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
      growfs will have to consult directly with the rmap to ensure that there
      aren't any rmaps in the grown region.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
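The owner semantics described in this commit message can be summarised as a small decision function: a null owner skips the rmap update, an unknown owner acts as a wildcard, and any other owner must match the existing record. The sentinel values, enum, and function below are hypothetical stand-ins, not the kernel's OWN_NULL/OWN_UNKNOWN definitions or xfs_rmap_unmap itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel owners for this sketch only. */
#define OWNER_NULL    UINT64_MAX        /* caller handles rmap updates itself */
#define OWNER_UNKNOWN (UINT64_MAX - 1)  /* wildcard: remove whatever is there */

enum rmap_free_result {
    RMAP_SKIPPED,       /* request filtered out, rmap left untouched */
    RMAP_REMOVED,       /* existing record removed */
    RMAP_NOTHING_TO_DO, /* wildcard request and no record present: fine */
    RMAP_MISMATCH       /* specific owner requested but not found */
};

/* Decide what freeing an extent does to a (possibly absent) rmap record. */
static enum rmap_free_result rmap_free(uint64_t req_owner, bool have_rec,
                                       uint64_t rec_owner)
{
    if (req_owner == OWNER_NULL)
        return RMAP_SKIPPED;
    if (req_owner == OWNER_UNKNOWN)
        return have_rec ? RMAP_REMOVED : RMAP_NOTHING_TO_DO;
    if (!have_rec || rec_owner != req_owner)
        return RMAP_MISMATCH;
    return RMAP_REMOVED;
}
```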
    • xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order · 0525e952
      Committed by Darrick J. Wong
      Under the deferred rmap operation scheme, there's a certain order in
      which the rmap deferred ops have to be queued to maintain integrity
      during log replay.  For alloc/map operations that order is cui -> rui;
      for free/unmap operations that order is cui -> rui -> efi.  However, the
      initial refcount code got the ordering wrong on the free side of things
      because it queued a refcount free op and an EFI, and the refcount free op
      queued an rmap free op, resulting in the order cui -> efi -> rui.
      
      If we fail before the efd finishes, the efi recovery will try to do a
      wildcard rmap removal and the subsequent rui will fail to find the rmap
      and blow up.  This didn't ever happen due to other screw-ups in handling
      unknown owner rmap removals, but those other screw-ups broke recovery in
      other ways, so fix the ordering to follow the intended rules.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
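The ordering rule stated above (cui -> rui for alloc/map, cui -> rui -> efi for free/unmap) amounts to requiring that queued intents never go backwards in rank. A minimal checker, using a made-up enum rather than the kernel's deferred-op types, might look like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intent kinds, ranked in the required queueing order. */
enum defer_op {
    DEFER_CUI = 0, /* refcount update intent */
    DEFER_RUI = 1, /* rmap update intent */
    DEFER_EFI = 2  /* extent free intent */
};

/* True if the ops were queued in non-decreasing rank (cui -> rui -> efi). */
static bool defer_order_ok(const enum defer_op *ops, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (ops[i] < ops[i - 1])
            return false;
    }
    return true;
}
```

The sequence cui, efi, rui from the buggy free path fails this check, while cui, rui, efi passes.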
  3. 15 Dec 2017, 4 commits
  4. 09 Dec 2017, 1 commit
    • xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Committed by Christoph Hellwig
      If we create a new file we will need an inode, and usually some metadata
      in the parent directory.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions being cancelled under a
      heavy create/delete load.  This patch removes those nospace transactions,
      which will lead to slightly earlier ENOSPC on some workloads, but will
      instead prevent file system shutdowns due to cancelling dirty
      transactions on others.
      
      A customer observed assertion failures and shutdowns due to
      cancellation of dirty transactions during heavy NFS workloads, as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  5. 29 Nov 2017, 1 commit
  6. 21 Nov 2017, 2 commits
  7. 17 Nov 2017, 1 commit
  8. 10 Nov 2017, 7 commits
  9. 07 Nov 2017, 12 commits