  1. 14 Mar, 2020 1 commit
  2. 12 Mar, 2020 1 commit
  3. 03 Jul, 2019 1 commit
  4. 29 Jun, 2019 2 commits
  5. 12 Jun, 2019 1 commit
  6. 21 May, 2019 1 commit
    • xfs: don't reserve per-AG space for an internal log · 5cd213b0
      Authored by Darrick J. Wong
      It turns out that the log can consume nearly all the space in an AG, and
      when this happens it's possible that there will be less free space
      in the AG than the reservation would try to hide.  On a debug kernel
      this can trigger an ASSERT in xfs/250:
      
      XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
      
      The log is permanently allocated, so we know we're never going to have
      to expand the btrees to hold any records associated with the log space.
      We therefore can treat the space as if it doesn't exist.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
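The reasoning in this commit message can be modelled roughly as follows: when sizing a per-AG reservation, blocks permanently occupied by an internal log are subtracted from the AG's block count first. This is a minimal sketch, not the kernel code; the function names, the `log_agno` and `log_blocks` parameters, and the "one block in every `divisor`" sizing rule are all assumptions made for illustration.

```python
def usable_ag_blocks(agno, ag_blocks, log_agno, log_blocks):
    """Blocks in AG `agno` that btree records could ever need to describe.

    `log_agno` is the AG holding the internal log (None for an external log).
    The log is permanently allocated, so its blocks are treated as absent.
    """
    if log_agno is not None and agno == log_agno:
        return ag_blocks - log_blocks
    return ag_blocks


def reservation_blocks(agno, ag_blocks, log_agno, log_blocks, divisor=100):
    """Hypothetical sizing rule: reserve one block per `divisor` usable blocks."""
    usable = usable_ag_blocks(agno, ag_blocks, log_agno, log_blocks)
    return usable // divisor
```

With a log that fills almost the whole AG, the reservation shrinks along with the usable space instead of exceeding the free space that is actually left.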
  7. 15 Feb, 2019 1 commit
  8. 12 Feb, 2019 2 commits
    • xfs: distinguish between inobt and finobt magic values · 8473fee3
      Authored by Brian Foster
      The inode btree verifier code is shared between the inode btree and
      free inode btree because the underlying metadata formats are
      essentially equivalent. A side effect of this is that the verifier
      cannot determine whether a particular btree block should have an
      inobt or finobt magic value.
      
      This logic allows an unfortunate xfs_repair bug to escape detection
      where certain level > 0 nodes of the finobt are stamped with inobt
      magic by xfs_repair finobt reconstruction. This is fortunately not a
      severe problem since the inode btree magic values do not contribute
      to any changes in kernel behavior, but we do need a means to detect
      and prevent this problem in the future.
      
      Add a field to xfs_buf_ops to store the v4 and v5 superblock magic
      values expected by a particular verifier. Add a helper to check an
      on-disk magic value against the value expected by the verifier. Call
      the helper from the shared [f]inobt verifier code for magic value
      verification. This ensures that the inode btree blocks each have the
      appropriate magic value based on specific tree type and superblock
      version.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
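A minimal model of the mechanism described in this commit: each verifier carries a pair of expected magic values (v4, v5), and a helper picks the right one based on the superblock version. The class and function names below are illustrative assumptions, and the four-character tags are what the on-disk inode btree magics are believed to be; check them against the kernel headers before relying on them.

```python
from dataclasses import dataclass


def magic(tag: bytes) -> int:
    """Turn a 4-byte ASCII tag into a big-endian 32-bit magic number."""
    return int.from_bytes(tag, "big")


@dataclass(frozen=True)
class BufOps:
    name: str
    magic: tuple  # (v4 superblock magic, v5 superblock magic)


# Separate ops per tree, so each tree can demand its own magic values.
INOBT_OPS = BufOps("inobt", (magic(b"IABT"), magic(b"IAB3")))
FINOBT_OPS = BufOps("finobt", (magic(b"FIBT"), magic(b"FIB3")))


def verify_magic(ops: BufOps, has_crc: bool, disk_magic: int) -> bool:
    """Check an on-disk magic against the value this verifier expects."""
    expected = ops.magic[1 if has_crc else 0]
    return expected != 0 and disk_magic == expected
```

With shared ops, a finobt block stamped with inobt magic would pass; with per-tree ops, `verify_magic(FINOBT_OPS, True, magic(b"IAB3"))` is rejected.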
    • xfs: create a separate finobt verifier · 01e68f40
      Authored by Brian Foster
      The inobt verifier is reused for the inobt and finobt, which
      prevents the ability to distinguish between magic values on a
      per-tree basis. Create a separate finobt structure in preparation
      for changes to enforce the appropriate magic value for the
      associated tree. This patch has no functional change.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  9. 13 Dec, 2018 1 commit
  10. 21 Nov, 2018 1 commit
    • xfs: finobt AG reserves don't consider last AG can be a runt · c0876897
      Authored by Dave Chinner
      The last AG may be very small compared to all other AGs, and hence
      AG reservations based on the superblock AG size may actually consume
      more space than the AG actually has. This results in assert failures
      like:
      
      XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
      [   48.932891]  xfs_ag_resv_init+0x1bd/0x1d0
      [   48.933853]  xfs_fs_reserve_ag_blocks+0x37/0xb0
      [   48.934939]  xfs_mountfs+0x5b3/0x920
      [   48.935804]  xfs_fs_fill_super+0x462/0x640
      [   48.936784]  ? xfs_test_remount_options+0x60/0x60
      [   48.937908]  mount_bdev+0x178/0x1b0
      [   48.938751]  mount_fs+0x36/0x170
      [   48.939533]  vfs_kern_mount.part.43+0x54/0x130
      [   48.940596]  do_mount+0x20e/0xcb0
      [   48.941396]  ? memdup_user+0x3e/0x70
      [   48.942249]  ksys_mount+0xba/0xd0
      [   48.943046]  __x64_sys_mount+0x21/0x30
      [   48.943953]  do_syscall_64+0x54/0x170
      [   48.944835]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Hence we need to ensure the finobt per-ag space reservations take
      into account the size of the last AG rather than treat it like all
      the other full size AGs.
      
      Note that both refcountbt and rmapbt already take the size of the AG
      into account via reading the AGF length directly.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
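The "runt" last AG can be illustrated with a small calculation. This is a sketch under simplifying assumptions: the parameter names mirror superblock fields (`agcount`, `agblocks`, `dblocks`), but the function itself is hypothetical and is not the kernel helper.

```python
def ag_block_count(agno, agcount, sb_agblocks, sb_dblocks):
    """Actual size in blocks of AG `agno`.

    Every AG except the last is `sb_agblocks` long; the last AG holds
    whatever remains of the `sb_dblocks` data blocks.
    """
    if not 0 <= agno < agcount:
        raise ValueError("agno out of range")
    if agno < agcount - 1:
        return sb_agblocks
    return sb_dblocks - agno * sb_agblocks
```

For a filesystem of 3050 blocks with 1000-block AGs, the fourth AG has only 50 blocks, so a reservation sized from the nominal 1000 could exceed the whole AG.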
  11. 30 Jul, 2018 1 commit
  12. 24 Jul, 2018 1 commit
  13. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Authored by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
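To make the state machine in the hdr.awk script above easier to follow, here is an unofficial Python port of the same rules. It is a reading aid written for this page, not something from the commit; the function name `spdx_convert` is an assumption, and the port mirrors the awk script's quirks (for example, any line matching "any later version." is dropped wherever it appears).

```python
import re


def spdx_convert(text):
    """Rewrite a verbose GPL header into an SPDX tag, following hdr.awk."""
    hdr = 1            # 1: before the licence text, 2: inside it, 0: header done
    tag = "GPL-2.0"
    held = ""          # header lines held back until the SPDX line is printed
    out = []

    def hold(line):
        nonlocal held
        held = line if held == "" else held + "\n" + line

    for line in text.splitlines():
        if re.search(r"^ \* This program is free software", line):
            hdr = 2                       # start of the licence boilerplate
        elif re.search(r"any later version.", line):
            tag = "GPL-2.0+"
        elif re.search(r"^ \*/", line):   # end of a comment block
            if hdr > 0:
                out.append("// SPDX-License-Identifier: " + tag)
                out.append(held)
                held = ""
                hdr = 0
            out.append(line)
        elif re.search(r"^ \* ", line):   # comment line with text
            if hdr > 1:
                continue                  # licence text: drop it
            if hdr > 0:
                hold(line)
            else:
                out.append(line)
        elif re.search(r"^ \*", line):    # bare " *" separator line
            if hdr == 0:
                out.append(line)
        else:
            if hdr > 0:
                hold(line)
            else:
                out.append(line)
    return "".join(line + "\n" for line in out)
```

The copyright lines before the licence text are held back and re-emitted after the SPDX line, which is why the tag ends up as the first line of the file.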
  14. 30 May, 2018 1 commit
  15. 16 May, 2018 1 commit
  16. 10 Apr, 2018 1 commit
  17. 12 Mar, 2018 1 commit
  18. 13 Jan, 2018 1 commit
    • xfs: account finobt blocks properly in perag reservation · ad90bb58
      Authored by Brian Foster
      XFS started using the perag metadata reservation pool for free inode
      btree blocks in commit 76d771b4 ("xfs: use per-AG reservations
      for the finobt"). To handle backwards compatibility, finobt blocks
      are accounted against the pool so long as the full reservation is
      available at mount time. Otherwise the ->m_inotbt_nores flag is set
      and the filesystem falls back to the traditional per-transaction
      finobt reservation.
      
      This commit has two problems:
      
      - finobt blocks are always accounted against the metadata
        reservation on allocation, regardless of ->m_inotbt_nores state
      - finobt blocks are never returned to the reservation pool on free
      
      The first problem affects reflink+finobt filesystems where the full
      finobt reservation is not available at mount time. finobt blocks are
      essentially stolen from the reflink reservation, putting refcountbt
      management at risk of allocation failure. The second problem is an
      unconditional leak of metadata reservation whenever finobt is
      enabled.
      
      Update the finobt block allocation callouts to consider
      ->m_inotbt_nores and account blocks appropriately. Blocks should be
      consistently accounted against the metadata pool when
      ->m_inotbt_nores is false and otherwise tagged as RESV_NONE.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
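The accounting rule this commit describes can be sketched as a toy model: the reservation type is chosen from the `m_inotbt_nores` state, and the same choice is applied on both allocation and free so that the pool neither leaks nor is charged for blocks it never covered. The class and function names are hypothetical.

```python
RESV_METADATA = "metadata"
RESV_NONE = "none"


class MetadataPool:
    """Toy per-AG metadata reservation: counts blocks charged against it."""

    def __init__(self, reserved):
        self.reserved = reserved
        self.used = 0


def finobt_resv_type(inotbt_nores):
    """Blocks count against the pool only if the full reservation was made."""
    return RESV_NONE if inotbt_nores else RESV_METADATA


def alloc_finobt_block(pool, inotbt_nores):
    resv = finobt_resv_type(inotbt_nores)
    if resv == RESV_METADATA:
        pool.used += 1
    return resv


def free_finobt_block(pool, inotbt_nores):
    resv = finobt_resv_type(inotbt_nores)
    if resv == RESV_METADATA:
        pool.used -= 1   # return the block to the pool instead of leaking it
    return resv
```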
  19. 09 Jan, 2018 4 commits
  20. 20 Jun, 2017 3 commits
  21. 25 Jan, 2017 1 commit
    • xfs: use per-AG reservations for the finobt · 76d771b4
      Authored by Christoph Hellwig
      Currently we try to rely on the global reserved block pool for block
      allocations for the free inode btree, but I have customer reports
      (fairly complex workload, need to find an easier reproducer) where that
      is not enough as the AG where we free an inode that requires a new
      finobt block is entirely full.  This causes us to cancel a dirty
      transaction and thus a file system shutdown.
      
      I think the right way to guard against this is to treat the finobt the same
      way as the refcount btree and have per-AG reservations for the possible
      worst case size of it, and the patch below implements that.
      
      Note that this could increase mount times with large finobt trees.  In
      an ideal world we would have added a field for the number of finobt
      blocks to the AGI, similar to what we did for the refcount blocks.
      We should add it next time we rev the AGI or AGF format by adding
      new fields.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
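A "possible worst case size" for a btree can be estimated by assuming every block holds only its minimum number of entries. The sketch below is a generic calculation under that assumption; the function and parameter names are illustrative and it is not the kernel's sizing routine.

```python
def worst_case_btree_blocks(nrecs, min_leaf_recs, min_node_ptrs):
    """Upper bound on blocks needed for a btree holding `nrecs` records.

    Assumes every leaf holds only `min_leaf_recs` records and every
    interior node only `min_node_ptrs` pointers (both must be >= 2).
    """
    if nrecs < 1:
        raise ValueError("need at least one record")
    if min_leaf_recs < 2 or min_node_ptrs < 2:
        raise ValueError("minimum fan-out must be at least 2")
    blocks = 0
    entries = nrecs
    per_block = min_leaf_recs
    while True:
        entries = -(-entries // per_block)   # ceiling division: blocks at this level
        blocks += entries
        if entries == 1:                     # reached the root
            return blocks
        per_block = min_node_ptrs
```

For 1000 records with a minimum fan-out of 10 at every level, this gives 100 leaves, 10 nodes and 1 root, or 111 blocks.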
  22. 09 Dec, 2016 1 commit
  23. 05 Dec, 2016 1 commit
    • xfs: make xfs btree stats less huge · 11ef38af
      Authored by Dave Chinner
      Embedding a switch statement in every btree stats inc/add adds a lot
      of code overhead to the core btree infrastructure paths. Stats are
      supposed to be small and lightweight, but the btree stats have
      become big and bloated as we've added more btrees. It needs fixing
      because the reflink code will just add more overhead again.
      
      Convert the v2 btree stats to arrays instead of independent
      variables, and instead use the type to index the specific btree
      array via an enum. This allows us to use array based indexing
      to update the stats, rather than having to dereference variables
      specific to the btree type.
      
      If we then wrap the xfsstats structure in a union and place a uint32_t
      array beside it, and calculate the correct base index into the btree
      stats array when creating a btree cursor, we can easily access
      entries in the stats structure without having to switch names based
      on the btree type.
      
      We then replace the switch statement with a simple set of stats
      wrapper macros, resulting in a significant simplification of the
      btree stats code, and:
      
         text	   data	    bss	    dec	    hex	filename
        48905	    144	      8	  49057	   bfa1	fs/xfs/libxfs/xfs_btree.o.old
        36793	    144	      8	  36945	   9051	fs/xfs/libxfs/xfs_btree.o
      
      it reduces the core btree infrastructure code size by close to 25%!
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
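The indexing scheme in this commit message can be modelled with a flat counter array: the base offset for a btree type is computed once when the cursor is created, and every update becomes a plain array access instead of a switch on the type. The enum members and class names here are illustrative assumptions, and Python has no union, so the flat list stands in for the uint32_t array view.

```python
from enum import IntEnum


class BtNum(IntEnum):      # hypothetical subset of btree types
    BNO = 0
    CNT = 1
    INO = 2
    FINO = 3


class Stat(IntEnum):       # hypothetical subset of per-btree counters
    LOOKUP = 0
    COMPARE = 1
    INSREC = 2
    DELREC = 3


NSTATS = len(Stat)


class Stats:
    """Flat array view of all btree counters, one group per btree type."""

    def __init__(self):
        self.flat = [0] * (len(BtNum) * NSTATS)


class Cursor:
    def __init__(self, stats, btnum):
        self.stats = stats
        self.base = int(btnum) * NSTATS   # computed once, at cursor creation

    def stats_inc(self, stat):
        self.stats.flat[self.base + stat] += 1

    def stats_add(self, stat, n):
        self.stats.flat[self.base + stat] += n
```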
  24. 19 Sep, 2016 1 commit
    • xfs: set up per-AG free space reservations · 3fd129b6
      Authored by Darrick J. Wong
      One unfortunate quirk of the reference count and reverse mapping
      btrees -- they can expand in size when blocks are written to *other*
      allocation groups if, say, one large extent becomes a lot of tiny
      extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is in contrast to the strategy that the rough
      draft used (actually removing extents from the freespace btrees).
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
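The "virtual reservation" can be pictured as pure in-memory arithmetic: callers that do not own the reservation see less free space than really exists, while the reservation owner sees all of it. This is a deliberately simplified sketch with hypothetical names; the real clamping of the longest free extent is more involved than a single `min()`.

```python
def visible_free_blocks(free_blocks, reserved, owns_reservation):
    """Free space an allocation request is allowed to see in this AG."""
    if owns_reservation:
        return free_blocks
    return max(0, free_blocks - reserved)


def clamp_request(want, longest_free_extent, free_blocks, reserved,
                  owns_reservation):
    """Largest allocation length that may be granted to this caller."""
    avail = visible_free_blocks(free_blocks, reserved, owns_reservation)
    return min(want, longest_free_extent, avail)
```

Because nothing is removed from the free space btrees, there is no on-disk state to undo after a crash; the reservation is simply recomputed at the next mount.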
  25. 03 Aug, 2016 4 commits
    • xfs: remove the get*keys and update_keys btree ops pointers · 973b8319
      Authored by Darrick J. Wong
      These are internal btree functions; we don't need them to be
      dispatched via function pointers.  Make them static again and
      just check the overlapped flag to figure out what we need to
      do.  The strategy behind this patch was suggested by Christoph.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: add owner field to extent allocation and freeing · 340785cc
      Authored by Darrick J. Wong
      For the rmap btree to work, we have to feed the extent owner
      information to the allocation and freeing functions. This
      information is what will end up in the rmap btree that tracks
      allocated extents. While we technically don't need the owner
      information when freeing extents, passing it allows us to validate
      that the extent we are removing from the rmap btree actually
      belonged to the owner we expected it to belong to.
      
      We also define a special set of owner values for internal metadata
      that would otherwise have no owner. This allows us to tell the
      difference between metadata owned by different per-ag btrees, as
      well as static fs metadata (e.g. AG headers) and internal journal
      blocks.
      
      There are also a couple of special cases we need to take care of -
      during EFI recovery, we don't actually know who the original owner
      was, so we need to pass a wildcard to indicate that we aren't
      checking the owner for validity. We also need special handling in
      growfs, as we "free" the space in the last AG when extending it, but
      because it's new space it has no actual owner...
      
      While touching the xfs_bmap_add_free() function, re-order the
      parameters to put the struct xfs_mount first.
      
      Extend the owner field to include both the owner type and some sort
      of index within the owner.  The index field will be used to support
      reverse mappings when reflink is enabled.
      
      When we're freeing extents from an EFI, we don't have the owner
      information available (rmap updates have their own redo items).
      xfs_free_extent therefore doesn't need to do an rmap update. Make
      sure that the log replay code signals this correctly.
      
      This is based upon a patch originally from Dave Chinner. It has been
      extended to add more owner information with the intent of helping
      recovery operations when things go wrong (e.g. offset of user data
      block in a file).
      
      [dchinner: de-shout the xfs_rmap_*_owner helpers]
      [darrick: minor style fixes suggested by Christoph Hellwig]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
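The value of passing owner information on free, and the wildcard used when the owner is unknown, can be shown with a toy reverse-mapping table. Everything here is hypothetical: the dictionary layout, the `OWN_ANY` sentinel and the owner strings are stand-ins chosen for illustration, not the kernel's data structures.

```python
OWN_ANY = object()   # wildcard: caller does not know the original owner


def free_extent(rmap, start, length, owner):
    """Remove the mapping for (start, length), validating the owner.

    `rmap` maps an extent's start block to a (length, owner) tuple.
    """
    rec = rmap.get(start)
    if rec is None or rec[0] != length:
        raise ValueError("no matching extent in rmap")
    if owner is not OWN_ANY and rec[1] != owner:
        raise ValueError("extent is not owned by the expected owner")
    del rmap[start]
```

A normal free passes the real owner and gets the cross-check; a caller in the position of EFI recovery would pass `OWN_ANY` and skip it.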
    • xfs: add function pointers for get/update keys to the btree · 70b22659
      Authored by Darrick J. Wong
      Add some function pointers to bc_ops to get the btree keys for
      leaf and node blocks, and to update parent keys of a block.
      Convert the _btree_updkey calls to use our new pointer, and
      modify the tree shape changing code to call the appropriate
      get_*_keys pointer instead of _btree_copy_keys because the
      overlapping btree has to calculate high key values.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
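Why an overlapping btree cannot simply copy keys is easiest to see with intervals: the low key of a block is the first record's start, but the high key must be the largest end offset of any record in the block, which need not belong to the last record. The sketch below assumes records are `(start, length)` pairs sorted by start; the function name is hypothetical.

```python
def block_keys(records, overlapping):
    """Return (low_key, high_key) for the records in one btree block.

    For a non-overlapping btree only the low key is tracked, so the
    high key is reported as None.
    """
    if not records:
        raise ValueError("empty block")
    low = records[0][0]
    if not overlapping:
        return (low, None)
    high = max(start + length - 1 for start, length in records)
    return (low, high)
```

In `[(10, 5), (12, 100), (50, 2)]` the high key is 111, contributed by the middle record rather than the last one.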
    • xfs: during btree split, save new block key & ptr for future insertion · e5821e57
      Authored by Darrick J. Wong
      When a btree block has to be split, we pass the new block's ptr from
      xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
      however, we pass the block's key through the cursor's record.  It is a
      little weird to "initialize" a record from a key since the non-key
      attributes will have garbage values.
      
      When we go to add support for interval queries, we have to be able to
      pass the lowest and highest keys accessible via a pointer.  There's no
      clean way to pass this back through the cursor's record field.
      Therefore, pass the key directly back to xfs_btree_insert() the same
      way that we pass the btree_ptr.
      
      As a bonus, we no longer need init_rec_from_key and can drop it from the
      codebase.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  26. 08 Feb, 2016 1 commit
  27. 04 Jan, 2016 2 commits
  28. 29 Jul, 2015 1 commit
    • xfs: create new metadata UUID field and incompat flag · ce748eaa
      Authored by Eric Sandeen
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
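The read-time behaviour described in this commit can be sketched as follows: metadata blocks are always compared against an in-memory meta UUID, which comes from the on-disk `sb_meta_uuid` only when the incompat flag is set and is otherwise a copy of the user-visible UUID. The `Superblock` class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass
class Superblock:
    uuid: UUID                  # user-visible UUID, may be changed at will
    meta_uuid: Optional[UUID]   # on-disk sb_meta_uuid, if any
    has_meta_uuid_flag: bool    # the new incompat feature flag


def in_memory_meta_uuid(sb: Superblock) -> UUID:
    """UUID that metadata blocks are expected to carry."""
    if sb.has_meta_uuid_flag:
        return sb.meta_uuid
    return sb.uuid              # flag clear: mirror the user-visible UUID


def metadata_uuid_ok(sb: Superblock, block_uuid: UUID) -> bool:
    return block_uuid == in_memory_meta_uuid(sb)
```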
  29. 29 May, 2015 1 commit
    • xfs: allocate sparse inode chunks on full chunk allocation failure · 56d1115c
      Authored by Brian Foster
      xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
      chunk. If all else fails, reduce the allocation to the sparse length and
      alignment and attempt to allocate a sparse inode chunk.
      
      If sparse chunk allocation succeeds, check whether an inobt record
      already exists that can track the chunk. If so, inherit and update the
      existing record. Otherwise, insert a new record for the sparse chunk.
      
      Create helpers to align sparse chunk inode records and insert or update
      existing records in the inode btrees. The xfs_inobt_insert_sprec()
      helper implements the merge or update semantics required for sparse
      inode records with respect to both the inobt and finobt. To update the
      inobt, either insert a new record or merge with an existing record. To
      update the finobt, use the updated inobt record to either insert or
      replace an existing record.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
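The merge semantics for sparse inode records can be illustrated with bitmasks. This sketch rests on assumptions that should be checked against the kernel sources: a 64-inode chunk, a 16-bit hole mask in which a set bit marks a run of inodes as not allocated, and a free mask in which holes appear as free. The record class and helper names are hypothetical.

```python
from dataclasses import dataclass

INODES_PER_CHUNK = 64
HOLEMASK_FULL = (1 << 16) - 1


@dataclass
class InobtRec:
    startino: int
    holemask: int    # set bit = that region of the chunk is a hole
    count: int       # inodes physically allocated in this record
    freecount: int   # allocated inodes that are free
    free: int        # 64-bit free mask; holes are shown as free


def can_merge(a, b):
    """Two sparse records merge only if their allocated regions are disjoint."""
    if a.startino != b.startino:
        return False
    if a.holemask == 0 or b.holemask == 0:      # both must be sparse
        return False
    if a.count + b.count > INODES_PER_CHUNK:
        return False
    alloc_a = ~a.holemask & HOLEMASK_FULL
    alloc_b = ~b.holemask & HOLEMASK_FULL
    return (alloc_a & alloc_b) == 0


def merge(a, b):
    if not can_merge(a, b):
        raise ValueError("records cannot be merged")
    return InobtRec(
        startino=a.startino,
        holemask=a.holemask & b.holemask,   # a hole remains only if both have it
        count=a.count + b.count,
        freecount=a.freecount + b.freecount,
        free=a.free & b.free,               # in use in either record => in use
    )
```

Merging a record covering the lower half of a chunk with one covering the upper half yields a full record with an empty hole mask.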