1. 06 10月, 2016 1 次提交
    • D
      xfs: store in-progress CoW allocations in the refcount btree · 174edb0e
      Darrick J. Wong 提交于
      Due to the way the CoW algorithm in XFS works, there's an interval
      during which blocks allocated to handle a CoW can be lost -- if the FS
      goes down after the blocks are allocated but before the block
      remapping takes place.  This is exacerbated by the cowextsz hint --
      allocated reservations can sit around for a while, waiting to get
      used.
      
      Since the refcount btree doesn't normally store records with refcount
      of 1, we can use it to record these in-progress extents.  In-progress
      blocks cannot be shared because they're not user-visible, so there
      shouldn't be any conflicts with other programs.  This is a better
      solution than holding EFIs during writeback because (a) EFIs can't be
      relogged currently, (b) even if they could, EFIs are bound by
      available log space, which puts an unnecessary upper bound on how much
      CoW we can have in flight, and (c) we already have a mechanism to
      track blocks.
      
      At mount time, read the refcount records and free anything we find
      with a refcount of 1 because those were in-progress when the FS went
      down.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      174edb0e
  2. 04 10月, 2016 4 次提交
  3. 26 8月, 2016 1 次提交
  4. 17 8月, 2016 1 次提交
  5. 03 8月, 2016 5 次提交
    • D
      xfs: enable the rmap btree functionality · 1c0607ac
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Add the feature flag to the supported matrix so that the kernel can
      mount and use rmap btree enabled filesystems
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      [darrick.wong@oracle.com: move the experimental tag]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      1c0607ac
    • D
      xfs: define the on-disk rmap btree format · 035e00ac
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Now we have all the surrounding call infrastructure in place, we can
      start filling out the rmap btree implementation. Start with the
      on-disk btree format; add everything needed to read, write and
      manipulate rmap btree blocks. This prepares the way for adding the
      btree operations implementation.
      
      [darrick: record owner and offset info in rmap btree]
      [darrick: fork, bmbt and unwritten state in rmap btree]
      [darrick: flags are a separate field in xfs_rmap_irec]
      [darrick: calculate maxlevels separately]
      [darrick: move the 'unwritten' bit into unused parts of rm_offset]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      035e00ac
    • D
      xfs: add owner field to extent allocation and freeing · 340785cc
      Darrick J. Wong 提交于
      For the rmap btree to work, we have to feed the extent owner
      information to the the allocation and freeing functions. This
      information is what will end up in the rmap btree that tracks
      allocated extents. While we technically don't need the owner
      information when freeing extents, passing it allows us to validate
      that the extent we are removing from the rmap btree actually
      belonged to the owner we expected it to belong to.
      
      We also define a special set of owner values for internal metadata
      that would otherwise have no owner. This allows us to tell the
      difference between metadata owned by different per-ag btrees, as
      well as static fs metadata (e.g. AG headers) and internal journal
      blocks.
      
      There are also a couple of special cases we need to take care of -
      during EFI recovery, we don't actually know who the original owner
      was, so we need to pass a wildcard to indicate that we aren't
      checking the owner for validity. We also need special handling in
      growfs, as we "free" the space in the last AG when extending it, but
      because it's new space it has no actual owner...
      
      While touching the xfs_bmap_add_free() function, re-order the
      parameters to put the struct xfs_mount first.
      
      Extend the owner field to include both the owner type and some sort
      of index within the owner.  The index field will be used to support
      reverse mappings when reflink is enabled.
      
      When we're freeing extents from an EFI, we don't have the owner
      information available (rmap updates have their own redo items).
      xfs_free_extent therefore doesn't need to do an rmap update. Make
      sure that the log replay code signals this correctly.
      
      This is based upon a patch originally from Dave Chinner. It has been
      extended to add more owner information with the intent of helping
      recovery operations when things go wrong (e.g. offset of user data
      block in a file).
      
      [dchinner: de-shout the xfs_rmap_*_owner helpers]
      [darrick: minor style fixes suggested by Christoph Hellwig]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      340785cc
    • D
      xfs: rmap btree add more reserved blocks · 8018026e
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      XFS reserves a small amount of space in each AG for the minimum
      number of free blocks needed for operation. Adding the rmap btree
      increases the number of reserved blocks, but it also increases the
      complexity of the calculation as the free inode btree is optional
      (like the rmbt).
      
      Rather than calculate the prealloc blocks every time we need to
      check it, add a function to calculate it at mount time and store it
      in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
      just to use the xfs-mount variable directly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8018026e
    • D
      xfs: introduce rmap btree definitions · b8704944
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Add new per-ag rmap btree definitions to the per-ag structures. The
      rmap btree will sit in the empty slots on disk after the free space
      btrees, and hence form a part of the array of space management
      btrees. This requires the definition of the btree to be contiguous
      with the free space btrees.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b8704944
  6. 20 7月, 2016 1 次提交
  7. 04 1月, 2016 3 次提交
    • D
      xfs: introduce per-inode DAX enablement · 58f88ca2
      Dave Chinner 提交于
      Rather than just being able to turn DAX on and off via a mount
      option, some applications may only want to enable DAX for certain
      performance critical files in a filesystem.
      
      This patch introduces a new inode flag to enable DAX in the v3 inode
      di_flags2 field. It adds support for setting and clearing flags in
      the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
      S_DAX inode flag appropriately when it is seen.
      
      When this flag is set on a directory, it acts as an "inherit flag".
      That is, inodes created in the directory will automatically inherit
      the on-disk inode DAX flag, enabling administrators to set up
      directory heirarchies that automatically use DAX. Setting this flag
      on an empty root directory will make the entire filesystem use DAX
      by default.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      58f88ca2
    • D
      xfs: use FS_XFLAG definitions directly · e7b89481
      Dave Chinner 提交于
      Now that the ioctls have been hoisted up to the VFS level, use
      the VFs definitions directly and remove the XFS specific definitions
      completely. Userspace is going to have to handle the change of this
      interface separately, so removing the definitions from xfs_fs.h is
      not an issue here at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      e7b89481
    • D
      libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct · 96f859d5
      Darrick J. Wong 提交于
      Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
      inside it, gcc will quietly round the structure size up to the nearest
      64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
      macro returning incorrect results for v5 filesystems on 64-bit
      machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
      will see garbage in AGFL item 119 and complain.
      
      Therefore, tell gcc not to pad the structure so that the AGFL size
      calculation is correct.
      
      cc: <stable@vger.kernel.org> # 3.10 - 4.4
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      96f859d5
  8. 03 11月, 2015 1 次提交
  9. 12 10月, 2015 1 次提交
  10. 29 7月, 2015 1 次提交
    • E
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen 提交于
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce748eaa
  11. 22 6月, 2015 1 次提交
  12. 01 6月, 2015 1 次提交
    • E
      xfs: don't cast string literals · 39e56d92
      Eric Sandeen 提交于
      The commit:
      
      a9273ca5 xfs: convert attr to use unsigned names
      
      added these (unsigned char *) casts, but then the _SIZE macros
      return "7" - size of a pointer minus one - not the length of
      the string.  This is harmless in the kernel, because the _SIZE
      macros are not used, but as we sync up with userspace, this will
      matter.
      
      I don't think the cast is necessary; i.e. assigning the string
      literal to an unsigned char *, or passing it to a function
      expecting an unsigned char *, should be ok, right?
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      39e56d92
  13. 29 5月, 2015 4 次提交
    • B
      xfs: enable sparse inode chunks for v5 superblocks · 22ce1e14
      Brian Foster 提交于
      Enable mounting of filesystems with sparse inode support enabled. Add
      the incompat. feature bit to the *_ALL mask.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      22ce1e14
    • B
      xfs: introduce inode record hole mask for sparse inode chunks · 5419040f
      Brian Foster 提交于
      The inode btrees track 64 inodes per record regardless of inode size.
      Thus, inode chunks on disk vary in size depending on the size of the
      inodes. This creates a contiguous allocation requirement for new inode
      chunks that can be difficult to satisfy on an aged and fragmented (free
      space) filesystems.
      
      The inode record freecount currently uses 4 bytes on disk to track the
      free inode count. With a maximum freecount value of 64, only one byte is
      required. Convert the freecount field to a single byte and use two of
      the remaining 3 higher order bytes left for the hole mask field. Use the
      final leftover byte for the total count field.
      
      The hole mask field tracks holes in the chunks of physical space that
      the inode record refers to. This facilitates the sparse allocation of
      inode chunks when contiguous chunks are not available and allows the
      inode btrees to identify what portions of the chunk contain valid
      inodes. The total count field contains the total number of valid inodes
      referred to by the record. This can also be deduced from the hole mask.
      The count field provides clarity and redundancy for internal record
      verification.
      
      Note that neither of the new fields can be written to disk on fs'
      without sparse inode support. Doing so writes to the high-order bytes of
      freecount and causes corruption from the perspective of older kernels.
      The on-disk inobt record data structure is updated with a union to
      distinguish between the original, "full" format and the new, "sparse"
      format. The conversion routines to get, insert and update records are
      updated to translate to and from the on-disk record accordingly such
      that freecount remains a 4-byte value on non-supported fs, yet the new
      fields of the in-core record are always valid with respect to the
      record. This means that higher level code can refer to the current
      in-core record format unconditionally and lower level code ensures that
      records are translated to/from disk according to the capabilities of the
      fs.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5419040f
    • B
      xfs: sparse inode chunks feature helpers and mount requirements · e5376fc1
      Brian Foster 提交于
      The sparse inode chunks feature uses the helper function to enable the
      allocation of sparse inode chunks. The incompatible feature bit is set
      on disk at mkfs time to prevent mount from unsupported kernels.
      
      Also, enforce the inode alignment requirements required for sparse inode
      chunks at mount time. When enabled, full inode chunks (and all inode
      record) alignment is increased from cluster size to inode chunk size.
      Sparse inode alignment must match the cluster size of the fs. Both
      superblock alignment fields are set as such by mkfs when sparse inode
      support is enabled.
      
      Finally, warn that sparse inode chunks is an experimental feature until
      further notice.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e5376fc1
    • B
      xfs: add sparse inode chunk alignment superblock field · fb4f2b4e
      Brian Foster 提交于
      Add sb_spino_align to the superblock to specify sparse inode chunk
      alignment. This also currently represents the minimum allowable sparse
      chunk allocation size.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      fb4f2b4e
  14. 23 2月, 2015 1 次提交
  15. 22 1月, 2015 1 次提交
    • D
      xfs: sanitise sb_bad_features2 handling · 074e427b
      Dave Chinner 提交于
      We currently have to ensure that every time we update sb_features2
      that we update sb_bad_features2. Now that we log and format the
      superblock in it's entirety we actually don't have to care because
      we can simply update the sb_bad_features2 when we format it into the
      buffer. This removes the need for anything but the mount and
      superblock formatting code to care about sb_bad_features2, and
      hence removes the possibility that we forget to update bad_features2
      when necessary in the future.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      074e427b
  16. 24 12月, 2014 1 次提交
    • J
      xfs: Keep sb_bad_features2 consistent with sb_features2 · 1a43ec03
      Jan Kara 提交于
      Currently when we modify sb_features2, we store the same value also in
      sb_bad_features2. However in most places we forget to mark field
      sb_bad_features2 for logging and thus it can happen that a change to it
      is lost. This results in an inconsistent sb_features2 and
      sb_bad_features2 fields e.g. after xfstests test xfs/187.
      
      Fix the problem by changing XFS_SB_FEATURES2 to actually mean both
      sb_features2 and sb_bad_features2 fields since this is always what we
      want to log. This isn't ideal because the fact that XFS_SB_FEATURES2
      means two fields could cause some problem in future however the code is
      hopefully less error prone that it is now.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      1a43ec03
  17. 28 11月, 2014 5 次提交
  18. 30 7月, 2014 1 次提交
  19. 25 6月, 2014 1 次提交
  20. 24 4月, 2014 1 次提交
  21. 27 2月, 2014 1 次提交
  22. 24 10月, 2013 2 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  23. 13 8月, 2013 1 次提交