1. 09 12月, 2016 1 次提交
  2. 05 12月, 2016 1 次提交
    • D
      xfs: make xfs btree stats less huge · 11ef38af
      Dave Chinner 提交于
      Embedding a switch statement in every btree stats inc/add adds a lot
      of code overhead to the core btree infrastructure paths. Stats are
      supposed to be small and lightweight, but the btree stats have
      become big and bloated as we've added more btrees. It needs fixing
      because the reflink code will just add more overhead again.
      
      Convert the v2 btree stats to arrays instead of independent
      variables, and instead use the type to index the specific btree
      array via an enum. This allows us to use array based indexing
      to update the stats, rather than having to derefence variables
      specific to the btree type.
      
      If we then wrap the xfsstats structure in a union and place uint32_t
      array beside it, and calculate the correct btree stats array base
      array index when creating a btree cursor,  we can easily access
      entries in the stats structure without having to switch names based
      on the btree type.
      
      We then replace with the switch statement with a simple set of stats
      wrapper macros, resulting in a significant simplification of the
      btree stats code, and:
      
         text	   data	    bss	    dec	    hex	filename
        48905	    144	      8	  49057	   bfa1	fs/xfs/libxfs/xfs_btree.o.old
        36793	    144	      8	  36945	   9051	fs/xfs/libxfs/xfs_btree.o
      
      it reduces the core btree infrastructure code size by close to 25%!
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      11ef38af
  3. 19 9月, 2016 1 次提交
    • D
      xfs: set up per-AG free space reservations · 3fd129b6
      Darrick J. Wong 提交于
      One unfortunate quirk of the reference count and reverse mapping
      btrees -- they can expand in size when blocks are written to *other*
      allocation groups if, say, one large extent becomes a lot of tiny
      extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is contrast to the strategy that the rough
      draft used (actually removing extents from the freespace btrees).
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      3fd129b6
  4. 03 8月, 2016 4 次提交
    • D
      xfs: remove the get*keys and update_keys btree ops pointers · 973b8319
      Darrick J. Wong 提交于
      These are internal btree functions; we don't need them to be
      dispatched via function pointers.  Make them static again and
      just check the overlapped flag to figure out what we need to
      do.  The strategy behind this patch was suggested by Christoph.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Suggested-by: NChristoph Hellwig <hch@infradead.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      973b8319
    • D
      xfs: add owner field to extent allocation and freeing · 340785cc
      Darrick J. Wong 提交于
      For the rmap btree to work, we have to feed the extent owner
      information to the the allocation and freeing functions. This
      information is what will end up in the rmap btree that tracks
      allocated extents. While we technically don't need the owner
      information when freeing extents, passing it allows us to validate
      that the extent we are removing from the rmap btree actually
      belonged to the owner we expected it to belong to.
      
      We also define a special set of owner values for internal metadata
      that would otherwise have no owner. This allows us to tell the
      difference between metadata owned by different per-ag btrees, as
      well as static fs metadata (e.g. AG headers) and internal journal
      blocks.
      
      There are also a couple of special cases we need to take care of -
      during EFI recovery, we don't actually know who the original owner
      was, so we need to pass a wildcard to indicate that we aren't
      checking the owner for validity. We also need special handling in
      growfs, as we "free" the space in the last AG when extending it, but
      because it's new space it has no actual owner...
      
      While touching the xfs_bmap_add_free() function, re-order the
      parameters to put the struct xfs_mount first.
      
      Extend the owner field to include both the owner type and some sort
      of index within the owner.  The index field will be used to support
      reverse mappings when reflink is enabled.
      
      When we're freeing extents from an EFI, we don't have the owner
      information available (rmap updates have their own redo items).
      xfs_free_extent therefore doesn't need to do an rmap update. Make
      sure that the log replay code signals this correctly.
      
      This is based upon a patch originally from Dave Chinner. It has been
      extended to add more owner information with the intent of helping
      recovery operations when things go wrong (e.g. offset of user data
      block in a file).
      
      [dchinner: de-shout the xfs_rmap_*_owner helpers]
      [darrick: minor style fixes suggested by Christoph Hellwig]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      340785cc
    • D
      xfs: add function pointers for get/update keys to the btree · 70b22659
      Darrick J. Wong 提交于
      Add some function pointers to bc_ops to get the btree keys for
      leaf and node blocks, and to update parent keys of a block.
      Convert the _btree_updkey calls to use our new pointer, and
      modify the tree shape changing code to call the appropriate
      get_*_keys pointer instead of _btree_copy_keys because the
      overlapping btree has to calculate high key values.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      70b22659
    • D
      xfs: during btree split, save new block key & ptr for future insertion · e5821e57
      Darrick J. Wong 提交于
      When a btree block has to be split, we pass the new block's ptr from
      xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
      however, we pass the block's key through the cursor's record.  It is a
      little weird to "initialize" a record from a key since the non-key
      attributes will have garbage values.
      
      When we go to add support for interval queries, we have to be able to
      pass the lowest and highest keys accessible via a pointer.  There's no
      clean way to pass this back through the cursor's record field.
      Therefore, pass the key directly back to xfs_btree_insert() the same
      way that we pass the btree_ptr.
      
      As a bonus, we no longer need init_rec_from_key and can drop it from the
      codebase.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e5821e57
  5. 08 2月, 2016 1 次提交
  6. 04 1月, 2016 2 次提交
  7. 29 7月, 2015 1 次提交
    • E
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen 提交于
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce748eaa
  8. 29 5月, 2015 3 次提交
    • B
      xfs: allocate sparse inode chunks on full chunk allocation failure · 56d1115c
      Brian Foster 提交于
      xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
      chunk. If all else fails, reduce the allocation to the sparse length and
      alignment and attempt to allocate a sparse inode chunk.
      
      If sparse chunk allocation succeeds, check whether an inobt record
      already exists that can track the chunk. If so, inherit and update the
      existing record. Otherwise, insert a new record for the sparse chunk.
      
      Create helpers to align sparse chunk inode records and insert or update
      existing records in the inode btrees. The xfs_inobt_insert_sprec()
      helper implements the merge or update semantics required for sparse
      inode records with respect to both the inobt and finobt. To update the
      inobt, either insert a new record or merge with an existing record. To
      update the finobt, use the updated inobt record to either insert or
      replace an existing record.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      56d1115c
    • B
      xfs: helper to convert holemask to inode alloc. bitmap · 4148c347
      Brian Foster 提交于
      The inobt record holemask field is a condensed data type designed to fit
      into the existing on-disk record and is zero based (allocated regions
      are set to 0, sparse regions are set to 1) to provide backwards
      compatibility. This makes the type somewhat complex for use in higher
      level inode manipulations such as individual inode allocation, etc.
      
      Rather than foist the complexity of dealing with this field to every bit
      of logic that requires inode granular information, create a helper to
      convert the holemask to an inode allocation bitmap. The inode allocation
      bitmap is inode granularity similar to the inobt record free mask and
      indicates which inodes of the chunk are physically allocated on disk,
      irrespective of whether the inode is considered allocated or free by the
      filesystem.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4148c347
    • B
      xfs: introduce inode record hole mask for sparse inode chunks · 5419040f
      Brian Foster 提交于
      The inode btrees track 64 inodes per record regardless of inode size.
      Thus, inode chunks on disk vary in size depending on the size of the
      inodes. This creates a contiguous allocation requirement for new inode
      chunks that can be difficult to satisfy on an aged and fragmented (free
      space) filesystems.
      
      The inode record freecount currently uses 4 bytes on disk to track the
      free inode count. With a maximum freecount value of 64, only one byte is
      required. Convert the freecount field to a single byte and use two of
      the remaining 3 higher order bytes left for the hole mask field. Use the
      final leftover byte for the total count field.
      
      The hole mask field tracks holes in the chunks of physical space that
      the inode record refers to. This facilitates the sparse allocation of
      inode chunks when contiguous chunks are not available and allows the
      inode btrees to identify what portions of the chunk contain valid
      inodes. The total count field contains the total number of valid inodes
      referred to by the record. This can also be deduced from the hole mask.
      The count field provides clarity and redundancy for internal record
      verification.
      
      Note that neither of the new fields can be written to disk on fs'
      without sparse inode support. Doing so writes to the high-order bytes of
      freecount and causes corruption from the perspective of older kernels.
      The on-disk inobt record data structure is updated with a union to
      distinguish between the original, "full" format and the new, "sparse"
      format. The conversion routines to get, insert and update records are
      updated to translate to and from the on-disk record accordingly such
      that freecount remains a 4-byte value on non-supported fs, yet the new
      fields of the in-core record are always valid with respect to the
      record. This means that higher level code can refer to the current
      in-core record format unconditionally and lower level code ensures that
      records are translated to/from disk according to the capabilities of the
      fs.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5419040f
  9. 28 11月, 2014 2 次提交
  10. 25 6月, 2014 2 次提交
  11. 24 4月, 2014 2 次提交
  12. 14 4月, 2014 1 次提交
  13. 27 2月, 2014 2 次提交
  14. 31 10月, 2013 1 次提交
  15. 24 10月, 2013 2 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
  16. 08 5月, 2013 1 次提交
    • D
      xfs: introduce CONFIG_XFS_WARN · 742ae1e3
      Dave Chinner 提交于
      Running a CONFIG_XFS_DEBUG kernel in production environments is not
      the best idea as it introduces significant overhead, can change
      the behaviour of algorithms (such as allocation) to improve test
      coverage, and (most importantly) panic the machine on non-fatal
      errors.
      
      There are many cases where all we want to do is run a
      kernel with more bounds checking enabled, such as is provided by the
      ASSERT() statements throughout the code, but without all the
      potential overhead and drawbacks.
      
      This patch converts all the ASSERT statements to evaluate as
      WARN_ON(1) statements and hence if they fail dump a warning and a
      stack trace to the log. This has minimal overhead and does not
      change any algorithms, and will allow us to find strange "out of
      bounds" problems more easily on production machines.
      
      There are a few places where assert statements contain debug only
      code. These are converted to be debug-or-warn only code so that we
      still get all the assert checks in the code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      742ae1e3
  17. 22 4月, 2013 1 次提交
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  18. 16 11月, 2012 4 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: add pre-write metadata buffer verifier callbacks · 612cfbfe
      Dave Chinner 提交于
      These verifiers are essentially the same code as the read verifiers,
      but do not require ioend processing. Hence factor the read verifier
      functions and add a new write verifier wrapper that is used as the
      callback.
      
      This is done as one large patch for all verifiers rather than one
      patch per verifier as the change is largely mechanical. This
      includes hooking up the write verifier via the read verifier
      function.
      
      Hooking up the write verifier for buffers obtained via
      xfs_trans_get_buf() will be done in a separate patch as that touches
      code in many different places rather than just the verifier
      functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      612cfbfe
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner 提交于
      Add an btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3d3e6f64
  19. 15 5月, 2012 1 次提交
  20. 13 7月, 2011 1 次提交
  21. 19 10月, 2010 1 次提交
    • C
      xfs: remove the ->kill_root btree operation · c0e59e1a
      Christoph Hellwig 提交于
      The implementation os ->kill_root only differ by either simply
      zeroing out the now unused buffer in the btree cursor in the inode
      allocation btree or using xfs_btree_setbuf in the allocation btree.
      
      Initially both of them used xfs_btree_setbuf, but the use in the
      ialloc btree was removed early on because it interacted badly with
      xfs_trans_binval.
      
      In addition to zeroing out the buffer in the cursor xfs_btree_setbuf
      updates the bc_ra array in the btree cursor, and calls
      xfs_trans_brelse on the buffer previous occupying the slot.
      
      The bc_ra update should be done for the alloc btree updated too,
      although the lack of it does not cause serious problems.  The
      xfs_trans_brelse call on the other hand is effectively a no-op in
      the end - it keeps decrementing the bli_recur refcount until it hits
      zero, and then just skips out because the buffer will always be
      dirty at this point.  So removing it for the allocation btree is
      just fine.
      
      So unify the code and move it to xfs_btree.c.  While we're at it
      also replace the call to xfs_btree_setbuf with a NULL bp argument in
      xfs_btree_del_cursor with a direct call to xfs_trans_brelse given
      that the cursor is beeing freed just after this and the state
      updates are superflous.  After this xfs_btree_setbuf is only used
      with a non-NULL bp argument and can thus be simplified.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c0e59e1a
  22. 27 7月, 2010 2 次提交
  23. 29 3月, 2009 1 次提交
  24. 30 10月, 2008 2 次提交
    • C
      [XFS] Always use struct xfs_btree_block instead of short / longform · 7cc95a82
      Christoph Hellwig 提交于
      structures.
      
      Always use the generic xfs_btree_block type instead of the short / long
      structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for
      the length of a short / long form block. The rationale for this is that we
      will grow more btree block header variants to support CRCs and other RAS
      information, and always accessing them through the same datatype with
      unions for the short / long form pointers makes implementing this much
      easier.
      
      SGI-PV: 988146
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32300a
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
      Signed-off-by: NDavid Chinner <david@fromorbit.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      7cc95a82
    • C
      [XFS] Cleanup maxrecs calculation. · 60197e8d
      Christoph Hellwig 提交于
      Clean up the way the maximum and minimum records for the btree blocks are
      calculated. For the alloc and inobt btrees all the values are
      pre-calculated in xfs_mount_common, and we switch the current loop around
      the ugly generic macros that use cpp token pasting to generate type names
      to two small helpers in normal C code. For the bmbt and bmdr trees these
      helpers also exist, but can be called during runtime, too. Here we also
      kill various macros dealing with them and inline the logic into the
      get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c.
      
      Note that all these new helpers take an xfs_mount * argument which will be
      needed to determine the size of a btree block once we add support for
      extended btree blocks with CRCs and other RAS information.
      
      SGI-PV: 988146
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32292a
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      60197e8d