1. 03 8月, 2016 3 次提交
  2. 20 7月, 2016 1 次提交
  3. 21 6月, 2016 1 次提交
    • D
      xfs: refactor btree maxlevels computation · 19b54ee6
      Darrick J. Wong 提交于
      Create a common function to calculate the maximum height of a per-AG
      btree.  This will eventually be used by the rmapbt and refcountbt
      code to calculate appropriate maxlevels values for each.  This is
      important because the verifiers and the transaction block
      reservations depend on accurate estimates of how many blocks are
      needed to satisfy a btree split.
      
      We were mistakenly using the max bnobt height for all the btrees,
      which creates a dangerous situation since the larger records and
      keys in an rmapbt make it very possible that the rmapbt will be
      taller than the bnobt and so we can run out of transaction block
      reservation.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      19b54ee6
  4. 08 2月, 2016 3 次提交
  5. 04 1月, 2016 1 次提交
  6. 12 10月, 2015 2 次提交
    • G
      libxfs: fix two comment typos · fef4ded8
      Geliang Tang 提交于
      Just fix two typos in code comments.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fef4ded8
    • B
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster 提交于
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a45086e2
  7. 29 7月, 2015 1 次提交
    • E
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen 提交于
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce748eaa
  8. 23 2月, 2015 2 次提交
  9. 28 11月, 2014 2 次提交
  10. 30 7月, 2014 1 次提交
  11. 25 6月, 2014 2 次提交
  12. 22 6月, 2014 1 次提交
  13. 06 6月, 2014 1 次提交
  14. 24 4月, 2014 1 次提交
  15. 14 4月, 2014 2 次提交
  16. 27 2月, 2014 2 次提交
  17. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  18. 11 9月, 2013 1 次提交
    • D
      xfs: recovery of swap extents operations for CRC filesystems · 638f4416
      Dave Chinner 提交于
      This is the recovery side of the btree block owner change operation
      performed by swapext on CRC enabled filesystems. We detect that an
      owner change is needed by the flag that has been placed on the inode
      log format flag field. Because the inode recovery is being replayed
      after the buffers that make up the BMBT in the given checkpoint, we
      can walk all the buffers and directly modify them when we see the
      flag set on an inode.
      
      Because the inode can be relogged and hence present in multiple
      chekpoints with the "change owner" flag set, we could do multiple
      passes across the inode to do this change. While this isn't optimal,
      we can't directly ignore the flag as there may be multiple
      independent swap extent operations being replayed on the same inode
      in different checkpoints so we can't ignore them.
      
      Further, because the owner change operation uses ordered buffers, we
      might have buffers that are newer on disk than the current
      checkpoint and so already have the owner changed in them. Hence we
      cannot just peek at a buffer in the tree and check that it has the
      correct owner and assume that the change was completed.
      
      So, for the moment just brute force the owner change every time we
      see an inode with the flag set. Note that we have to be careful here
      because the owner of the buffers may point to either the old owner
      or the new owner. Currently the verifier can't verify the owner
      directly, so there is no failure case here right now. If we verify
      the owner exactly in future, then we'll have to take this into
      account.
      
      This was tested in terms of normal operation via xfstests - all of
      the fsr tests now pass without failure. however, we really need to
      modify xfs/227 to stress v3 inodes correctly to ensure we fully
      cover this case for v5 filesystems.
      
      In terms of recovery testing, I used a hacked version of xfs_fsr
      that held the temp inode open for a few seconds before exiting so
      that the filesystem could be shut down with an open owner change
      recovery flags set on at least the temp inode. fsr leaves the temp
      inode unlinked and in btree format, so this was necessary for the
      owner change to be reliably replayed.
      
      logprint confirmed the tmp inode in the log had the correct flag set:
      
      INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
              INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
      	                                 ^^^^^
      
      0x200 is set, indicating a data fork owner change needed to be
      replayed on inode 0x44.  A printk in the revoery code confirmed that
      the inode change was recovered:
      
      XFS (vdc): Mounting Filesystem
      XFS (vdc): Starting recovery (logdev: internal)
      recovering owner change ino 0x44
      XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
      Use of these features in this kernel is at your own risk!
      XFS (vdc): Ending recovery (logdev: internal)
      
      The script used to test this was:
      
      $ cat ./recovery-fsr.sh
      #!/bin/bash
      
      dev=/dev/vdc
      mntpt=/mnt/scratch
      testfile=$mntpt/testfile
      
      umount $mntpt
      mkfs.xfs -f -m crc=1 $dev
      mount $dev $mntpt
      chmod 777 $mntpt
      
      for i in `seq 10000 -1 0`; do
              xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
      done
      xfs_bmap -vp $testfile |head -20
      
      xfs_fsr -d -v $testfile &
      sleep 10
      /home/dave/src/xfstests-dev/src/godown -f $mntpt
      wait
      umount $mntpt
      
      xfs_logprint -t $dev |tail -20
      time mount $dev $mntpt
      xfs_bmap -vp $testfile
      umount $mntpt
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      638f4416
  19. 10 9月, 2013 1 次提交
    • D
      xfs: swap extents operations for CRC filesystems · 21b5c978
      Dave Chinner 提交于
      For CRC enabled filesystems, we can't just swap inode forks from one
      inode to another when defragmenting a file - the blocks in the inode
      fork bmap btree contain pointers back to the owner inode. Hence if
      we are to swap the inode forks we have to atomically modify every
      block in the btree during the transaction.
      
      We are doing an entire fork swap here, so we could create a new
      transaction item type that indicates we are changing the owner of a
      certain structure from one value to another. If we combine this with
      ordered buffer logging to modify all the buffers in the tree, then
      we can change the buffers in the tree without needing log space for
      the operation. However, this then requires log recovery to perform
      the modification of the owner information of the objects/structures
      in question.
      
      This does introduce some interesting ordering details into recovery:
      we have to make sure that the owner change replay occurs after the
      change that moves the objects is made, not before. Hence we can't
      use a separate log item for this as we have no guarantee of strict
      ordering between multiple items in the log due to the relogging
      action of asynchronous transaction commits. Hence there is no
      "generic" method we can use for changing the ownership of arbitrary
      metadata structures.
      
      For inode forks, however, there is a simple method of communicating
      that the fork contents need the owner rewritten - we can pass a
      inode log format flag for the fork for the transaction that does a
      fork swap. This flag will then follow the inode fork through
      relogging actions so when the swap actually gets replayed the
      ownership can be changed immediately by log recovery.  So that gives
      us a simple method of "whole fork" exchange between two inodes.
      
      This is relatively simple to implement, so it makes sense to do this
      as an initial implementation to support xfs_fsr on CRC enabled
      filesytems in the same manner as we do on existing filesystems. This
      commit introduces the swapext driven functionality, the recovery
      functionality will be in a separate patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      21b5c978
  20. 31 8月, 2013 1 次提交
    • D
      xfs: btree block LSN escaping to disk uninitialised · b58fa554
      Dave Chinner 提交于
      When testing LSN ordering code for v5 superblocks, it was discovered
      that the the LSN embedded in the generic btree blocks was
      occasionally uninitialised. These values didn't get written to disk
      by metadata writeback - they got written by previous transactions in
      log recovery.
      
      The issue is here that the when the block is first allocated and
      initialised, the LSN field was not initialised - it gets overwritten
      before IO is issued on the buffer - but the value that is logged by
      transactions that modify the header before it is written to disk
      (and initialised) contain garbage. Hence the first recovery of the
      buffer will stamp garbage into the LSN field, and that can cause
      subsequent transactions to not replay correctly.
      
      The fix is simply to initialise the bb_lsn field to zero when we
      initialise the block for the first time.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b58fa554
  21. 21 8月, 2013 3 次提交
  22. 15 6月, 2013 1 次提交
    • D
      xfs: ensure btree root split sets blkno correctly · 088c9f67
      Dave Chinner 提交于
      For CRC enabled filesystems, the BMBT is rooted in an inode, so it
      passes through a different code path on root splits than the
      freespace and inode btrees. This is much less traversed by xfstests
      than the other trees. When testing on a 1k block size filesystem,
      I've been seeing ASSERT failures in generic/234 like:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
      
      which are generally preceded by a lblock check failure. I noticed
      this in the bmbt stats:
      
      $ pminfo -f xfs.btree.block_map
      
      xfs.btree.block_map.lookup
          value 39135
      
      xfs.btree.block_map.compare
          value 268432
      
      xfs.btree.block_map.insrec
          value 15786
      
      xfs.btree.block_map.delrec
          value 13884
      
      xfs.btree.block_map.newroot
          value 2
      
      xfs.btree.block_map.killroot
          value 0
      .....
      
      Very little coverage of root splits and merges. Indeed, on a 4k
      filesystem, block_map.newroot and block_map.killroot are both zero.
      i.e. the code is not exercised at all, and it's the only generic
      btree infrastructure operation that is not exercised by a default run
      of xfstests.
      
      Turns out that on a 1k filesystem, generic/234 accounts for one of
      those two root splits, and that is somewhat of a smoking gun. In
      fact, it's the same problem we saw in the directory/attr code where
      headers are memcpy()d from one block to another without updating the
      self describing metadata.
      
      Simple fix - when copying the header out of the root block, make
      sure the block number is updated correctly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ade1335a)
      088c9f67
  23. 14 6月, 2013 1 次提交
    • D
      xfs: ensure btree root split sets blkno correctly · ade1335a
      Dave Chinner 提交于
      For CRC enabled filesystems, the BMBT is rooted in an inode, so it
      passes through a different code path on root splits than the
      freespace and inode btrees. This is much less traversed by xfstests
      than the other trees. When testing on a 1k block size filesystem,
      I've been seeing ASSERT failures in generic/234 like:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
      
      which are generally preceded by a lblock check failure. I noticed
      this in the bmbt stats:
      
      $ pminfo -f xfs.btree.block_map
      
      xfs.btree.block_map.lookup
          value 39135
      
      xfs.btree.block_map.compare
          value 268432
      
      xfs.btree.block_map.insrec
          value 15786
      
      xfs.btree.block_map.delrec
          value 13884
      
      xfs.btree.block_map.newroot
          value 2
      
      xfs.btree.block_map.killroot
          value 0
      .....
      
      Very little coverage of root splits and merges. Indeed, on a 4k
      filesystem, block_map.newroot and block_map.killroot are both zero.
      i.e. the code is not exercised at all, and it's the only generic
      btree infrastructure operation that is not exercised by a default run
      of xfstests.
      
      Turns out that on a 1k filesystem, generic/234 accounts for one of
      those two root splits, and that is somewhat of a smoking gun. In
      fact, it's the same problem we saw in the directory/attr code where
      headers are memcpy()d from one block to another without updating the
      self describing metadata.
      
      Simple fix - when copying the header out of the root block, make
      sure the block number is updated correctly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ade1335a
  24. 28 4月, 2013 1 次提交
    • D
      xfs: buffer type overruns blf_flags field · 61fe135c
      Dave Chinner 提交于
      The buffer type passed to log recvoery in the buffer log item
      overruns the blf_flags field. I had assumed that flags field was a
      32 bit value, and it turns out it is a unisgned short. Therefore
      having 19 flags doesn't really work.
      
      Convert the buffer type field to numeric value, and use the top 5
      bits of the flags field for it. We currently have 17 types of
      buffers, so using 5 bits gives us plenty of room for expansion in
      future....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      61fe135c
  25. 22 4月, 2013 1 次提交
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  26. 16 11月, 2012 1 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64