1. 27 10月, 2017 3 次提交
  2. 26 4月, 2017 1 次提交
    • C
      xfs: simplify validation of the unwritten extent bit · 0c1d9e4a
      Christoph Hellwig 提交于
      XFS only supports the unwritten extent bit in the data fork, and only if
      the file system has a version 5 superblock or the unwritten extent
      feature bit.
      
      We currently have two routines that validate the invariant:
      xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
      and xfs_validate_extent that triggers and assert in debug build.
      
      Both of them iterate over all extents of an inode fork when called,
      which isn't very efficient.
      
      This patch instead adds a new helper that verifies the invariant one
      extent at a time, and calls it from the places where we iterate over
      all extents to converted them from or two the in-memory format.  The
      callers then return -EFSCORRUPTED when reading invalid extents from
      disk, or trigger an assert when writing them to disk.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0c1d9e4a
  3. 04 4月, 2017 1 次提交
  4. 25 6月, 2014 1 次提交
  5. 14 4月, 2014 1 次提交
  6. 24 10月, 2013 2 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  7. 11 9月, 2013 1 次提交
    • D
      xfs: recovery of swap extents operations for CRC filesystems · 638f4416
      Dave Chinner 提交于
      This is the recovery side of the btree block owner change operation
      performed by swapext on CRC enabled filesystems. We detect that an
      owner change is needed by the flag that has been placed on the inode
      log format flag field. Because the inode recovery is being replayed
      after the buffers that make up the BMBT in the given checkpoint, we
      can walk all the buffers and directly modify them when we see the
      flag set on an inode.
      
      Because the inode can be relogged and hence present in multiple
      chekpoints with the "change owner" flag set, we could do multiple
      passes across the inode to do this change. While this isn't optimal,
      we can't directly ignore the flag as there may be multiple
      independent swap extent operations being replayed on the same inode
      in different checkpoints so we can't ignore them.
      
      Further, because the owner change operation uses ordered buffers, we
      might have buffers that are newer on disk than the current
      checkpoint and so already have the owner changed in them. Hence we
      cannot just peek at a buffer in the tree and check that it has the
      correct owner and assume that the change was completed.
      
      So, for the moment just brute force the owner change every time we
      see an inode with the flag set. Note that we have to be careful here
      because the owner of the buffers may point to either the old owner
      or the new owner. Currently the verifier can't verify the owner
      directly, so there is no failure case here right now. If we verify
      the owner exactly in future, then we'll have to take this into
      account.
      
      This was tested in terms of normal operation via xfstests - all of
      the fsr tests now pass without failure. however, we really need to
      modify xfs/227 to stress v3 inodes correctly to ensure we fully
      cover this case for v5 filesystems.
      
      In terms of recovery testing, I used a hacked version of xfs_fsr
      that held the temp inode open for a few seconds before exiting so
      that the filesystem could be shut down with an open owner change
      recovery flags set on at least the temp inode. fsr leaves the temp
      inode unlinked and in btree format, so this was necessary for the
      owner change to be reliably replayed.
      
      logprint confirmed the tmp inode in the log had the correct flag set:
      
      INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
              INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
      	                                 ^^^^^
      
      0x200 is set, indicating a data fork owner change needed to be
      replayed on inode 0x44.  A printk in the revoery code confirmed that
      the inode change was recovered:
      
      XFS (vdc): Mounting Filesystem
      XFS (vdc): Starting recovery (logdev: internal)
      recovering owner change ino 0x44
      XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
      Use of these features in this kernel is at your own risk!
      XFS (vdc): Ending recovery (logdev: internal)
      
      The script used to test this was:
      
      $ cat ./recovery-fsr.sh
      #!/bin/bash
      
      dev=/dev/vdc
      mntpt=/mnt/scratch
      testfile=$mntpt/testfile
      
      umount $mntpt
      mkfs.xfs -f -m crc=1 $dev
      mount $dev $mntpt
      chmod 777 $mntpt
      
      for i in `seq 10000 -1 0`; do
              xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
      done
      xfs_bmap -vp $testfile |head -20
      
      xfs_fsr -d -v $testfile &
      sleep 10
      /home/dave/src/xfstests-dev/src/godown -f $mntpt
      wait
      umount $mntpt
      
      xfs_logprint -t $dev |tail -20
      time mount $dev $mntpt
      xfs_bmap -vp $testfile
      umount $mntpt
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      638f4416
  8. 10 9月, 2013 1 次提交
    • D
      xfs: swap extents operations for CRC filesystems · 21b5c978
      Dave Chinner 提交于
      For CRC enabled filesystems, we can't just swap inode forks from one
      inode to another when defragmenting a file - the blocks in the inode
      fork bmap btree contain pointers back to the owner inode. Hence if
      we are to swap the inode forks we have to atomically modify every
      block in the btree during the transaction.
      
      We are doing an entire fork swap here, so we could create a new
      transaction item type that indicates we are changing the owner of a
      certain structure from one value to another. If we combine this with
      ordered buffer logging to modify all the buffers in the tree, then
      we can change the buffers in the tree without needing log space for
      the operation. However, this then requires log recovery to perform
      the modification of the owner information of the objects/structures
      in question.
      
      This does introduce some interesting ordering details into recovery:
      we have to make sure that the owner change replay occurs after the
      change that moves the objects is made, not before. Hence we can't
      use a separate log item for this as we have no guarantee of strict
      ordering between multiple items in the log due to the relogging
      action of asynchronous transaction commits. Hence there is no
      "generic" method we can use for changing the ownership of arbitrary
      metadata structures.
      
      For inode forks, however, there is a simple method of communicating
      that the fork contents need the owner rewritten - we can pass a
      inode log format flag for the fork for the transaction that does a
      fork swap. This flag will then follow the inode fork through
      relogging actions so when the swap actually gets replayed the
      ownership can be changed immediately by log recovery.  So that gives
      us a simple method of "whole fork" exchange between two inodes.
      
      This is relatively simple to implement, so it makes sense to do this
      as an initial implementation to support xfs_fsr on CRC enabled
      filesytems in the same manner as we do on existing filesystems. This
      commit introduces the swapext driven functionality, the recovery
      functionality will be in a separate patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      21b5c978
  9. 21 6月, 2013 1 次提交
    • E
      xfs: check on-disk (not incore) btree root size in dfrag.c · 427d9fe2
      Eric Sandeen 提交于
      xfs_swap_extents_check_format() contains checks to make sure that
      original and the temporary files during defrag are compatible;
      Gabriel VLASIU ran into a case where xfs_fsr returned EINVAL
      because the tests found the btree root to be of size 120,
      while the fork offset was only 104; IOW, they overlapped.
      
      However, this is just due to an error in the
      xfs_swap_extents_check_format() tests, because it is checking
      the in-memory btree root size against the on-disk fork offset.
      We should be checking the on-disk sizes in both cases.
      
      This patch adds a new macro to calculate this size, and uses
      it in the tests.
      
      With this change, the filesystem image provided by Gabriel
      allows for proper file degragmentation.
      Reported-by: NGabriel VLASIU <gabriel@vlasiu.net>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      427d9fe2
  10. 22 4月, 2013 1 次提交
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  11. 16 11月, 2012 3 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner 提交于
      Add an btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3d3e6f64
  12. 16 1月, 2010 1 次提交
  13. 17 12月, 2009 1 次提交
  14. 01 9月, 2009 1 次提交
  15. 02 7月, 2009 1 次提交
  16. 19 1月, 2009 1 次提交
  17. 16 1月, 2009 1 次提交
  18. 30 10月, 2008 15 次提交
  19. 10 4月, 2008 1 次提交
  20. 15 10月, 2007 2 次提交