1. 28 Nov, 2014 2 commits
  2. 30 Jul, 2014 1 commit
  3. 25 Jun, 2014 2 commits
  4. 22 Jun, 2014 1 commit
  5. 06 Jun, 2014 1 commit
  6. 24 Apr, 2014 1 commit
  7. 14 Apr, 2014 2 commits
  8. 27 Feb, 2014 2 commits
  9. 24 Oct, 2013 3 commits
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner committed
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in its definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      This enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner committed
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner committed
      All of the buffer operations structures need to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file in which we centralise these
      related definitions. Start by moving the buffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      70a9883c
  10. 11 Sep, 2013 1 commit
    • D
      xfs: recovery of swap extents operations for CRC filesystems · 638f4416
      Dave Chinner committed
      This is the recovery side of the btree block owner change operation
      performed by swapext on CRC enabled filesystems. We detect that an
      owner change is needed by the flag that has been placed on the inode
      log format flag field. Because the inode recovery is being replayed
      after the buffers that make up the BMBT in the given checkpoint, we
      can walk all the buffers and directly modify them when we see the
      flag set on an inode.
      
      Because the inode can be relogged and hence present in multiple
      checkpoints with the "change owner" flag set, we could do multiple
      passes across the inode to do this change. While this isn't optimal,
      we can't simply ignore the flag, as there may be multiple
      independent swap extent operations being replayed on the same inode
      in different checkpoints.
      
      Further, because the owner change operation uses ordered buffers, we
      might have buffers that are newer on disk than the current
      checkpoint and so already have the owner changed in them. Hence we
      cannot just peek at a buffer in the tree and check that it has the
      correct owner and assume that the change was completed.
      
      So, for the moment just brute force the owner change every time we
      see an inode with the flag set. Note that we have to be careful here
      because the owner of the buffers may point to either the old owner
      or the new owner. Currently the verifier can't verify the owner
      directly, so there is no failure case here right now. If we verify
      the owner exactly in future, then we'll have to take this into
      account.
      
      This was tested in terms of normal operation via xfstests - all of
      the fsr tests now pass without failure. However, we really need to
      modify xfs/227 to stress v3 inodes correctly to ensure we fully
      cover this case for v5 filesystems.
      
      In terms of recovery testing, I used a hacked version of xfs_fsr
      that held the temp inode open for a few seconds before exiting so
      that the filesystem could be shut down with the owner change
      recovery flag still set on at least the temp inode. fsr leaves the temp
      inode unlinked and in btree format, so this was necessary for the
      owner change to be reliably replayed.
      
      logprint confirmed the tmp inode in the log had the correct flag set:
      
      INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
              INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
      	                                 ^^^^^
      
      0x200 is set, indicating a data fork owner change needed to be
      replayed on inode 0x44.  A printk in the recovery code confirmed that
      the inode change was recovered:
      
      XFS (vdc): Mounting Filesystem
      XFS (vdc): Starting recovery (logdev: internal)
      recovering owner change ino 0x44
      XFS (vdc): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
      Use of these features in this kernel is at your own risk!
      XFS (vdc): Ending recovery (logdev: internal)
      
      The script used to test this was:
      
      $ cat ./recovery-fsr.sh
      #!/bin/bash
      
      dev=/dev/vdc
      mntpt=/mnt/scratch
      testfile=$mntpt/testfile
      
      umount $mntpt
      mkfs.xfs -f -m crc=1 $dev
      mount $dev $mntpt
      chmod 777 $mntpt
      
      for i in `seq 10000 -1 0`; do
              xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
      done
      xfs_bmap -vp $testfile |head -20
      
      xfs_fsr -d -v $testfile &
      sleep 10
      /home/dave/src/xfstests-dev/src/godown -f $mntpt
      wait
      umount $mntpt
      
      xfs_logprint -t $dev |tail -20
      time mount $dev $mntpt
      xfs_bmap -vp $testfile
      umount $mntpt
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      638f4416
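The flag check described in the message above can be sketched as a simple bit test. This is a minimal illustration, not the kernel code: the macro and function names are hypothetical, and only the 0x200 bit value and the 0x209 flags word are taken from the logprint output quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the 0x200 value comes from the commit message. */
#define TOY_ILOG_DATA_OWNER_CHANGE 0x200u

/* Return true when the logged inode flags ask recovery to rewrite the
 * owner of the data fork btree blocks. */
bool toy_needs_data_owner_change(uint32_t ilf_flags)
{
    return (ilf_flags & TOY_ILOG_DATA_OWNER_CHANGE) != 0;
}
```

With the logged value `0x209`, the check returns true; with the same word minus the 0x200 bit (`0x009`), it returns false.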
  11. 10 Sep, 2013 1 commit
    • D
      xfs: swap extents operations for CRC filesystems · 21b5c978
      Dave Chinner committed
      For CRC enabled filesystems, we can't just swap inode forks from one
      inode to another when defragmenting a file - the blocks in the inode
      fork bmap btree contain pointers back to the owner inode. Hence if
      we are to swap the inode forks we have to atomically modify every
      block in the btree during the transaction.
      
      We are doing an entire fork swap here, so we could create a new
      transaction item type that indicates we are changing the owner of a
      certain structure from one value to another. If we combine this with
      ordered buffer logging to modify all the buffers in the tree, then
      we can change the buffers in the tree without needing log space for
      the operation. However, this then requires log recovery to perform
      the modification of the owner information of the objects/structures
      in question.
      
      This does introduce some interesting ordering details into recovery:
      we have to make sure that the owner change replay occurs after the
      change that moves the objects is made, not before. Hence we can't
      use a separate log item for this as we have no guarantee of strict
      ordering between multiple items in the log due to the relogging
      action of asynchronous transaction commits. Hence there is no
      "generic" method we can use for changing the ownership of arbitrary
      metadata structures.
      
      For inode forks, however, there is a simple method of communicating
      that the fork contents need the owner rewritten - we can pass an
      inode log format flag for the fork for the transaction that does a
      fork swap. This flag will then follow the inode fork through
      relogging actions so when the swap actually gets replayed the
      ownership can be changed immediately by log recovery.  So that gives
      us a simple method of "whole fork" exchange between two inodes.
      
      This is relatively simple to implement, so it makes sense to do this
      as an initial implementation to support xfs_fsr on CRC enabled
      filesystems in the same manner as we do on existing filesystems. This
      commit introduces the swapext driven functionality, the recovery
      functionality will be in a separate patch.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      21b5c978
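The idea of rewriting the owner in every block of a fork's btree can be sketched on a toy in-memory tree. Everything here is an assumption made for illustration: the `toy_node` type, the fixed fan-out and the recursive walk are not the XFS data structures or the kernel's traversal. The point it shows is that the rewrite is unconditional, so it gives the same result whether a block still carries the old owner or already carries the new one.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical in-memory stand-in for a btree block with an owner field. */
struct toy_node {
    uint64_t owner;
    size_t nchildren;
    struct toy_node *child[TOY_MAX_CHILDREN];
};

/* Stamp new_owner into every block reachable from n and return the number
 * of blocks visited. The old value is never inspected, so repeating the
 * walk is harmless. */
size_t toy_change_owner(struct toy_node *n, uint64_t new_owner)
{
    size_t visited = 1;

    n->owner = new_owner;
    for (size_t i = 0; i < n->nchildren; i++)
        visited += toy_change_owner(n->child[i], new_owner);
    return visited;
}
```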
  12. 31 Aug, 2013 1 commit
    • D
      xfs: btree block LSN escaping to disk uninitialised · b58fa554
      Dave Chinner committed
      When testing LSN ordering code for v5 superblocks, it was discovered
      that the LSN embedded in the generic btree blocks was
      occasionally uninitialised. These values didn't get written to disk
      by metadata writeback - they got written by previous transactions in
      log recovery.
      
      The issue here is that when the block is first allocated and
      initialised, the LSN field is not initialised. It gets overwritten
      before IO is issued on the buffer, but the value that is logged by
      transactions that modify the header before it is written to disk
      (and initialised) contains garbage. Hence the first recovery of the
      buffer will stamp garbage into the LSN field, and that can cause
      subsequent transactions to not replay correctly.
      
      The fix is simply to initialise the bb_lsn field to zero when we
      initialise the block for the first time.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      b58fa554
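A minimal sketch of the fix described above, assuming a simplified header layout. The `toy_btree_hdr` struct and `toy_init_block` function are hypothetical and only mirror the idea of zeroing the LSN field (bb_lsn in the commit text) at first initialisation, so that no stale memory contents can be logged.

```c
#include <stdint.h>

/* Hypothetical, simplified block header; not the on-disk XFS layout. */
struct toy_btree_hdr {
    uint32_t magic;
    uint16_t level;
    uint16_t numrecs;
    uint64_t blkno;
    uint64_t lsn;
};

/* Initialise a freshly allocated block header. Every field is written,
 * including lsn, so nothing left over in the buffer memory survives. */
void toy_init_block(struct toy_btree_hdr *hdr, uint32_t magic,
                    uint16_t level, uint64_t blkno)
{
    hdr->magic = magic;
    hdr->level = level;
    hdr->numrecs = 0;
    hdr->blkno = blkno;
    hdr->lsn = 0;   /* the fix: never leave the LSN uninitialised */
}
```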
  13. 21 Aug, 2013 3 commits
  14. 15 Jun, 2013 1 commit
    • D
      xfs: ensure btree root split sets blkno correctly · 088c9f67
      Dave Chinner committed
      For CRC enabled filesystems, the BMBT is rooted in an inode, so it
      passes through a different code path on root splits than the
      freespace and inode btrees. This is much less traversed by xfstests
      than the other trees. When testing on a 1k block size filesystem,
      I've been seeing ASSERT failures in generic/234 like:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
      
      which are generally preceded by a lblock check failure. I noticed
      this in the bmbt stats:
      
      $ pminfo -f xfs.btree.block_map
      
      xfs.btree.block_map.lookup
          value 39135
      
      xfs.btree.block_map.compare
          value 268432
      
      xfs.btree.block_map.insrec
          value 15786
      
      xfs.btree.block_map.delrec
          value 13884
      
      xfs.btree.block_map.newroot
          value 2
      
      xfs.btree.block_map.killroot
          value 0
      .....
      
      Very little coverage of root splits and merges. Indeed, on a 4k
      filesystem, block_map.newroot and block_map.killroot are both zero.
      i.e. the code is not exercised at all, and it's the only generic
      btree infrastructure operation that is not exercised by a default run
      of xfstests.
      
      Turns out that on a 1k filesystem, generic/234 accounts for one of
      those two root splits, and that is somewhat of a smoking gun. In
      fact, it's the same problem we saw in the directory/attr code where
      headers are memcpy()d from one block to another without updating the
      self describing metadata.
      
      Simple fix - when copying the header out of the root block, make
      sure the block number is updated correctly.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit ade1335a)
      088c9f67
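The class of bug described above, a header copied with memcpy() that keeps the source's self-describing fields, can be sketched as follows. The `toy_hdr` struct and the function name are hypothetical and much simpler than the real btree block header; the sketch only shows the copy followed by the block number fix-up.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical self-describing header; not the real XFS layout. */
struct toy_hdr {
    uint32_t magic;
    uint64_t blkno;   /* where this block lives on disk */
    uint64_t owner;
};

/* Copy the root header into the newly allocated child block, then update
 * the field that describes the block's own location. Without the second
 * step the child would claim to live at the root's location. */
void toy_copy_root_to_child(struct toy_hdr *child, const struct toy_hdr *root,
                            uint64_t child_blkno)
{
    memcpy(child, root, sizeof(*child));
    child->blkno = child_blkno;
}
```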
  15. 14 Jun, 2013 1 commit
    • D
      xfs: ensure btree root split sets blkno correctly · ade1335a
      Dave Chinner committed
      For CRC enabled filesystems, the BMBT is rooted in an inode, so it
      passes through a different code path on root splits than the
      freespace and inode btrees. This is much less traversed by xfstests
      than the other trees. When testing on a 1k block size filesystem,
      I've been seeing ASSERT failures in generic/234 like:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
      
      which are generally preceded by a lblock check failure. I noticed
      this in the bmbt stats:
      
      $ pminfo -f xfs.btree.block_map
      
      xfs.btree.block_map.lookup
          value 39135
      
      xfs.btree.block_map.compare
          value 268432
      
      xfs.btree.block_map.insrec
          value 15786
      
      xfs.btree.block_map.delrec
          value 13884
      
      xfs.btree.block_map.newroot
          value 2
      
      xfs.btree.block_map.killroot
          value 0
      .....
      
      Very little coverage of root splits and merges. Indeed, on a 4k
      filesystem, block_map.newroot and block_map.killroot are both zero.
      i.e. the code is not exercised at all, and it's the only generic
      btree infrastructure operation that is not exercised by a default run
      of xfstests.
      
      Turns out that on a 1k filesystem, generic/234 accounts for one of
      those two root splits, and that is somewhat of a smoking gun. In
      fact, it's the same problem we saw in the directory/attr code where
      headers are memcpy()d from one block to another without updating the
      self describing metadata.
      
      Simple fix - when copying the header out of the root block, make
      sure the block number is updated correctly.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      ade1335a
  16. 28 Apr, 2013 1 commit
    • D
      xfs: buffer type overruns blf_flags field · 61fe135c
      Dave Chinner committed
      The buffer type passed to log recovery in the buffer log item
      overruns the blf_flags field. I had assumed that flags field was a
      32 bit value, and it turns out it is an unsigned short. Therefore
      having 19 flags doesn't really work.
      
      Convert the buffer type field to a numeric value, and use the top 5
      bits of the flags field for it. We currently have 17 types of
      buffers, so using 5 bits gives us plenty of room for expansion in
      future....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      61fe135c
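The packing described above, a numeric type stored in the top 5 bits of a 16-bit flags field, can be sketched like this. The macro and function names are hypothetical; only the field width (unsigned short) and the 5-bit type come from the commit message. Five bits hold values 0 to 31, which covers the 17 buffer types mentioned.

```c
#include <stdint.h>

/* Hypothetical names; layout as described in the commit text. */
#define TOY_BLF_TYPE_BITS   5
#define TOY_BLF_TYPE_SHIFT  (16 - TOY_BLF_TYPE_BITS)   /* 11 */
#define TOY_BLF_TYPE_MASK   (((1u << TOY_BLF_TYPE_BITS) - 1) << TOY_BLF_TYPE_SHIFT)

/* Store a buffer type in the top 5 bits, leaving the low 11 flag bits
 * untouched. Types above 31 are truncated by the mask. */
uint16_t toy_blf_set_type(uint16_t flags, unsigned type)
{
    flags &= (uint16_t)~TOY_BLF_TYPE_MASK;
    flags |= (uint16_t)((type << TOY_BLF_TYPE_SHIFT) & TOY_BLF_TYPE_MASK);
    return flags;
}

/* Extract the buffer type from the top 5 bits. */
unsigned toy_blf_get_type(uint16_t flags)
{
    return (flags & TOY_BLF_TYPE_MASK) >> TOY_BLF_TYPE_SHIFT;
}
```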
  17. 22 Apr, 2013 1 commit
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig committed
      Add support for larger btree blocks that contain a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      ee1a47ab
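How a stored block number and filesystem uuid help detect out of place writes can be sketched with a toy check. The struct layout and function name are assumptions for illustration, and the CRC32C check mentioned in the commit is deliberately left out; this is not the XFS verifier.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the self-describing fields in a block header. */
struct toy_block {
    uint64_t blkno;      /* block number the block believes it lives at */
    uint64_t owner;      /* owning inode or structure */
    uint8_t  uuid[16];   /* uuid of the filesystem it was written for */
};

/* A block is "in place" if it records the address it was read from and
 * belongs to this filesystem. A misdirected or stale write fails one of
 * the two comparisons. */
bool toy_block_in_place(const struct toy_block *b, uint64_t read_from,
                        const uint8_t fs_uuid[16])
{
    return b->blkno == read_from && memcmp(b->uuid, fs_uuid, 16) == 0;
}
```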
  18. 16 Nov, 2012 4 commits
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner committed
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions; instead we
      can simply export the ops structures for those that are needed
      outside the file they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Phil White <pwhite@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      1813dd64
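The shape of a verifier ops structure, as described in the message above, can be sketched as follows. All names (`toy_buf`, `toy_buf_ops`, `toy_btree_buf_ops`, the magic value and the error code) are hypothetical; the sketch only shows read and write verifiers being attached together through one pointer, with the verifier functions themselves kept private.

```c
#include <stdint.h>

#define TOY_MAGIC    0x544f5921u   /* hypothetical magic number */
#define TOY_ECORRUPT 1             /* hypothetical error code */

struct toy_buf;

/* Read and write verifiers travel together in one structure. */
struct toy_buf_ops {
    void (*verify_read)(struct toy_buf *bp);
    void (*verify_write)(struct toy_buf *bp);
};

struct toy_buf {
    const struct toy_buf_ops *ops;
    uint32_t magic;
    int error;
};

/* The verifier stays static; only the ops structure below is exported. */
static void toy_verify(struct toy_buf *bp)
{
    if (bp->magic != TOY_MAGIC)
        bp->error = TOY_ECORRUPT;
}

const struct toy_buf_ops toy_btree_buf_ops = {
    .verify_read = toy_verify,
    .verify_write = toy_verify,
};

/* Called when a read completes: run the read verifier if one is attached. */
void toy_buf_ioend_read(struct toy_buf *bp)
{
    if (bp->ops)
        bp->ops->verify_read(bp);
}

/* Called before a write is issued: run the write verifier if attached. */
void toy_buf_submit_write(struct toy_buf *bp)
{
    if (bp->ops)
        bp->ops->verify_write(bp);
}
```

Adding a further content-specific callback later would mean adding one more function pointer to `toy_buf_ops`, without changing any caller that only passes the structure around.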
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner committed
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify btree blocks as they are read from disk · 3d3e6f64
      Dave Chinner committed
      Add a btree block verify callback function and pass it into the
      buffer read functions. Because each different btree block type
      requires different verification, add a function to the ops structure
      that is called from the generic code.
      
      Also, propagate the verification callback functions through the
      readahead functions, and into the external bmap and bulkstat inode
      readahead code that uses the generic btree buffer read functions.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Phil White <pwhite@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      3d3e6f64
    • D
      xfs: make buffer read verification an IO completion function · c3f8fc73
      Dave Chinner committed
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell from the outside whether a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callback
      allows the buffer cache to use its internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Phil White <pwhite@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      c3f8fc73
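The reasoning above, that only the buffer cache knows whether a buffer came from disk or from cache, can be sketched with a toy read function that takes a verifier callback. The types, names and the numeric value of `TOY_EFSCORRUPTED` are hypothetical, and the "disk read" is simulated by a parameter; the sketch shows the callback running on a cache miss only and flagging bad contents with a corruption error rather than an IO error.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAGIC        0x544f5921u   /* hypothetical magic number */
#define TOY_EFSCORRUPTED 117           /* hypothetical numeric value */

struct toy_buf {
    bool cached;        /* contents already valid in memory */
    uint32_t magic;
    int error;
    int verify_calls;   /* counts verifier runs, for illustration only */
};

typedef void (*toy_verify_fn)(struct toy_buf *bp);

/* Example verifier: marks the buffer corrupted if the magic is wrong. */
void toy_verify_magic(struct toy_buf *bp)
{
    bp->verify_calls++;
    if (bp->magic != TOY_MAGIC)
        bp->error = TOY_EFSCORRUPTED;
}

/* Read a buffer. disk_magic stands in for what real IO would return.
 * The verifier runs only when the contents were actually fetched, which
 * the caller could not tell from outside. */
struct toy_buf *toy_buf_read(struct toy_buf *bp, uint32_t disk_magic,
                             toy_verify_fn verify)
{
    if (!bp->cached) {
        bp->magic = disk_magic;   /* simulated disk read */
        bp->cached = true;
        if (verify)
            verify(bp);
    }
    return bp;
}
```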
  19. 14 Nov, 2012 1 commit
  20. 15 May, 2012 1 commit
  21. 12 Oct, 2011 2 commits
  22. 26 Jul, 2011 1 commit
  23. 13 Jul, 2011 1 commit
  24. 08 Jul, 2011 1 commit
  25. 02 Dec, 2010 1 commit
  26. 19 Oct, 2010 2 commits
    • C
      xfs: remove xfs_buf wrappers · 1a1a3e97
      Christoph Hellwig committed
      Stop having two different names for many buffer functions and use
      the more descriptive xfs_buf_* names directly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      1a1a3e97
    • C
      xfs: remove the ->kill_root btree operation · c0e59e1a
      Christoph Hellwig committed
      The implementations of ->kill_root only differ by either simply
      zeroing out the now unused buffer in the btree cursor in the inode
      allocation btree or using xfs_btree_setbuf in the allocation btree.
      
      Initially both of them used xfs_btree_setbuf, but the use in the
      ialloc btree was removed early on because it interacted badly with
      xfs_trans_binval.
      
      In addition to zeroing out the buffer in the cursor xfs_btree_setbuf
      updates the bc_ra array in the btree cursor, and calls
      xfs_trans_brelse on the buffer previously occupying the slot.
      
      The bc_ra update should be done for the alloc btree update too,
      although the lack of it does not cause serious problems.  The
      xfs_trans_brelse call on the other hand is effectively a no-op in
      the end - it keeps decrementing the bli_recur refcount until it hits
      zero, and then just skips out because the buffer will always be
      dirty at this point.  So removing it for the allocation btree is
      just fine.
      
      So unify the code and move it to xfs_btree.c.  While we're at it
      also replace the call to xfs_btree_setbuf with a NULL bp argument in
      xfs_btree_del_cursor with a direct call to xfs_trans_brelse given
      that the cursor is being freed just after this and the state
      updates are superfluous.  After this xfs_btree_setbuf is only used
      with a non-NULL bp argument and can thus be simplified.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      c0e59e1a
  27. 27 Jul, 2010 1 commit