1. 19 2月, 2014 1 次提交
    • E
      xfs: skip verification on initial "guess" superblock read · daba5427
      Eric Sandeen 提交于
      When xfs_readsb() does the very first read of the superblock,
      it makes a guess at the length of the buffer, based on the
      sector size of the underlying storage.  This may or may
      not match the filesystem sector size in sb_sectsize, so
      we can't i.e. do a CRC check on it; it might be too short.
      
      In fact, mounting a filesystem with sb_sectsize larger
      than the device sector size will cause a mount failure
      if CRCs are enabled, because we are checksumming a length
      which exceeds the buffer passed to it.
      
      So always read twice; the first time we read with NULL
      buffer ops to skip verification; then set the proper
      read length, hook up the proper verifier, and give it
      another go.
      
      Once we are sure that we've got the right buffer length,
      we can also use bp->b_length in the xfs_sb_read_verify,
      rather than the less-trusted on-disk sectorsize for
      secondary superblocks.  Before this we ran the risk of
      passing junk to the crc32c routines, which didn't always
      handle extreme values.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      daba5427
  2. 18 11月, 2013 1 次提交
    • D
      xfs: increase inode cluster size for v5 filesystems · 8f80587b
      Dave Chinner 提交于
      v5 filesystems use 512 byte inodes as a minimum, so read inodes in
      clusters that are effectively half the size of a v4 filesystem with
      256 byte inodes. For v5 fielsystems, scale the inode cluster size
      with the size of the inode so that we keep a constant 32 inodes per
      cluster ratio for all inode IO.
      
      This only works if mkfs.xfs sets the inode alignment appropriately
      for larger inode clusters, so this functionality is made conditional
      on mkfs doing the right thing. xfs_repair needs to know about
      the inode alignment changes, too.
      
      Wall time:
      	create	bulkstat	find+stat	ls -R	unlink
      v4	237s	161s		173s		201s	299s
      v5	235s	163s		205s		 31s	356s
      patched	234s	160s		182s		 29s	317s
      
      System time:
      	create	bulkstat	find+stat	ls -R	unlink
      v4	2601s	2490s		1653s		1656s	2960s
      v5	2637s	2497s		1681s		  20s	3216s
      patched	2613s	2451s		1658s		  20s	3007s
      
      So, wall time same or down across the board, system time same or
      down across the board, and cache hit rates all improve except for
      the ls -R case which is a pure cold cache directory read workload
      on v5 filesystems...
      
      So, this patch removes most of the performance and CPU usage
      differential between v4 and v5 filesystems on traversal related
      workloads.
      
      Note: while this patch is currently for v5 filesystems only, there
      is no reason it can't be ported back to v4 filesystems.  This hasn't
      been done here because bringing the code back to v4 requires
      forwards and backwards kernel compatibility testing.  i.e. to
      deterine if older kernels(*) do the right thing with larger inode
      alignments but still only using 8k inode cluster sizes. None of this
      testing and validation on v4 filesystems has been done, so for the
      moment larger inode clusters is limited to v5 superblocks.
      
      (*) a current default config v4 filesystem should mount just fine on
      2.6.23 (when lazy-count support was introduced), and so if we change
      the alignment emitted by mkfs without a feature bit then we have to
      make sure it works properly on all kernels since 2.6.23. And if we
      allow it to be changed when the lazy-count bit is not set, then it's
      all kernels since v2 logs were introduced that need to be tested for
      compatibility...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      8f80587b
  3. 24 10月, 2013 4 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  4. 23 8月, 2013 1 次提交
  5. 21 8月, 2013 3 次提交
  6. 13 8月, 2013 5 次提交
  7. 23 7月, 2013 2 次提交
  8. 29 6月, 2013 1 次提交
  9. 20 6月, 2013 1 次提交
    • J
      xfs: Remove XFS_MOUNT_RETERR · 39a45d84
      Jie Liu 提交于
      XFS_MOUNT_RETERR is going to be set at xfs_parseargs() if
      mp->m_dalign is enabled, so any time we enter "if (mp->m_dalign)"
      branch in xfs_update_alignment(), XFS_MOUNT_RETERR is set and so
      we always be emitting a warning and returning an error.
      
      Hence, we can remove it and get rid of a couple of redundant
      check up against it at xfs_upate_alignment().
      
      Thanks Dave Chinner for the suggestions of simplify the code
      in xfs_parseargs().
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      39a45d84
  10. 18 6月, 2013 1 次提交
    • J
      xfs: Don't keep silent if sunit/swidth can not be changed via mount · 34d7f603
      Jie Liu 提交于
      As per the mount man page, sunit and swidth can be changed via
      mount options.  For XFS, on the face of it, those options seems
      works if the specified alignments is properly, e.g.
      # mount -o sunit=4096,swidth=8192 /dev/sdb1 /mnt
      # mount | grep sdb1
      /dev/sdb1 on /mnt type xfs (rw,sunit=4096,swidth=8192)
      
      However, neither sunit nor swidth is shown from the xfs_info output.
      # xfs_info /mnt
      meta-data=/dev/sdb1    isize=256    agcount=4, agsize=262144 blks
               =             sectsz=512   attr=2
      data     =             bsize=4096   blocks=1048576, imaxpct=25
               =             sunit=0      swidth=0 blks
      		       ^^^^^^^^^^^^^^^^^^^^^^^^^^
      naming   =version 2    bsize=4096   ascii-ci=0
      log      =internal     bsize=4096   blocks=2560, version=2
               =             sectsz=512   sunit=0 blks, lazy-count=1
      realtime =none         extsz=4096   blocks=0, rtextents=0
      
      The reason is that the alignment can only be changed if the relevant
      super block is already configured with alignments, otherwise, the
      given value is silently ignored.
      
      With this fix, the attempt to mount a storage without strip alignment
      setup on a super block will get an error with a warning in syslog to
      indicate the true cause, e.g.
      # mount -o sunit=4096,swidth=8192 /dev/sdb1 /mnt
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
      	dmesg | tail  or so
      .......
      XFS (sdb1): cannot change alignment: superblock does not support data
      alignment
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Cc: Mark Tinguely <tinguely@sgi.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      34d7f603
  11. 15 6月, 2013 1 次提交
    • D
      xfs: don't emit v5 superblock warnings on write · 47ad2fcb
      Dave Chinner 提交于
      We write the superblock every 30s or so which results in the
      verifier being called. Right now that results in this output
      every 30s:
      
      XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
      Use of these features in this kernel is at your own risk!
      
      And spamming the logs.
      
      We don't need to check for whether we support v5 superblocks or
      whether there are feature bits we don't support set as these are
      only relevant when we first mount the filesytem. i.e. on superblock
      read. Hence for the write verification we can just skip all the
      checks (and hence verbose output) altogether.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 34510185)
      47ad2fcb
  12. 31 5月, 2013 1 次提交
    • D
      xfs: don't emit v5 superblock warnings on write · 34510185
      Dave Chinner 提交于
      We write the superblock every 30s or so which results in the
      verifier being called. Right now that results in this output
      every 30s:
      
      XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
      Use of these features in this kernel is at your own risk!
      
      And spamming the logs.
      
      We don't need to check for whether we support v5 superblocks or
      whether there are feature bits we don't support set as these are
      only relevant when we first mount the filesytem. i.e. on superblock
      read. Hence for the write verification we can just skip all the
      checks (and hence verbose output) altogether.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      34510185
  13. 28 4月, 2013 2 次提交
    • D
      xfs: implement extended feature masks · e721f504
      Dave Chinner 提交于
      The version 5 superblock has extended feature masks for compatible,
      incompatible and read-only compatible feature sets. Implement the
      masking and mount-time checking for these feature masks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e721f504
    • D
      xfs: add CRC checks to the superblock · 04a1e6c5
      Dave Chinner 提交于
      With the addition of CRCs, there is such a wide and varied change to
      the on disk format that it makes sense to bump the superblock
      version number rather than try to use feature bits for all the new
      functionality.
      
      This commit introduces all the new superblock fields needed for all
      the new functionality: feature masks similar to ext4, separate
      project quota inodes, a LSN field for recovery and the CRC field.
      
      This commit does not bump the superblock version number, however.
      That will be done as a separate commit at the end of the series
      after all the new functionality is present so we switch it all on in
      one commit. This means that we can slowly introduce the changes
      without them being active and hence maintain bisectability of the
      tree.
      
      This patch is based on a patch originally written by myself back
      from SGI days, which was subsequently modified by Christoph Hellwig.
      There is relatively little of that patch remaining, but the history
      of the patch still should be acknowledged here.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04a1e6c5
  14. 02 2月, 2013 3 次提交
  15. 29 1月, 2013 1 次提交
  16. 17 1月, 2013 1 次提交
  17. 16 11月, 2012 6 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: add pre-write metadata buffer verifier callbacks · 612cfbfe
      Dave Chinner 提交于
      These verifiers are essentially the same code as the read verifiers,
      but do not require ioend processing. Hence factor the read verifier
      functions and add a new write verifier wrapper that is used as the
      callback.
      
      This is done as one large patch for all verifiers rather than one
      patch per verifier as the change is largely mechanical. This
      includes hooking up the write verifier via the read verifier
      function.
      
      Hooking up the write verifier for buffers obtained via
      xfs_trans_get_buf() will be done in a separate patch as that touches
      code in many different places rather than just the verifier
      functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      612cfbfe
    • D
      xfs: verify superblocks as they are read from disk · 98021821
      Dave Chinner 提交于
      Add a superblock verify callback function and pass it into the
      buffer read functions. Remove the now redundant verification code
      that is currently in use.
      
      Adding verification shows that secondary superblocks never have
      their "sb_inprogress" flag cleared by mkfs.xfs, so when validating
      the secondary superblocks during a grow operation we have to avoid
      checking this field. Even if we fix mkfs, we will still have to
      ignore this field for verification purposes unless a version of mkfs
      that does not have this bug was used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      98021821
    • D
      xfs: uncached buffer reads need to return an error · eab4e633
      Dave Chinner 提交于
      With verification being done as an IO completion callback, different
      errors can be returned from a read. Uncached reads only return a
      buffer or NULL on failure, which means the verification error cannot
      be returned to the caller.
      
      Split the error handling for these reads into two - a failure to get
      a buffer will still return NULL, but a read error will return a
      referenced buffer with b_error set rather than NULL. The caller is
      responsible for checking the error state of the buffer returned.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      eab4e633
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  18. 09 11月, 2012 1 次提交
  19. 18 10月, 2012 3 次提交
  20. 27 9月, 2012 1 次提交