1. 07 Mar, 2014 1 commit
    •
      xfs: inode log reservations are still too small · fe4c224a
      Committed by Dave Chinner
      Back in commit 23956703 ("xfs: inode log reservations are too
      small"), the reservation size was increased to take into account the
      difference in size between the in-memory BMBT block headers and the
      on-disk BMDR headers. This solved a transaction overrun when logging
      the inode size.
      
      Recently, however, we've seen a number of these same overruns on
      kernels with the above fix in them. All of them have been by 4 bytes,
      so we must still not be accounting for something correctly.
      
      Through inspection it turns out the above commit didn't take into
      account everything it should have. That is, it only accounts for a
      single log op_hdr structure, when it can actually require up to four
      op_hdrs - one for each region (log iovec) that is formatted. These
      regions are the inode log format header, the inode core, and the two
      forks that can be held in the literal area of the inode.
      
      This means we are not accounting for 36 bytes of log space that the
      transaction can use, and hence when we get inodes in certain formats
      with particular fragmentation patterns we can overrun the
      transaction. Fix this by adding the correct accounting for log
      op_headers in the transaction.
      Tested-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
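The op header accounting described in this commit message can be checked with a little arithmetic. The following is a minimal sketch, not the kernel's reservation code: the 12-byte op header size comes from the earlier commit 23956703 quoted further down this log, and the enum and function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Size of one log op header in bytes. The figure is taken from the
 * "56 + 12" breakdown in commit 23956703; treat it as an assumption. */
#define OP_HDR_BYTES 12u

/* The four regions (log iovecs) the message says an inode can be
 * formatted into. These names are hypothetical, not kernel identifiers. */
enum inode_log_region {
    REGION_LOG_FORMAT,  /* inode log format header */
    REGION_CORE,        /* inode core */
    REGION_DATA_FORK,   /* data fork held in the literal area */
    REGION_ATTR_FORK,   /* attr fork held in the literal area */
    REGION_COUNT
};

/* Log space consumed by op headers for a given number of regions. */
static unsigned int op_hdr_space(unsigned int nregions)
{
    return nregions * OP_HDR_BYTES;
}

/* Space left unreserved when only `accounted` op headers were counted
 * but every region can need its own header. */
static unsigned int unaccounted_op_hdr_space(unsigned int accounted)
{
    assert(accounted <= (unsigned int)REGION_COUNT);
    return op_hdr_space((unsigned int)REGION_COUNT) - op_hdr_space(accounted);
}
```

With one header accounted for, `unaccounted_op_hdr_space(1)` yields the 36 bytes the message refers to: three missing headers of 12 bytes each.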
  2. 13 Dec, 2013 2 commits
  3. 18 Nov, 2013 1 commit
    •
      xfs: increase inode cluster size for v5 filesystems · 8f80587b
      Committed by Dave Chinner
      v5 filesystems use 512 byte inodes as a minimum, so they read inodes in
      clusters that are effectively half the size of a v4 filesystem with
      256 byte inodes. For v5 filesystems, scale the inode cluster size
      with the size of the inode so that we keep a constant 32 inodes per
      cluster ratio for all inode IO.
      
      This only works if mkfs.xfs sets the inode alignment appropriately
      for larger inode clusters, so this functionality is made conditional
      on mkfs doing the right thing. xfs_repair needs to know about
      the inode alignment changes, too.
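The scaling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the 8k base cluster size and 256 byte base inode size come from the message, while the function names are hypothetical and the alignment is expressed in bytes purely for simplicity.

```c
#include <assert.h>

#define BASE_INODE_SIZE   256u   /* v4 inode size named in the message */
#define BASE_CLUSTER_SIZE 8192u  /* 8k cluster, i.e. 32 inodes of 256 bytes */

/* Number of inodes that fit in one cluster. */
static unsigned int inodes_per_cluster(unsigned int cluster_size,
                                       unsigned int inode_size)
{
    return cluster_size / inode_size;
}

/*
 * Scale the cluster size with the inode size so the 32 inodes per
 * cluster ratio stays constant, but only when the inode alignment set
 * by mkfs (hypothetically given here in bytes) is large enough to
 * cover the bigger cluster. Otherwise fall back to the 8k cluster.
 */
static unsigned int scaled_cluster_size(unsigned int inode_size,
                                        unsigned int inode_align_bytes)
{
    unsigned int scaled;

    assert(inode_size >= BASE_INODE_SIZE);
    assert(inode_size % BASE_INODE_SIZE == 0);

    scaled = BASE_CLUSTER_SIZE * (inode_size / BASE_INODE_SIZE);
    if (inode_align_bytes >= scaled)
        return scaled;
    return BASE_CLUSTER_SIZE;
}
```

For 512 byte inodes, an unscaled 8k cluster holds only 16 inodes, which is the "half the size" effect the message describes. With a sufficient alignment the cluster grows to 16k and holds 32 inodes again.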
      
      Wall time:
      	create	bulkstat	find+stat	ls -R	unlink
      v4	237s	161s		173s		201s	299s
      v5	235s	163s		205s		 31s	356s
      patched	234s	160s		182s		 29s	317s
      
      System time:
      	create	bulkstat	find+stat	ls -R	unlink
      v4	2601s	2490s		1653s		1656s	2960s
      v5	2637s	2497s		1681s		  20s	3216s
      patched	2613s	2451s		1658s		  20s	3007s
      
      So, wall time same or down across the board, system time same or
      down across the board, and cache hit rates all improve except for
      the ls -R case which is a pure cold cache directory read workload
      on v5 filesystems...
      
      So, this patch removes most of the performance and CPU usage
      differential between v4 and v5 filesystems on traversal related
      workloads.
      
      Note: while this patch is currently for v5 filesystems only, there
      is no reason it can't be ported back to v4 filesystems.  This hasn't
      been done here because bringing the code back to v4 requires
      forwards and backwards kernel compatibility testing, i.e. to
      determine if older kernels(*) do the right thing with larger inode
      alignments but still only using 8k inode cluster sizes. None of this
      testing and validation on v4 filesystems has been done, so for the
      moment larger inode clusters are limited to v5 superblocks.
      
      (*) a current default config v4 filesystem should mount just fine on
      2.6.23 (when lazy-count support was introduced), and so if we change
      the alignment emitted by mkfs without a feature bit then we have to
      make sure it works properly on all kernels since 2.6.23. And if we
      allow it to be changed when the lazy-count bit is not set, then it's
      all kernels since v2 logs were introduced that need to be tested for
      compatibility...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  4. 24 Oct, 2013 4 commits
    •
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Committed by Dave Chinner
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in its definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      This enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    •
      xfs: decouple log and transaction headers · 239880ef
      Committed by Dave Chinner
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    •
      xfs: unify directory/attribute format definitions · 57062787
      Committed by Dave Chinner
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    •
      xfs: create a shared header file for format-related information · 70a9883c
      Committed by Dave Chinner
      All of the buffer operations structures need to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They verify the on-disk
      format, so while xfs_format.h might seem a good place, they are not
      part of the on-disk format.
      
      Hence we need to create a new header file in which to centralise these
      related definitions. Start by moving the buffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  5. 31 Aug, 2013 1 commit
    •
      xfs: inode log reservations are too small · 23956703
      Committed by Dave Chinner
      We've been seeing occasional problems with log space leaks and
      transaction underruns such as this for some time:
      
       XFS (dm-0): xlog_write: reservation summary:
         trans type  = FSYNC_TS (36)
         unit res    = 2740 bytes
         current res = -4 bytes
         total reg   = 0 bytes (o/flow = 0 bytes)
         ophdrs      = 0 (ophdr space = 0 bytes)
         ophdr + reg = 0 bytes
         num regions = 0
      
      Turns out that xfstests generic/311 is reliably reproducing this
      problem with the test it runs at sequence 16 of its execution. It is
      a 100% reliable reproducer with the mkfs configuration of "-b
      size=1024 -m crc=1" on a 10GB scratch device.
      
      The problem? Inode forks in btree format are logged in memory
      format, not disk format (i.e. bmbt format, not bmdr format). That
      means there is a btree block header being logged, when such a
      structure is never written to the inode fork in bmdr format. The
      bmdr header in the inode is only 4 bytes, while the bmbt header is
      24 bytes for v4 filesystems and 72 bytes for v5 filesystems.
      
      We currently reserve the inode size plus the rounded up overhead of
      logging a buffer, which is 128 bytes. That means the reservation
      for a 512 byte inode is 640 bytes. What we can actually log is:
      
      	inode core, data and attr fork = 512 bytes
      	inode log format + log op header = 56 + 12 = 68 bytes
      	data fork bmbt hdr = 24/72 bytes
      	attr fork bmbt hdr = 24/72 bytes
      
      So, for v2 inodes we can log at least 628 bytes, but if we split that
      inode over the end of the log across log buffers, we also need
      another log op header, which takes us to 640 bytes. If there's
      another reservation taken out of this that I haven't taken into
      account (perhaps multiple iclog splits?) or I haven't correctly
      calculated the bmbt format space used (entirely possible), then
      we will overrun it.
      
      For v3 inodes the maximum is actually 724 bytes, and even a
      single maximally sized btree format fork can blow it (652 bytes).
      And that's exactly what is happening with the FSYNC_TS transaction
      in the above output - it's consumed 644 bytes of space after the CIL
      context took the space reserved for it (2100 bytes).
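The byte counts quoted above can be reproduced with a short calculation. This is a sketch for checking the arithmetic only. All sizes are the ones stated in the message, and the macro and function names are hypothetical rather than kernel identifiers.

```c
#include <assert.h>

#define INODE_SIZE        512u  /* inode core, data and attr fork */
#define LOG_BUF_OVERHEAD  128u  /* rounded up overhead of logging a buffer */
#define INODE_LOG_FORMAT   56u  /* inode log format structure */
#define LOG_OP_HDR         12u  /* one log op header */
#define BMBT_HDR_V4        24u  /* bmbt block header, v4 filesystems */
#define BMBT_HDR_V5        72u  /* bmbt block header, v5 filesystems */

/* Reservation before the fix: inode size plus buffer logging overhead. */
static unsigned int old_reservation(void)
{
    return INODE_SIZE + LOG_BUF_OVERHEAD;
}

/* Bytes that can actually be logged when `btree_forks` forks (0 to 2)
 * are in btree format and each carries an in-memory bmbt header. */
static unsigned int max_logged(unsigned int bmbt_hdr, unsigned int btree_forks)
{
    assert(btree_forks <= 2);
    return INODE_SIZE + INODE_LOG_FORMAT + LOG_OP_HDR + btree_forks * bmbt_hdr;
}

/* Reservation left after the CIL context takes its share and the
 * transaction consumes `consumed` bytes; negative means an overrun. */
static int remaining_reservation(int unit_res, int cil_res, int consumed)
{
    return unit_res - cil_res - consumed;
}
```

Under these figures, v2 inodes with two btree forks come to 628 bytes, and one extra op header for a log split reaches exactly the 640 byte reservation. v3 inodes reach 724 bytes with two btree forks and 652 with one, both above 640. The FSYNC_TS case works out as 2740 - 2100 - 644 = -4, matching the "current res" in the log output.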
      
      This problem has always been present in the XFS code - the btree
      format inode forks have always been logged in this manner. Hence
      there has always been the possibility of an overrun with such a
      transaction. The CRC code has just exposed it frequently enough to
      be able to debug and understand the root cause....
      
      So, let's fix all the inode log space reservations.
      
      [ I'm so glad we spent the effort to clean up the transaction
        reservation code. This is an easy fix now. ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  6. 13 Aug, 2013 5 commits