1. 24 10月, 2013 2 次提交
  2. 31 8月, 2013 1 次提交
    • D
      xfs: inode log reservations are too small · 23956703
      Dave Chinner 提交于
      We've been seeing occasional problems with log space leaks and
      transaction underruns such as this for some time:
      
       XFS (dm-0): xlog_write: reservation summary:
         trans type  = FSYNC_TS (36)
         unit res    = 2740 bytes
         current res = -4 bytes
         total reg   = 0 bytes (o/flow = 0 bytes)
         ophdrs      = 0 (ophdr space = 0 bytes)
         ophdr + reg = 0 bytes
         num regions = 0
      
      Turns out that xfstests generic/311 is reliably reproducing this
      problem with the test it runs at sequence 16 of it execution. It is
      a 100% reliable reproducer with the mkfs configuration of "-b
      size=1024 -m crc=1" on a 10GB scratch device.
      
      The problem? Inode forks in btree format are logged in memory
      format, not disk format (i.e. bmbt format, not bmdr format). That
      means there is a btree block header being logged, when such a
      structure is never written to the inode fork in bmdr format. The
      bmdr header in the inode is only 4 bytes, while the bmbt header is
      24 bytes for v4 filesystems and 72 bytes for v5 filesystems.
      
      We currently reserve the inode size plus the rounded up overhead of
      a logging a buffer, which is 128 bytes. That means the reservation
      for a 512 byte inode is 640 bytes. What we can actually log is:
      
      	inode core, data and attr fork = 512 bytes
      	inode log format + log op header = 56 + 12 = 68 bytes
      	data fork bmbt hdr = 24/72 bytes
      	attr fork bmbt hdr = 24/72 bytes
      
      So, for a v2 inodes we can log at least 628 bytes, but if we split that
      inode over the end of the log across log buffers, we need to also
      another log op header, which takes us to 640 bytes. If there's
      another reservation taken out of this that I haven't taken into
      account (perhaps multiple iclog splits?) or I haven't corectly
      calculated the bmbt format space used (entirely possible), then
      we will overun it.
      
      For v3 inodes the maximum is actually 724 bytes, and even a
      single maximally sized btree format fork can blow it (652 bytes).
      And that's exactly what is happening with the FSYNC_TS transaction
      in the above output - it's consumed 644 bytes of space after the CIL
      context took the space reserved for it (2100 bytes).
      
      This problem has always been present in the XFS code - the btree
      format inode forks have always been logged in this manner. Hence
      there has always been the possibility of an overrun with such a
      transaction. The CRC code has just exposed it frequently enough to
      be able to debug and understand the root cause....
      
      So, let's fix all the inode log space reservations.
      
      [ I'm so glad we spent the effort to clean up the transaction
        reservation code. This is an easy fix now. ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      23956703
  3. 13 8月, 2013 5 次提交