1. 31 5月, 2013 1 次提交
  2. 22 4月, 2013 4 次提交
    • D
      xfs: add CRC checks to the AGI · 983d09ff
      Dave Chinner 提交于
      Same set of changes made to the AGF need to be made to the AGI.
      This patch has a similar history to the AGF, hence a similar
      sign-off chain.
      Signed-off-by: NDave Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dgc@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      983d09ff
    • C
      xfs: add CRC checks to the AGFL · 77c95bba
      Christoph Hellwig 提交于
      Add CRC checks, location information and a magic number to the AGFL.
      Previously the AGFL was just a block containing nothing but the
      free block pointers.  The new AGFL has a real header with the usual
      boilerplate instead, so that we can verify it's not corrupted and
      written into the right place.
      
      [dchinner@redhat.com] Added LSN field, reworked significantly to fit
      into new verifier structure and growfs structure, enabled full
      verifier functionality now there is a header to verify and we can
      guarantee an initialised AGFL.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      77c95bba
    • D
      xfs: add CRC checks to the AGF · 4e0e6040
      Dave Chinner 提交于
      The AGF already has some self identifying fields (e.g. the sequence
      number) so we only need to add the uuid to it to identify the
      filesystem it belongs to. The location is fixed based on the
      sequence number, so there's no need to add a block number, either.
      
      Hence the only additional fields are the CRC and LSN fields. These
      are unlogged, so place some space between the end of the logged
      fields and them so that future expansion of the AGF for logged
      fields can be placed adjacent to the existing logged fields and
      hence not complicate the field-derived range based logging we
      currently have.
      
      Based originally on a patch from myself, modified further by
      Christoph Hellwig and then modified again to fit into the
      verifier structure with additional fields by myself. The multiple
      signed-off-by tags indicate the age and history of this patch.
      Signed-off-by: NDave Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4e0e6040
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  3. 02 2月, 2013 1 次提交
  4. 16 11月, 2012 5 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify superblocks as they are read from disk · 98021821
      Dave Chinner 提交于
      Add a superblock verify callback function and pass it into the
      buffer read functions. Remove the now redundant verification code
      that is currently in use.
      
      Adding verification shows that secondary superblocks never have
      their "sb_inprogress" flag cleared by mkfs.xfs, so when validating
      the secondary superblocks during a grow operation we have to avoid
      checking this field. Even if we fix mkfs, we will still have to
      ignore this field for verification purposes unless a version of mkfs
      that does not have this bug was used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      98021821
    • D
      xfs: uncached buffer reads need to return an error · eab4e633
      Dave Chinner 提交于
      With verification being done as an IO completion callback, different
      errors can be returned from a read. Uncached reads only return a
      buffer or NULL on failure, which means the verification error cannot
      be returned to the caller.
      
      Split the error handling for these reads into two - a failure to get
      a buffer will still return NULL, but a read error will return a
      referenced buffer with b_error set rather than NULL. The caller is
      responsible for checking the error state of the buffer returned.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      eab4e633
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  5. 14 11月, 2012 3 次提交
    • D
      xfs: make growfs initialise the AGFL header · de497688
      Dave Chinner 提交于
      For verification purposes, AGFLs need to be initialised to a known
      set of values. For upcoming CRC changes, they are also headers that
      need to be initialised. Currently, growfs does neither for the AGFLs
      - it ignores them completely. Add initialisation of the AGFL to be
      full of invalid block numbers (NULLAGBLOCK) to put the
      infrastructure in place needed for CRC support.
      
      Includes a comment clarification from Jeff Liu.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      de497688
    • D
      xfs: growfs: use uncached buffers for new headers · fd23683c
      Dave Chinner 提交于
      When writing the new AG headers to disk, we can't attach write
      verifiers because they have a dependency on the struct xfs-perag
      being attached to the buffer to be fully initialised and growfs
      can't fully initialise them until later in the process.
      
      The simplest way to avoid this problem is to use uncached buffers
      for writing the new headers. These buffers don't have the xfs-perag
      attached to them, so it's simple to detect in the write verifier and
      be able to skip the checks that need the xfs-perag.
      
      This enables us to attach the appropriate buffer ops to the buffer
      and hence calculate CRCs on the way to disk. IT also means that the
      buffer is torn down immediately, and so the first access to the AG
      headers will re-read the header from disk and perform full
      verification of the buffer. This way we also can catch corruptions
      due to problems that went undetected in growfs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fd23683c
    • D
      xfs: use btree block initialisation functions in growfs · b64f3a39
      Dave Chinner 提交于
      Factor xfs_btree_init_block() to be independent of the btree cursor,
      and use the function to initialise btree blocks in the growfs code.
      This makes adding support for different format btree blocks simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b64f3a39
  6. 09 11月, 2012 1 次提交
  7. 08 11月, 2012 1 次提交
  8. 03 11月, 2012 1 次提交
  9. 15 5月, 2012 8 次提交
  10. 12 10月, 2011 2 次提交
  11. 11 7月, 2011 1 次提交
  12. 08 7月, 2011 1 次提交
  13. 07 3月, 2011 1 次提交
  14. 23 2月, 2011 1 次提交
  15. 22 2月, 2011 1 次提交
  16. 12 1月, 2011 1 次提交
    • D
      xfs: ensure log covering transactions are synchronous · c58efdb4
      Dave Chinner 提交于
      To ensure the log is covered and the filesystem idles correctly, we
      need to ensure that dummy transactions hit the disk and do not stay
      pinned in memory.  If the superblock is pinned in memory, it can't
      be flushed so the log covering cannot make progress. The result is
      dependent on timing - more oftent han not we continue to issues a
      log covering transaction every 36s rather than idling after ~90s.
      
      Fix this by making the log covering transaction synchronous. To
      avoid additional log force from xfssyncd, make the log covering
      transaction take the place of the existing log force in the xfssyncd
      background sync process.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c58efdb4
  17. 04 1月, 2011 1 次提交
    • D
      xfs: dynamic speculative EOF preallocation · 055388a3
      Dave Chinner 提交于
      Currently the size of the speculative preallocation during delayed
      allocation is fixed by either the allocsize mount option of a
      default size. We are seeing a lot of cases where we need to
      recommend using the allocsize mount option to prevent fragmentation
      when buffered writes land in the same AG.
      
      Rather than using a fixed preallocation size by default (up to 64k),
      make it dynamic by basing it on the current inode size. That way the
      EOF preallocation will increase as the file size increases.  Hence
      for streaming writes we are much more likely to get large
      preallocations exactly when we need it to reduce fragementation.
      
      For default settings, the size of the initial extents is determined
      by the number of parallel writers and the amount of memory in the
      machine. For 4GB RAM and 4 concurrent 32GB file writes:
      
      EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
         1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
         2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
         3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
         4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
         5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
         6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
         7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
      
      and for 16 concurrent 16GB file writes:
      
       EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
         1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
         2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
         3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
         4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
         5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
         6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
         7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
      
      Because it is hard to take back specualtive preallocation, cases
      where there are large slow growing log files on a nearly full
      filesystem may cause premature ENOSPC. Hence as the filesystem nears
      full, the maximum dynamic prealloc size іs reduced according to this
      table (based on 4k block size):
      
      freespace       max prealloc size
        >5%             full extent (8GB)
        4-5%             2GB (8GB >> 2)
        3-4%             1GB (8GB >> 3)
        2-3%           512MB (8GB >> 4)
        1-2%           256MB (8GB >> 5)
        <1%            128MB (8GB >> 6)
      
      This should reduce the amount of space held in speculative
      preallocation for such cases.
      
      The allocsize mount option turns off the dynamic behaviour and fixes
      the prealloc size to whatever the mount option specifies. i.e. the
      behaviour is unchanged.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      055388a3
  18. 19 10月, 2010 2 次提交
  19. 24 8月, 2010 1 次提交
    • D
      xfs: dummy transactions should not dirty VFS state · 1a387d3b
      Dave Chinner 提交于
      When we  need to cover the log, we issue dummy transactions to ensure
      the current log tail is on disk. Unfortunately we currently use the
      root inode in the dummy transaction, and the act of committing the
      transaction dirties the inode at the VFS level.
      
      As a result, the VFS writeback of the dirty inode will prevent the
      filesystem from idling long enough for the log covering state
      machine to complete. The state machine gets stuck in a loop issuing
      new dummy transactions to cover the log and never makes progress.
      
      To avoid this problem, the dummy transactions should not cause
      externally visible state changes. To ensure this occurs, make sure
      that dummy transactions log an unchanging field in the superblock as
      it's state is never propagated outside the filesystem. This allows
      the log covering state machine to complete successfully and the
      filesystem now correctly enters a fully idle state about 90s after
      the last modification was made.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      1a387d3b
  20. 27 7月, 2010 3 次提交