1. 22 4月, 2013 2 次提交
    • D
      xfs: add CRC checks to the AGF · 4e0e6040
      Dave Chinner 提交于
      The AGF already has some self identifying fields (e.g. the sequence
      number) so we only need to add the uuid to it to identify the
      filesystem it belongs to. The location is fixed based on the
      sequence number, so there's no need to add a block number, either.
      
      Hence the only additional fields are the CRC and LSN fields. These
      are unlogged, so place some space between the end of the logged
      fields and them so that future expansion of the AGF for logged
      fields can be placed adjacent to the existing logged fields and
      hence not complicate the field-derived range based logging we
      currently have.
      
      Based originally on a patch from myself, modified further by
      Christoph Hellwig and then modified again to fit into the
      verifier structure with additional fields by myself. The multiple
      signed-off-by tags indicate the age and history of this patch.
      Signed-off-by: NDave Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4e0e6040
    • C
      xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig 提交于
      Add support for larger btree blocks that contains a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and a LSN
      field to so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ee1a47ab
  2. 02 2月, 2013 1 次提交
  3. 16 11月, 2012 5 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: verify superblocks as they are read from disk · 98021821
      Dave Chinner 提交于
      Add a superblock verify callback function and pass it into the
      buffer read functions. Remove the now redundant verification code
      that is currently in use.
      
      Adding verification shows that secondary superblocks never have
      their "sb_inprogress" flag cleared by mkfs.xfs, so when validating
      the secondary superblocks during a grow operation we have to avoid
      checking this field. Even if we fix mkfs, we will still have to
      ignore this field for verification purposes unless a version of mkfs
      that does not have this bug was used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      98021821
    • D
      xfs: uncached buffer reads need to return an error · eab4e633
      Dave Chinner 提交于
      With verification being done as an IO completion callback, different
      errors can be returned from a read. Uncached reads only return a
      buffer or NULL on failure, which means the verification error cannot
      be returned to the caller.
      
      Split the error handling for these reads into two - a failure to get
      a buffer will still return NULL, but a read error will return a
      referenced buffer with b_error set rather than NULL. The caller is
      responsible for checking the error state of the buffer returned.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      eab4e633
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  4. 14 11月, 2012 3 次提交
    • D
      xfs: make growfs initialise the AGFL header · de497688
      Dave Chinner 提交于
      For verification purposes, AGFLs need to be initialised to a known
      set of values. For upcoming CRC changes, they are also headers that
      need to be initialised. Currently, growfs does neither for the AGFLs
      - it ignores them completely. Add initialisation of the AGFL to be
      full of invalid block numbers (NULLAGBLOCK) to put the
      infrastructure in place needed for CRC support.
      
      Includes a comment clarification from Jeff Liu.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      de497688
    • D
      xfs: growfs: use uncached buffers for new headers · fd23683c
      Dave Chinner 提交于
      When writing the new AG headers to disk, we can't attach write
      verifiers because they have a dependency on the struct xfs-perag
      being attached to the buffer to be fully initialised and growfs
      can't fully initialise them until later in the process.
      
      The simplest way to avoid this problem is to use uncached buffers
      for writing the new headers. These buffers don't have the xfs-perag
      attached to them, so it's simple to detect in the write verifier and
      be able to skip the checks that need the xfs-perag.
      
      This enables us to attach the appropriate buffer ops to the buffer
      and hence calculate CRCs on the way to disk. IT also means that the
      buffer is torn down immediately, and so the first access to the AG
      headers will re-read the header from disk and perform full
      verification of the buffer. This way we also can catch corruptions
      due to problems that went undetected in growfs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fd23683c
    • D
      xfs: use btree block initialisation functions in growfs · b64f3a39
      Dave Chinner 提交于
      Factor xfs_btree_init_block() to be independent of the btree cursor,
      and use the function to initialise btree blocks in the growfs code.
      This makes adding support for different format btree blocks simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b64f3a39
  5. 09 11月, 2012 1 次提交
  6. 08 11月, 2012 1 次提交
  7. 03 11月, 2012 1 次提交
  8. 15 5月, 2012 8 次提交
  9. 12 10月, 2011 2 次提交
  10. 11 7月, 2011 1 次提交
  11. 08 7月, 2011 1 次提交
  12. 07 3月, 2011 1 次提交
  13. 23 2月, 2011 1 次提交
  14. 22 2月, 2011 1 次提交
  15. 12 1月, 2011 1 次提交
    • D
      xfs: ensure log covering transactions are synchronous · c58efdb4
      Dave Chinner 提交于
      To ensure the log is covered and the filesystem idles correctly, we
      need to ensure that dummy transactions hit the disk and do not stay
      pinned in memory.  If the superblock is pinned in memory, it can't
      be flushed so the log covering cannot make progress. The result is
      dependent on timing - more oftent han not we continue to issues a
      log covering transaction every 36s rather than idling after ~90s.
      
      Fix this by making the log covering transaction synchronous. To
      avoid additional log force from xfssyncd, make the log covering
      transaction take the place of the existing log force in the xfssyncd
      background sync process.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c58efdb4
  16. 04 1月, 2011 1 次提交
    • D
      xfs: dynamic speculative EOF preallocation · 055388a3
      Dave Chinner 提交于
      Currently the size of the speculative preallocation during delayed
      allocation is fixed by either the allocsize mount option of a
      default size. We are seeing a lot of cases where we need to
      recommend using the allocsize mount option to prevent fragmentation
      when buffered writes land in the same AG.
      
      Rather than using a fixed preallocation size by default (up to 64k),
      make it dynamic by basing it on the current inode size. That way the
      EOF preallocation will increase as the file size increases.  Hence
      for streaming writes we are much more likely to get large
      preallocations exactly when we need it to reduce fragementation.
      
      For default settings, the size of the initial extents is determined
      by the number of parallel writers and the amount of memory in the
      machine. For 4GB RAM and 4 concurrent 32GB file writes:
      
      EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
         1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
         2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
         3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
         4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
         5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
         6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
         7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
      
      and for 16 concurrent 16GB file writes:
      
       EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
         1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
         2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
         3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
         4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
         5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
         6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
         7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
      
      Because it is hard to take back specualtive preallocation, cases
      where there are large slow growing log files on a nearly full
      filesystem may cause premature ENOSPC. Hence as the filesystem nears
      full, the maximum dynamic prealloc size іs reduced according to this
      table (based on 4k block size):
      
      freespace       max prealloc size
        >5%             full extent (8GB)
        4-5%             2GB (8GB >> 2)
        3-4%             1GB (8GB >> 3)
        2-3%           512MB (8GB >> 4)
        1-2%           256MB (8GB >> 5)
        <1%            128MB (8GB >> 6)
      
      This should reduce the amount of space held in speculative
      preallocation for such cases.
      
      The allocsize mount option turns off the dynamic behaviour and fixes
      the prealloc size to whatever the mount option specifies. i.e. the
      behaviour is unchanged.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      055388a3
  17. 19 10月, 2010 2 次提交
  18. 24 8月, 2010 1 次提交
    • D
      xfs: dummy transactions should not dirty VFS state · 1a387d3b
      Dave Chinner 提交于
      When we  need to cover the log, we issue dummy transactions to ensure
      the current log tail is on disk. Unfortunately we currently use the
      root inode in the dummy transaction, and the act of committing the
      transaction dirties the inode at the VFS level.
      
      As a result, the VFS writeback of the dirty inode will prevent the
      filesystem from idling long enough for the log covering state
      machine to complete. The state machine gets stuck in a loop issuing
      new dummy transactions to cover the log and never makes progress.
      
      To avoid this problem, the dummy transactions should not cause
      externally visible state changes. To ensure this occurs, make sure
      that dummy transactions log an unchanging field in the superblock as
      it's state is never propagated outside the filesystem. This allows
      the log covering state machine to complete successfully and the
      filesystem now correctly enters a fully idle state about 90s after
      the last modification was made.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      1a387d3b
  19. 27 7月, 2010 3 次提交
  20. 16 1月, 2010 1 次提交
    • D
      xfs: Replace per-ag array with a radix tree · 1c1c6ebc
      Dave Chinner 提交于
      The use of an array for the per-ag structures requires reallocation
      of the array when growing the filesystem. This requires locking
      access to the array to avoid use after free situations, and the
      locking is difficult to get right. To avoid needing to reallocate an
      array, change the per-ag structures to an allocated object per ag
      and index them using a tree structure.
      
      The AGs are always densely indexed (hence the use of an array), but
      the number supported is 2^32 and lookups tend to be random and hence
      indexing needs to scale. A simple choice is a radix tree - it works
      well with this sort of index.  This change also removes another
      large contiguous allocation from the mount/growfs path in XFS.
      
      The growing process now needs to change to only initialise the new
      AGs required for the extra space, and as such only needs to
      exclusively lock the tree for inserts. The rest of the code only
      needs to lock the tree while doing lookups, and hence this will
      remove all the deadlocks that currently occur on the m_perag_lock as
      it is now an innermost lock. The lock is also changed to a spinlock
      from a read/write lock as the hold time is now extremely short.
      
      To complete the picture, the per-ag structures will need to be
      reference counted to ensure that we don't free/modify them while
      they are still in use.  This will be done in subsequent patch.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1c1c6ebc
  21. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  22. 12 12月, 2009 1 次提交