1. 28 1月, 2011 1 次提交
  2. 04 1月, 2011 1 次提交
    • D
      xfs: dynamic speculative EOF preallocation · 055388a3
      Dave Chinner 提交于
      Currently the size of the speculative preallocation during delayed
      allocation is fixed by either the allocsize mount option of a
      default size. We are seeing a lot of cases where we need to
      recommend using the allocsize mount option to prevent fragmentation
      when buffered writes land in the same AG.
      
      Rather than using a fixed preallocation size by default (up to 64k),
      make it dynamic by basing it on the current inode size. That way the
      EOF preallocation will increase as the file size increases.  Hence
      for streaming writes we are much more likely to get large
      preallocations exactly when we need it to reduce fragementation.
      
      For default settings, the size of the initial extents is determined
      by the number of parallel writers and the amount of memory in the
      machine. For 4GB RAM and 4 concurrent 32GB file writes:
      
      EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
         1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
         2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
         3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
         4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
         5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
         6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
         7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
      
      and for 16 concurrent 16GB file writes:
      
       EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
         0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
         1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
         2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
         3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
         4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
         5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
         6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
         7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
      
      Because it is hard to take back specualtive preallocation, cases
      where there are large slow growing log files on a nearly full
      filesystem may cause premature ENOSPC. Hence as the filesystem nears
      full, the maximum dynamic prealloc size іs reduced according to this
      table (based on 4k block size):
      
      freespace       max prealloc size
        >5%             full extent (8GB)
        4-5%             2GB (8GB >> 2)
        3-4%             1GB (8GB >> 3)
        2-3%           512MB (8GB >> 4)
        1-2%           256MB (8GB >> 5)
        <1%            128MB (8GB >> 6)
      
      This should reduce the amount of space held in speculative
      preallocation for such cases.
      
      The allocsize mount option turns off the dynamic behaviour and fixes
      the prealloc size to whatever the mount option specifies. i.e. the
      behaviour is unchanged.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      055388a3
  3. 17 12月, 2010 2 次提交
  4. 27 7月, 2010 6 次提交
  5. 19 5月, 2010 7 次提交
  6. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  7. 12 12月, 2009 1 次提交
  8. 10 6月, 2009 1 次提交
  9. 08 6月, 2009 1 次提交
    • C
      xfs: kill xfs_qmops · 7d095257
      Christoph Hellwig 提交于
      Kill the quota ops function vector and replace it with direct calls or
      stubs in the CONFIG_XFS_QUOTA=n case.
      
      Make sure we check XFS_IS_QUOTA_RUNNING in the right spots.  We can remove
      the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set
      otherwise.
      
      This brings us back closer to the way this code worked in IRIX and earlier
      Linux versions, but we keep a lot of the more useful factoring of common
      code.
      
      Eventually we should also kill xfs_qm_bhv.c, but that's left for a later
      patch.
      
      Reduces the size of the source code by about 250 lines and the size of
      XFS module by about 1.5 kilobytes with quotas enabled:
      
         text	   data	    bss	    dec	    hex	filename
       615957	   2960	   3848	 622765	  980ad	fs/xfs/xfs.o
       617231	   3152	   3848	 624231	  98667	fs/xfs/xfs.o.old
      
      Fallout:
      
       - xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects
         the inode locked and xfs_qm_dqattach which does the locking around it,
         thus removing XFS_QMOPT_ILOCKED.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      7d095257
  10. 07 4月, 2009 3 次提交
    • D
      xfs: remove xfs_flush_space · 8de2bf93
      Dave Chinner 提交于
      The only thing we need to do now when we get an ENOSPC condition during delayed
      allocation reservation is flush all the other inodes with delalloc blocks on
      them and retry without EOF preallocation. Remove the unneeded mess that is
      xfs_flush_space() and just call xfs_flush_inodes() directly from
      xfs_iomap_write_delay().
      
      Also, change the location of the retry label to avoid trying to do EOF
      preallocation because we don't want to do that at ENOSPC. This enables us to
      remove the BMAPI_SYNC flag as it is no longer used.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      8de2bf93
    • D
      xfs: make inode flush at ENOSPC synchronous · 5825294e
      Dave Chinner 提交于
      When we are writing to a single file and hit ENOSPC, we trigger a background
      flush of the inode and try again.  Because we hold page locks and the iolock,
      the flush won't proceed until after we release these locks. This occurs once
      we've given up and ENOSPC has been reported. Hence if this one is the only
      dirty inode in the system, we'll get an ENOSPC prematurely.
      
      To fix this, remove the async flush from the allocation routines and move
      it to the top of the write path where we can do a synchronous flush
      and retry the write again. Only retry once as a second ENOSPC indicates
      that we really are ENOSPC.
      
      This avoids a page cache deadlock when trying to do this flush synchronously
      in the allocation layer that was identified by Mikulas Patocka.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5825294e
    • D
      xfs: use xfs_sync_inodes() for device flushing · a8d770d9
      Dave Chinner 提交于
      Currently xfs_device_flush calls sync_blockdev() which is
      a no-op for XFS as all it's metadata is held in a different
      address to the one sync_blockdev() works on.
      
      Call xfs_sync_inodes() instead to flush all the delayed
      allocation blocks out. To do this as efficiently as possible,
      do it via two passes - one to do an async flush of all the
      dirty blocks and a second to wait for all the IO to complete.
      This requires some modification to the xfs-sync_inodes_ag()
      flush code to do efficiently.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a8d770d9
  11. 19 1月, 2009 1 次提交
  12. 16 1月, 2009 1 次提交
  13. 22 12月, 2008 1 次提交
    • L
      [XFS] Fix speculative allocation beyond eof · 9f6c92b9
      Lachlan McIlroy 提交于
      Speculative allocation beyond eof doesn't work properly.  It was
      broken some time ago after a code cleanup that moved what is now
      xfs_iomap_eof_align_last_fsb() and xfs_iomap_eof_want_preallocate()
      out of xfs_iomap_write_delay() into separate functions.  The code
      used to use the current file size in various checks but got changed
      to be max(file_size, i_new_size).  Since i_new_size is the result
      of 'offset + count' then in xfs_iomap_eof_want_preallocate() the
      check for '(offset + count) <= isize' will always be true.
      
      ie if 'offset + count' is > ip->i_size then isize will be i_new_size
      and equal to 'offset + count'.
      
      This change fixes all the places that used to use the current file
      size.
      Reviewed-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      9f6c92b9
  14. 28 7月, 2008 1 次提交
    • L
      [XFS] use minleft when allocating in xfs_bmbt_split() · 4ddd8bb1
      Lachlan McIlroy 提交于
      The bmap btree split code relies on a previous data extent allocation
      (from xfs_bmap_btalloc()) to find an AG that has sufficient space to
      perform a full btree split, when inserting the extent. When converting
      unwritten extents we don't allocate a data extent so a btree split will be
      the first allocation. In this case we need to set minleft so the allocator
      will pick an AG that has space to complete the split(s).
      
      SGI-PV: 983338
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31357a
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      4ddd8bb1
  15. 29 4月, 2008 2 次提交
  16. 18 4月, 2008 1 次提交
  17. 07 2月, 2008 6 次提交
    • E
      [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config · 71ddabb9
      Eric Sandeen 提交于
      Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if
      CONFIG_XFS_RT is off. This should be safe because mount checks in
      xfs_rtmount_init:
      
      so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be
      encountered after that.
      
      Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space,
      presumeably gcc can optimize around the various "if (0)" type checks:
      
      xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8
      xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8
      <-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4
      xfs_qm_vop_chown_reserve -4
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30014a
      Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      71ddabb9
    • D
      [XFS] Fix transaction overrun during writeback. · e4143a1c
      David Chinner 提交于
      Prevent transaction overrun in xfs_iomap_write_allocate() if we race with
      a truncate that overlaps the delalloc range we were planning to allocate.
      
      If we race, we may allocate into a hole and that requires block
      allocation. At this point in time we don't have a reservation for block
      allocation (apart from metadata blocks) and so allocating into a hole
      rather than a delalloc region results in overflowing the transaction block
      reservation.
      
      Fix it by only allowing a single extent to be allocated at a time.
      
      SGI-PV: 972757
      SGI-Modid: xfs-linux-melb:xfs-kern:30005a
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      e4143a1c
    • C
      [XFS] kill xfs_iocore_t · 613d7043
      Christoph Hellwig 提交于
      xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it
      just duplicates fields already in xfs_inode, and there is nothing this
      abstraction buys us on XFS/Linux. This patch removes it and shrinks source
      and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44
      bytes in debug/non-debug builds.
      
      SGI-PV: 970852
      SGI-Modid: xfs-linux-melb:xfs-kern:29754a
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      613d7043
    • L
      [XFS] kill unnessecary ioops indirection · 541d7d3c
      Lachlan McIlroy 提交于
      Currently there is an indirection called ioops in the XFS data I/O path.
      Various functions are called by functions pointers, but there is no
      coherence in what this is for, and of course for XFS itself it's entirely
      unused. This patch removes it instead and significantly reduces source and
      binary size of XFS while making maintaince easier.
      
      SGI-PV: 970841
      SGI-Modid: xfs-linux-melb:xfs-kern:29737a
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      541d7d3c
    • C
      [XFS] kill BMAPI_UNWRITTEN · 7642861b
      Christoph Hellwig 提交于
      There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because
      it has nothing in common with the other cases. Instead check for the
      shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to
      xfs_iomap_write_unwritten (which should be renamed to something more
      sensible one day)
      
      SGI-PV: 970241
      SGI-Modid: xfs-linux-melb:xfs-kern:29681a
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      7642861b
    • C
      [XFS] kill BMAPI_DEVICE · 6214ed44
      Christoph Hellwig 提交于
      There is no reason to go into the iomap machinery just to get the right
      block device for an inode. Instead look at the realtime flag in the inode
      and grab the right device from the mount structure.
      
      I created a new helper, xfs_find_bdev_for_inode instead of opencoding it
      because I plan to use it in other places in the future.
      
      SGI-PV: 970240
      SGI-Modid: xfs-linux-melb:xfs-kern:29680a
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NDonald Douwsma <donaldd@sgi.com>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      6214ed44
  18. 16 10月, 2007 1 次提交
  19. 14 7月, 2007 2 次提交
    • D
      [XFS] Cleanup inode extent size hint extraction · 957d0ebe
      David Chinner 提交于
      SGI-PV: 966004
      SGI-Modid: xfs-linux-melb:xfs-kern:28866a
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      957d0ebe
    • D
      [XFS] Prevent ENOSPC from aborting transactions that need to succeed · 84e1e99f
      David Chinner 提交于
      During delayed allocation extent conversion or unwritten extent
      conversion, we need to reserve some blocks for transactions reservations.
      We need to reserve these blocks in case a btree split occurs and we need
      to allocate some blocks.
      
      Unfortunately, we've only ever reserved the number of data blocks we are
      allocating, so in both the unwritten and delalloc case we can get ENOSPC
      to the transaction reservation. This is bad because in both cases we
      cannot report the failure to the writing application.
      
      The fix is two-fold:
      
      1 - leverage the reserved block infrastructure XFS already
      has to reserve a small pool of blocks by default to allow
      specially marked transactions to dip into when we are at
      ENOSPC.
      Default setting is min(5%, 1024 blocks).
      
      2 - convert critical transaction reservations to be allowed
      to dip into this pool. Spots changed are delalloc
      conversion, unwritten extent conversion and growing a
      filesystem at ENOSPC.
      This also allows growing the filesytsem to succeed at ENOSPC.
      
      SGI-PV: 964468
      SGI-Modid: xfs-linux-melb:xfs-kern:28865a
      Signed-off-by: NDavid Chinner <dgc@sgi.com>
      Signed-off-by: NTim Shimmin <tes@sgi.com>
      84e1e99f