1. 27 7月, 2010 1 次提交
    • C
      direct-io: move aio_complete into ->end_io · 552ef802
      Christoph Hellwig 提交于
      Filesystems with unwritten extent support must not complete an AIO request
      until the transaction to convert the extent has been commited.  That means
      the aio_complete calls needs to be moved into the ->end_io callback so
      that the filesystem can control when to call it exactly.
      
      This makes a bit of a mess out of dio_complete and the ->end_io callback
      prototype even more complicated. 
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz> 
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      552ef802
  2. 09 6月, 2010 1 次提交
  3. 03 6月, 2010 1 次提交
    • C
      xfs: skip writeback from reclaim context · 070ecdca
      Christoph Hellwig 提交于
      Allowing writeback from reclaim context causes massive problems with stack
      overflows as we can call into the writeback code which tends to be a heavy
      stack user both in the generic code and XFS from random contexts that
      perform memory allocations.
      
      Follow the example of btrfs (and in slightly different form ext4) and refuse
      to write out data from reclaim context.  This issue should really be handled
      by the VM so that we can tune better for this case, but until we get it
      sorted out there we have to hack around this in each filesystem with a
      complex writeback path.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      070ecdca
  4. 19 5月, 2010 9 次提交
  5. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  6. 17 3月, 2010 1 次提交
  7. 06 3月, 2010 2 次提交
  8. 02 3月, 2010 2 次提交
    • D
      xfs: Non-blocking inode locking in IO completion · 77d7a0c2
      Dave Chinner 提交于
      The introduction of barriers to loop devices has created a new IO
      order completion dependency that XFS does not handle. The loop
      device implements barriers using fsync and so turns a log IO in the
      XFS filesystem on the loop device into a data IO in the backing
      filesystem. That is, the completion of log IOs in the loop
      filesystem are now dependent on completion of data IO in the backing
      filesystem.
      
      This can cause deadlocks when a flush daemon issues a log force with
      an inode locked because the IO completion of IO on the inode is
      blocked by the inode lock. This in turn prevents further data IO
      completion from occuring on all XFS filesystems on that CPU (due to
      the shared nature of the completion queues). This then prevents the
      log IO from completing because the log is waiting for data IO
      completion as well.
      
      The fix for this new completion order dependency issue is to make
      the IO completion inode locking non-blocking. If the inode lock
      can't be grabbed, simply requeue the IO completion back to the work
      queue so that it can be processed later. This prevents the
      completion queue from being blocked and allows data IO completion on
      other inodes to proceed, hence avoiding completion order dependent
      deadlocks.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      77d7a0c2
    • C
      xfs: implement optimized fdatasync · 66d834ea
      Christoph Hellwig 提交于
      Allow us to track the difference between timestamp and size updates
      by using mark_inode_dirty from the I/O completion code, and checking
      the VFS inode flags in xfs_file_fsync.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      66d834ea
  9. 17 12月, 2009 1 次提交
    • C
      cleanup blockdev_direct_IO locking · 1e431f5c
      Christoph Hellwig 提交于
      Currently the locking in blockdev_direct_IO is a mess, we have three different
      locking types and very confusing checks for some of them.  The most
      complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be
      used.
      
      This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case
      is unused anyway, and the write side is almost identical to DIO_NO_LOCKING.
      The difference is that DIO_NO_LOCKING always sets the create argument for
      the get_blocks callback to zero, but we can easily move that to the actual
      get_blocks callbacks.  There are four users of the DIO_NO_LOCKING mode:
      gfs already ignores the create argument and thus is fine with the new
      version, ocfs2 only errors out if create were ever set, and we can remove
      this dead code now, the block device code only ever uses create for an
      error message if we are fully beyond the device which can never happen,
      and last but not least XFS will need the new behavour for writes.
      
      Now we can replace the lock_type variable with a flags one, where no flag
      means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
      flag.  Separate out the check for not allowing to fill holes into a separate
      flag, although for now both flags always get set at the same time.
      
      Also revamp the documentation of the locking scheme to actually make sense.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1e431f5c
  10. 16 12月, 2009 1 次提交
    • C
      direct-io: cleanup blockdev_direct_IO locking · 5fe878ae
      Christoph Hellwig 提交于
      Currently the locking in blockdev_direct_IO is a mess, we have three
      different locking types and very confusing checks for some of them.  The
      most complicated one is DIO_OWN_LOCKING for reads, which happens to not
      actually be used.
      
      This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
      case is unused anyway, and the write side is almost identical to
      DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
      create argument for the get_blocks callback to zero, but we can easily
      move that to the actual get_blocks callbacks.  There are four users of the
      DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
      fine with the new version, ocfs2 only errors out if create were ever set,
      and we can remove this dead code now, the block device code only ever uses
      create for an error message if we are fully beyond the device which can
      never happen, and last but not least XFS will need the new behavour for
      writes.
      
      Now we can replace the lock_type variable with a flags one, where no flag
      means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
      flag.  Separate out the check for not allowing to fill holes into a
      separate flag, although for now both flags always get set at the same
      time.
      
      Also revamp the documentation of the locking scheme to actually make
      sense.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fe878ae
  11. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f
  12. 12 12月, 2009 3 次提交
  13. 03 12月, 2009 1 次提交
    • W
      writeback: remove unused nonblocking and congestion checks · 0d99519e
      Wu Fengguang 提交于
      - no one is calling wb_writeback and write_cache_pages with
        wbc.nonblocking=1 any more
      - lumpy pageout will want to do nonblocking writeback without the
        congestion wait
      
      So remove the congestion checks as suggested by Chris.
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Alex Elder <aelder@sgi.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0d99519e
  14. 09 10月, 2009 2 次提交
    • D
      xfs: mark inodes dirty before issuing I/O · 932640e8
      Dave Chinner 提交于
      To make sure they get properly waited on in sync when I/O is in flight and
      we latter need to update the inode size.  Requires a new helper to check if an
      ioend structure is beyond the current EOF.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      932640e8
    • C
      xfs: implement ->dirty_inode to fix timestamp handling · f9581b14
      Christoph Hellwig 提交于
      This is picking up on Felix's repost of Dave's patch to implement a
      .dirty_inode method.  We really need this notification because
      the VFS keeps writing directly into the inode structure instead
      of going through methods to update this state.  In addition to
      the long-known atime issue we now also have a caller in VM code
      that updates c/mtime that way for shared writeable mmaps.  And
      I found another one that no one has noticed in practice in the FIFO
      code.
      
      So implement ->dirty_inode to set i_update_core whenever the
      inode gets externally dirtied, and switch the c/mtime handling to
      the same scheme we already use for atime (always picking up
      the value from the Linux inode).
      
      Note that this patch also removes the xfs_synchronize_atime call
      in xfs_reclaim it was superflous as we already synchronize the time
      when writing the inode via the log (xfs_inode_item_format) or the
      normal buffers (xfs_iflush_int).
      
      In addition also remove the I_CLEAR check before copying the Linux
      timestamps - now that we always have the Linux inode available
      we can always use the timestamps in it.
      
      Also switch to just using file_update_time for regular reads/writes -
      that will get us all optimization done to it for free and make
      sure we notice early when it breaks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      f9581b14
  15. 16 9月, 2009 1 次提交
    • A
      HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen 提交于
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      aa261f54
  16. 02 9月, 2009 1 次提交
    • C
      xfs: merge fsync and O_SYNC handling · 13e6d5cd
      Christoph Hellwig 提交于
      The guarantees for O_SYNC are exactly the same as the ones we need to
      make for an fsync call (and given that Linux O_SYNC is O_DSYNC the
      equivalent is fdadatasync, but we treat both the same in XFS), except
      with a range data writeout.  Jan Kara has started unifying these two
      path for filesystems using the generic helpers, and I've started to
      look at XFS.
      
      The actual transaction commited by xfs_fsync and xfs_write_sync_logforce
      has a different transaction number, but actually is exactly the same.
      We'll only use the fsync transaction going forward.  One major difference
      is that xfs_write_sync_logforce never issues a cache flush unless we
      commit a transaction causing that as a side-effect, which is an obvious
      bug in the O_SYNC handling.  Second all the locking and i_update_size
      vs i_update_core changes from 978b7237
      never made it to xfs_write_sync_logforce, so we add them back.
      
      To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait
      call is moved up to xfs_file_fsync, so that we don't wait on the whole
      file after we already waited for our portion in xfs_write.
      
      We'll also use a plain call to filemap_write_and_wait_range instead
      of the previous sync_page_rang which did it in two steps including
      an half-hearted inode write out that doesn't help us.
      
      Once we're done with this also remove the now useless i_update_size
      tracking.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      Signed-off-by: NFelix Blyakher <felixb@sgi.com>
      13e6d5cd
  17. 31 7月, 2009 2 次提交
  18. 07 4月, 2009 1 次提交
  19. 29 3月, 2009 1 次提交
    • H
      xfs: pagecache usage optimization · bddaafa1
      Hisashi Hifumi 提交于
      Hi.
      
      I introduced "is_partially_uptodate" aops for XFS.
      
      A page can have multiple buffers and even if a page is not uptodate,
      some buffers can be uptodate on pagesize != blocksize environment.
      
      This aops checks that all buffers which correspond to a part of a file
      that we want to read are uptodate. If so, we do not have to issue actual
      read IO to HDD even if a page is not uptodate because the portion we
      want to read are uptodate.
      
      "block_is_partially_uptodate" function is already used by ext2/3/4.
      With the following patch random read/write mixed workloads or random read
      after random write workloads can be optimized and we can get performance
      improvement.
      
      I did a performance test using the sysbench.
      
      #sysbench --num-threads=4 --max-requests=100000 --test=fileio --file-num=1 \
      --file-block-size=8K --file-total-size=1G --file-test-mode=rndrw \
      --file-fsync-freq=0 --file-rw-ratio=0.5 run
      
      -2.6.29-rc6
      Test execution summary:
          total time:                          123.8645s
          total number of events:              100000
          total time taken by event execution: 442.4994
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0044s
               max:                            0.3387s
               approx.  95 percentile:         0.0118s
      
      -2.6.29-rc6-patched
      Test execution summary:
          total time:                          108.0757s
          total number of events:              100000
          total time taken by event execution: 417.7505
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0042s
               max:                            0.3217s
               approx.  95 percentile:         0.0118s
      
      arch: ia64
      pagesize: 16k
      blocksize: 4k
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NFelix Blyakher <felixb@sgi.com>
      bddaafa1
  20. 04 12月, 2008 3 次提交
  21. 30 10月, 2008 1 次提交
  22. 17 9月, 2008 1 次提交
    • L
      [XFS] Prevent direct I/O from mapping extents beyond eof · 364f358a
      Lachlan McIlroy 提交于
      With the help from some tracing I found that we try to map extents beyond
      eof when doing a direct I/O read. It appears that the way to inform the
      generic direct I/O path (ie do_direct_IO()) that we have breached eof is
      to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
      do_direct_IO() to jump to the hole handling code where is will check for
      eof and then abort.
      
      This problem was found because a direct I/O read was trying to map beyond
      eof and was encountering delayed allocations. The delayed allocations
      beyond eof are speculative allocations and they didn't get converted when
      the direct I/O flushed the file because there was only enough space in the
      current AG to convert and write out the dirty pages within eof. Note that
      xfs_iomap_write_allocate() wont necessarily convert all the delayed
      allocation passed to it - it will return after allocating the first extent
      - so if the delayed allocation extends beyond eof then it will stay that
      way.
      
      SGI-PV: 983683
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31929a
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@infradead.org>
      364f358a
  23. 13 8月, 2008 1 次提交
  24. 05 8月, 2008 1 次提交