1. 01 Dec, 2010 (1 commit)
    • xfs: fix failed write truncation handling. · c726de44
      Dave Chinner authored
      Since the move to the new truncate sequence, we call xfs_setattr to
      truncate down excessively instantiated blocks.  As shown by the testcase
      in kernel.org BZ #22452, that doesn't work too well.  Due to confusion
      between the internal inode size and the VFS inode i_size, it zeroes data
      that it shouldn't.
      
      But a full-blown truncate seems like overkill here.  We only instantiate
      delayed allocations in the write path, and given that we never released
      the iolock we can't have converted them to real allocations yet either.
      
      The only nasty case is pre-existing preallocation which we need to skip.
      We already do this for page discard during writeback, so make the delayed
      allocation block punching a generic function and call it from the failed
      write path as well as xfs_aops_discard_page. The callers are
      responsible for ensuring that partial blocks are not truncated away,
      and that they hold the ilock.
      
      Based on a fix originally from Christoph Hellwig. This version used
      filesystem blocks as the range unit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
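
      For orientation, here is a minimal kernel-style sketch of the shape such a
      failed-write cleanup can take.  It is an illustration only: the helper
      name xfs_punch_delalloc_range() and the exact rounding are assumptions,
      not the literal hunk from this commit.  The failed write's byte range is
      rounded to whole filesystem blocks beyond the old EOF and only delayed
      allocations in that range are punched out, with the caller holding the
      ilock:

          /* Illustrative sketch; assumes the usual xfs.h/xfs_inode.h headers. */
          static void
          xfs_vm_write_failed_sketch(struct inode *inode, loff_t pos, unsigned len)
          {
                  struct xfs_inode        *ip = XFS_I(inode);
                  xfs_fileoff_t           start_fsb, end_fsb;

                  /* Nothing to clean up unless the write reached past EOF. */
                  if (pos + len <= i_size_read(inode))
                          return;

                  /* Round up so a partially written EOF block is left alone. */
                  start_fsb = XFS_B_TO_FSB(ip->i_mount, i_size_read(inode));
                  end_fsb = XFS_B_TO_FSB(ip->i_mount, pos + len);
                  if (end_fsb <= start_fsb)
                          return;

                  /*
                   * Generic punch helper shared with xfs_aops_discard_page():
                   * removes delalloc extents in the range and leaves real
                   * (preallocated) extents alone.
                   */
                  xfs_punch_delalloc_range(ip, start_fsb, end_fsb - start_fsb);
          }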
  2. 11 Nov, 2010 (1 commit)
    • xfs: remove incorrect assert in xfs_vm_writepage · ece413f5
      Christoph Hellwig authored
      In commit 20cb52eb, titled
      "xfs: simplify xfs_vm_writepage", I added an assert that any !mapped and
      uptodate buffers are not dirty.  That assert turns out to trigger a lot
      when running fsx on filesystems with small block sizes.  The reason for
      that is that the assert is simply incorrect: !mapped and uptodate
      just means the buffer covers a hole, and whenever we do a set_page_dirty
      we mark all blocks in the page dirty, no matter whether they have data or
      not.  So remove the assert, and update the comment above the condition
      to match reality.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
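
      As a rough illustration (not the literal XFS code), the corrected buffer
      walk simply treats an uptodate-but-unmapped buffer as a hole and skips it,
      instead of asserting that it cannot be dirty:

          #include <linux/buffer_head.h>

          /*
           * Illustrative sketch only: walk the buffers of a page under
           * writeback.  A buffer that is uptodate but not mapped covers a
           * hole; set_page_dirty() dirtied it along with everything else on
           * the page, so skip it rather than assert about its dirty state.
           */
          static void writepage_walk_sketch(struct page *page)
          {
                  struct buffer_head *head = page_buffers(page);
                  struct buffer_head *bh = head;

                  do {
                          if (!buffer_mapped(bh) && buffer_uptodate(bh))
                                  continue;       /* hole: nothing to write back */

                          /* ... map/convert the block and add it to an ioend ... */
                  } while ((bh = bh->b_this_page) != head);
          }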
  3. 27 Oct, 2010 (1 commit)
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang authored
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip
      inodes on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page-out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in the current code
      b) its old semantics of "don't get stuck on request queues" is a misbehavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
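
      For context, the kind of check being removed from ->writepage
      implementations looked roughly like the sketch below (illustrative, not a
      literal hunk from this patch); with the flusher willing to block in
      get_request_wait(), the WB_SYNC_NONE sync mode already expresses
      best-effort writeback and the congestion bail-out is dead code:

          #include <linux/fs.h>
          #include <linux/writeback.h>
          #include <linux/backing-dev.h>

          /* Illustrative sketch of the now-removed pattern. */
          static int writepage_old_pattern_sketch(struct page *page,
                                                  struct writeback_control *wbc)
          {
                  struct backing_dev_info *bdi =
                          page->mapping->backing_dev_info;

                  /* Removed: bail out early on a congested device when the
                   * caller asked for nonblocking writeback. */
                  if (wbc->nonblocking && bdi_write_congested(bdi)) {
                          wbc->encountered_congestion = 1;
                          return 0;
                  }

                  /* ... write the page; WB_SYNC_NONE callers already tolerate
                   * best-effort behaviour, so the check above buys nothing ... */
                  return 0;
          }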
  4. 24 Aug, 2010 (2 commits)
  5. 10 Aug, 2010 (3 commits)
    • xfs: new truncate sequence · fa9b227e
      Christoph Hellwig authored
      Convert XFS to the new truncate sequence.  We still can have errors after
      updating the file size in xfs_setattr, but these are real I/O errors and lead
      to a transaction abort and filesystem shutdown, so they are not an issue.
      
      Errors from ->write_begin and ->write_end can now be handled correctly
      because we can actually get rid of the delalloc extents, whereas previously
      the buffer state was stripped in block_invalidatepage.
      
      There is still no error handling for ->direct_IO, because doing so will need
      some major restructuring given that we only hold the iolock shared and do not
      hold i_mutex at all.  Fortunately, leaving the normally allocated blocks behind
      there is not a major issue and this will get cleaned up by xfs_free_eofblocks
      later.
      
      Note: the patch is against Al's vfs.git tree as that contains the necessary
      preparations.  I'd prefer to get it applied there so that we can get some
      testing in linux-next.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • get rid of block_write_begin_newtrunc · 155130a4
      Christoph Hellwig authored
      Move the call to vmtruncate that gets rid of excessive blocks to the callers,
      in preparation for the new truncate sequence, and rename the non-truncating
      version to block_write_begin.
      
      While we're at it, also remove several unused arguments to block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
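
      A hedged sketch of what a caller looks like after this change (treat the
      exact signature and error handling as assumptions, not the literal new
      API): block_write_begin() itself no longer truncates, so the caller
      unwinds blocks instantiated beyond EOF when the call fails:

          #include <linux/fs.h>
          #include <linux/buffer_head.h>
          #include <linux/mm.h>

          /*
           * Illustrative caller of the renamed, non-truncating
           * block_write_begin().  On error the caller, not the helper, trims
           * blocks instantiated past i_size.
           */
          static int example_write_begin(struct address_space *mapping,
                                         loff_t pos, unsigned len, unsigned flags,
                                         struct page **pagep,
                                         get_block_t *get_block)
          {
                  int ret;

                  ret = block_write_begin(mapping, pos, len, flags,
                                          pagep, get_block);
                  if (ret) {
                          loff_t isize = mapping->host->i_size;

                          /* vmtruncate() moved out of the helper and into
                           * the caller. */
                          if (pos + len > isize)
                                  vmtruncate(mapping->host, isize);
                  }
                  return ret;
          }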
    • sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig authored
      Move the call to vmtruncate that gets rid of excessive blocks to the callers,
      in preparation for the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it, as just open-coding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 27 Jul, 2010 (15 commits)
  7. 09 Jun, 2010 (1 commit)
  8. 03 Jun, 2010 (1 commit)
    • xfs: skip writeback from reclaim context · 070ecdca
      Christoph Hellwig authored
      Allowing writeback from reclaim context causes massive problems with stack
      overflows, as we can call into the writeback code, which tends to be a heavy
      stack user both in the generic code and in XFS, from random contexts that
      perform memory allocations.
      
      Follow the example of btrfs (and in slightly different form ext4) and refuse
      to write out data from reclaim context.  This issue should really be handled
      by the VM so that we can tune better for this case, but until we get it
      sorted out there we have to hack around this in each filesystem with a
      complex writeback path.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
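      The shape of the guard is roughly the following simplified sketch (not the
      literal hunk; the PF_MEMALLOC/PF_KSWAPD test is the usual way such
      "no writeback from reclaim" checks are expressed):

          #include <linux/sched.h>
          #include <linux/mm.h>
          #include <linux/pagemap.h>
          #include <linux/writeback.h>

          /*
           * Illustrative sketch: refuse to do writeback when ->writepage is
           * entered from memory reclaim; redirty the page so the flusher
           * threads, with their known stack usage, handle it instead.
           */
          static int writepage_reclaim_guard_sketch(struct page *page,
                                                    struct writeback_control *wbc)
          {
                  if (current->flags & (PF_MEMALLOC | PF_KSWAPD)) {
                          redirty_page_for_writepage(wbc, page);
                          unlock_page(page);
                          return 0;
                  }

                  /* ... normal writeback path ... */
                  return 0;
          }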
  9. 19 May, 2010 (9 commits)
  10. 30 Mar, 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, while for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that they could be applied as
         a separate patch and serve as a bisection point.
      
      Given that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
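
      The per-file change is mechanical; here is a hypothetical example of what
      the sweep produces in a .c file that uses kmalloc()/kfree() but previously
      relied on slab.h being pulled in implicitly via percpu.h:

          #include <linux/module.h>
          #include <linux/init.h>
          #include <linux/slab.h>         /* added by the sweep: this file uses
                                             kmalloc()/kfree() directly */

          static void *example_buf;

          static int __init example_init(void)
          {
                  example_buf = kmalloc(128, GFP_KERNEL);
                  return example_buf ? 0 : -ENOMEM;
          }

          static void __exit example_exit(void)
          {
                  kfree(example_buf);
          }

          module_init(example_init);
          module_exit(example_exit);
          MODULE_LICENSE("GPL");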
  11. 17 Mar, 2010 (1 commit)
  12. 06 Mar, 2010 (2 commits)
  13. 02 Mar, 2010 (2 commits)
    • xfs: Non-blocking inode locking in IO completion · 77d7a0c2
      Dave Chinner authored
      The introduction of barriers to loop devices has created a new IO
      completion order dependency that XFS does not handle.  The loop
      device implements barriers using fsync and so turns a log IO in the
      XFS filesystem on the loop device into a data IO in the backing
      filesystem.  That is, the completion of log IOs in the loop
      filesystem is now dependent on completion of data IO in the backing
      filesystem.
      
      This can cause deadlocks when a flush daemon issues a log force with
      an inode locked, because the completion of IO on that inode is
      blocked by the inode lock.  This in turn prevents further data IO
      completion from occurring on all XFS filesystems on that CPU (due to
      the shared nature of the completion queues).  This then prevents the
      log IO from completing because the log is waiting for data IO
      completion as well.
      
      The fix for this new completion order dependency issue is to make
      the IO completion inode locking non-blocking. If the inode lock
      can't be grabbed, simply requeue the IO completion back to the work
      queue so that it can be processed later. This prevents the
      completion queue from being blocked and allows data IO completion on
      other inodes to proceed, hence avoiding completion order dependent
      deadlocks.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
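
      A minimal sketch of the trylock-and-requeue idea (the structure and helper
      names below are stand-ins, not the real XFS symbols): the completion
      handler tries the inode lock without blocking, and if it cannot get it,
      it pushes the same work item back onto the completion workqueue instead
      of stalling the queue:

          #include <linux/kernel.h>
          #include <linux/mutex.h>
          #include <linux/workqueue.h>

          /* Illustrative sketch only; the real code uses the XFS ilock and its
           * own completion workqueues, not the names below. */
          struct ioend_sketch {
                  struct work_struct      work;
                  struct mutex            *ilock; /* stand-in for the XFS ilock */
          };

          static struct workqueue_struct *completion_wq_sketch;

          static void end_io_sketch(struct work_struct *work)
          {
                  struct ioend_sketch *ioend =
                          container_of(work, struct ioend_sketch, work);

                  /* Try the inode lock without blocking. */
                  if (!mutex_trylock(ioend->ilock)) {
                          /*
                           * The lock is held (e.g. by a thread issuing a log
                           * force): requeue the completion rather than blocking
                           * the shared completion queue, so completions for
                           * other inodes can still make progress.
                           */
                          queue_work(completion_wq_sketch, &ioend->work);
                          return;
                  }

                  /* ... finish the IO: update sizes, convert unwritten extents ... */

                  mutex_unlock(ioend->ilock);
          }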
    • xfs: implement optimized fdatasync · 66d834ea
      Christoph Hellwig authored
      Allow us to track the difference between timestamp and size updates
      by using mark_inode_dirty from the I/O completion code, and checking
      the VFS inode flags in xfs_file_fsync.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
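
      A hedged sketch of the fsync-side check (the I_DIRTY_* flags are the
      generic VFS ones; the exact structure of the XFS code is an assumption):
      size updates from I/O completion go through mark_inode_dirty(), which sets
      I_DIRTY_DATASYNC, while pure timestamp updates only set I_DIRTY_SYNC, so
      an fdatasync can return early when nothing it cares about is dirty:

          #include <linux/fs.h>

          /* Illustrative sketch of the fdatasync fast path, not the literal
           * xfs_file_fsync() code. */
          static int fsync_flags_sketch(struct inode *inode, int datasync)
          {
                  unsigned long mask = I_DIRTY_SYNC | I_DIRTY_DATASYNC;

                  if (datasync)
                          mask = I_DIRTY_DATASYNC;  /* timestamps don't matter */

                  if (!(inode->i_state & mask))
                          return 0;       /* nothing this (f)datasync must flush */

                  /* ... force the log / flush the inode changes to disk ... */
                  return 0;
          }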