1. 11 1月, 2011 1 次提交
    • J
      ext4: flush the i_completed_io_list during ext4_truncate · 3889fd57
      Jiaying Zhang 提交于
      Ted first found the bug when running 2.6.36 kernel with dioread_nolock
      mount option that xfstests #13 complained about wrong file size during fsck.
      However, the bug exists in the older kernels as well although it is
      somehow harder to trigger.
      
      The problem is that ext4_end_io_work() can happen after we have truncated an
      inode to a smaller size. Then when ext4_end_io_work() calls 
      ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
      been truncated, so the inode size becomes inconsistent with the allocated
      blocks. 
      
      The following patch flushes the i_completed_io_list during truncate to reduce 
      the risk that some pending end_io requests are executed later and convert 
      already truncated blocks to initialized. 
      
      Note that although the fix helps reduce the problem a lot there may still 
      be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
      problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
      held, it can race with an ongoing write request so that the io_end request is
      processed later when the corresponding blocks have been truncated.
      
      Ted and I have discussed the problem offline and we saw a few ways to fix
      the race completely:
      
      a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
      whenever vmtruncate() is called. The i_mutex lock prevents any new write
      requests from entering writeback and the i_alloc_sem prevents the race
      from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
      is called from do_truncate(), which is probably the most common case.
      However, there are places where we may call vmtruncate() without holding
      either i_mutex or i_alloc_sem. I would like to ask for other people's
      opinions on what locks are expected to be held before calling vmtruncate().
      There seems a disagreement among the callers of that function.
      
      b) We change the ext4 write path so that we change the extent tree to contain 
      the newly allocated blocks and update i_size both at the same time --- when 
      the write of the data blocks is completed.
      
      c) We add some additional locking to synchronize vmtruncate() and 
      ext4_end_io_work(). This approach may have performance implications so we
      need to be careful.
      
      All of the above proposals may require more substantial changes, so
      we may consider to take the following patch as a bandaid.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3889fd57
  2. 28 10月, 2010 1 次提交
  3. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  4. 28 5月, 2010 2 次提交
  5. 17 5月, 2010 2 次提交
  6. 10 5月, 2010 1 次提交
  7. 29 4月, 2010 1 次提交
  8. 03 3月, 2010 1 次提交
  9. 23 12月, 2009 1 次提交
    • T
      ext4, jbd2: Add barriers for file systems with exernal journals · cc3e1bea
      Theodore Ts'o 提交于
      This is a bit complicated because we are trying to optimize when we
      send barriers to the fs data disk.  We could just throw in an extra
      barrier to the data disk whenever we send a barrier to the journal
      disk, but that's not always strictly necessary.
      
      We only need to send a barrier during a commit when there are data
      blocks which are must be written out due to an inode written in
      ordered mode, or if fsync() depends on the commit to force data blocks
      to disk.  Finally, before we drop transactions from the beginning of
      the journal during a checkpoint operation, we need to guarantee that
      any blocks that were flushed out to the data disk are firmly on the
      rust platter before we drop the transaction from the journal.
      
      Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cc3e1bea
  10. 09 12月, 2009 1 次提交
  11. 23 11月, 2009 1 次提交
  12. 29 9月, 2009 1 次提交
    • M
      ext4: async direct IO for holes and fallocate support · 8d5d02e6
      Mingming Cao 提交于
      For async direct IO that covers holes or fallocate, the end_io
      callback function now queued the convertion work on workqueue but
      don't flush the work rightaway as it might take too long to afford.
      
      But when fsync is called after all the data is completed, user expects
      the metadata also being updated before fsync returns.
      
      Thus we need to flush the conversion work when fsync() is called.
      This patch keep track of a listed of completed async direct io that
      has a work queued on workqueue.  When fsync() is called, it will go
      through the list and do the conversion.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      8d5d02e6
  13. 13 9月, 2009 1 次提交
  14. 06 9月, 2009 1 次提交
  15. 17 6月, 2009 1 次提交
  16. 06 10月, 2008 1 次提交
  17. 09 9月, 2008 1 次提交
  18. 12 7月, 2008 1 次提交
  19. 30 4月, 2008 1 次提交
  20. 17 4月, 2008 1 次提交
    • H
      ext4: fdatasync should skip metadata writeout when overwriting · 53c550e9
      Hisashi Hifumi 提交于
      Currently fdatasync is identical to fsync in ext3.
      
      I think fdatasync should skip journal flush in data=ordered and
      data=writeback mode when it overwrites to already-instantiated blocks on
      HDD.  When I_DIRTY_DATASYNC flag is not set, fdatasync should skip journal
      writeout because this indicates only atime or/and mtime updates.
      
      Following patch is the same approach of ext2's fsync code(ext2_sync_file).
      
      I did a performance test using the sysbench.
      
      #sysbench --num-threads=128 --max-requests=50000 --test=fileio --file-total-size=128G
      --file-test-mode=rndwr --file-fsync-mode=fdatasync run
      
      The result on ext3 was:
      
      	-2.6.24
      	Operations performed:  0 Read, 50080 Write, 59600 Other = 109680 Total
      	Read 0b  Written 782.5Mb  Total transferred 782.5Mb  (12.116Mb/sec)
      	  775.45 Requests/sec executed
      
      	Test execution summary:
      	    total time:                          64.5814s
      	    total number of events:              50080
      	    total time taken by event execution: 3713.9836
      	    per-request statistics:
      	         min:                            0.0000s
      	         avg:                            0.0742s
      	         max:                            0.9375s
      	         approx.  95 percentile:         0.2901s
      
      	Threads fairness:
      	    events (avg/stddev):           391.2500/23.26
      	    execution time (avg/stddev):   29.0155/1.99
      
      	-2.6.24-patched
      	Operations performed:  0 Read, 50009 Write, 61596 Other = 111605 Total
      	Read 0b  Written 781.39Mb  Total transferred 781.39Mb  (16.419Mb/sec)
      	1050.83 Requests/sec executed
      
      	Test execution summary:
      	    total time:                          47.5900s
      	    total number of events:              50009
      	    total time taken by event execution: 2934.5768
      	    per-request statistics:
       	         min:                            0.0000s
      	         avg:                            0.0587s
       	         max:                            0.8938s
      	         approx.  95 percentile:         0.1993s
      
      	Threads fairness:
      	    events (avg/stddev):           390.6953/22.64
      	    execution time (avg/stddev):   22.9264/1.17
      
      Filesystem I/O throughput was improved.
      
      Signed-off-by :Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      53c550e9
  21. 18 10月, 2007 1 次提交
  22. 12 10月, 2006 3 次提交
  23. 27 9月, 2006 1 次提交
  24. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4