1. 29 9月, 2012 1 次提交
    • D
      ext4: completed_io locking cleanup · 28a535f9
      Dmitry Monakhov 提交于
      Current unwritten extent conversion state-machine is very fuzzy.
      - For unknown reason it performs conversion under i_mutex. What for?
        My diagnosis:
        We already protect extent tree with i_data_sem, truncate and punch_hole
        should wait for DIO, so the only data we have to protect is end_io->flags
        modification, but only flush_completed_IO and end_io_work modified this
        flags and we can serialize them via i_completed_io_lock.
      
        Currently all these games with mutex_trylock result in the following deadlock
         truncate:                          kworker:
          ext4_setattr                       ext4_end_io_work
          mutex_lock(i_mutex)
          inode_dio_wait(inode)  ->BLOCK
                                   DEADLOCK<- mutex_trylock()
                                              inode_dio_done()
        #TEST_CASE1_BEGIN
        MNT=/mnt_scrach
        unlink $MNT/file
        fallocate -l $((1024*1024*1024)) $MNT/file
        aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
        sleep 2
        truncate -s 0 $MNT/file
        #TEST_CASE1_END
      
      Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286
      
      This patch makes state machine simple and clean:
      
      (1) xxx_end_io schedule final extent conversion simply by calling
          ext4_add_complete_io(), which append it to ei->i_completed_io_list
          NOTE1: because of (2A) work should be queued only if
          ->i_completed_io_list was empty, otherwise the work is scheduled already.
      
      (2) ext4_flush_completed_IO is responsible for handling all pending
          end_io from ei->i_completed_io_list
          Flushing sequence consists of following stages:
          A) LOCKED: Atomically drain completed_io_list to local_list
          B) Perform extents conversion
          C) LOCKED: move converted io's to to_free list for final deletion
             	     This logic depends on context which we was called from.
          D) Final end_io context destruction
          NOTE1: i_mutex is no longer required because end_io->flags modification
          is protected by ei->ext4_complete_io_lock
      
      Full list of changes:
      - Move all completion end_io related routines to page-io.c in order to improve
        logic locality
      - Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
      - remove EXT4_IO_END_FSYNC
      - Improve SMP scalability by removing useless i_mutex which does not
        protect io->flags anymore.
      - Reduce lock contention on i_completed_io_lock by optimizing list walk.
      - Rename ext4_end_io_nolock to end4_end_io and make it static
      - Check flush completion status to ext4_ext_punch_hole(). Because it is
        not good idea to punch blocks from corrupted inode.
      
      Changes since V3 (in request to Jan's comments):
        Fall back to active flush_completed_IO() approach in order to prevent
        performance issues with nolocked DIO reads.
      Changes since V2:
        Fix use-after-free caused by race truncate vs end_io_work
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      28a535f9
  2. 09 1月, 2012 1 次提交
  3. 10 9月, 2011 1 次提交
  4. 07 9月, 2011 1 次提交
    • A
      ext4: fix fsx truncate failure · 189e868f
      Allison Henderson 提交于
      While running extended fsx tests to verify the first
      two patches, a similar bug was also found in the
      truncate operation.
      
      This bug happens because the truncate routine only zeros
      the unblock aligned portion of the last page.  This means
      that the block aligned portions of the page appearing after
      i_size are left unzeroed, and the buffer heads still mapped.
      
      This bug is corrected by using ext4_discard_partial_page_buffers
      in the truncate routine to zero the partial page and unmap
      the buffer headers.
      Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      189e868f
  5. 20 8月, 2011 1 次提交
    • J
      ext4: flush any pending end_io requests before DIO reads w/dioread_nolock · dccaf33f
      Jiaying Zhang 提交于
      There is a race between ext4 buffer write and direct_IO read with
      dioread_nolock mount option enabled. The problem is that we clear
      PageWriteback flag during end_io time but will do
      uninitialized-to-initialized extent conversion later with dioread_nolock.
      If an O_direct read request comes in during this period, ext4 will return
      zero instead of the recently written data.
      
      This patch checks whether there are any pending uninitialized-to-initialized
      extent conversion requests before doing O_direct read to close the race.
      Note that this is just a bandaid fix. The fundamental issue is that we
      clear PageWriteback flag before we really complete an IO, which is
      problem-prone. To fix the fundamental issue, we may need to implement an
      extent tree cache that we can use to look up pending to-be-converted extents.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      dccaf33f
  6. 28 6月, 2011 2 次提交