1. 28 Jan, 2013 4 commits
  2. 29 Nov, 2012 1 commit
    • ext4: rationalize ext4_extents.h inclusion · 4a092d73
      Theodore Ts'o authored
      Previously, ext4_extents.h was being included at the end of ext4.h,
      which was bad for a number of reasons: (a) it was not being included
      in the expected place, and (b) it caused the header to be included
      multiple times.  There were #ifdef's to prevent this from causing any
      problems, but it still was unnecessary.
      
      By moving the function declarations that were in ext4_extents.h to
      ext4.h, which is standard practice for where the function declarations
      for the rest of ext4.h can be found, we can remove ext4_extents.h from
      being included in ext4.h at all, and then we can only include
      ext4_extents.h where it is needed in ext4's source files.
      
      It should be possible to move a few more things into ext4.h, and
      further reduce the number of source files that need to #include
      ext4_extents.h, but that's a cleanup for another day.
      Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
      Reported-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 09 Nov, 2012 1 commit
  4. 05 Oct, 2012 1 commit
    • ext4: fix ext4_flush_completed_IO wait semantics · c278531d
      Dmitry Monakhov authored
      BUG #1) All places where we call ext4_flush_completed_IO are broken
          because buffered io and DIO/AIO go through three stages:
          1) submitted io,
          2) completed io (in i_completed_io_list), conversion pending
          3) finished io (conversion done)
          By calling ext4_flush_completed_IO we flush only the requests
          that were in stage (2), which is wrong because:
           1) punch_hole and truncate _must_ wait for all outstanding unwritten io
              regardless of its state.
           2) fsync and nolock_dio_read should also wait because there is
              a time window between end_page_writeback() and ext4_add_complete_io().
              As a result, integrity fsync is broken in case of a buffered write
              to a fallocated region:
              fsync                                      blkdev_completion
               ->filemap_write_and_wait_range
                                                         ->ext4_end_bio
                                                           ->end_page_writeback
               <-- filemap_write_and_wait_range return
               ->ext4_flush_completed_IO
                 sees empty i_completed_io_list but pending
                 conversion still exists
                                                           ->ext4_add_complete_io
      
      BUG #2) Race window becomes wider due to the 'ext4: completed_io
      locking cleanup V4' patch series
      
      This patch makes the following changes:
      1) ext4_flush_completed_io() now first tries to flush completed io and then
         waits for any outstanding unwritten io via ext4_unwritten_wait()
      2) Rename the function to a more appropriate name.
      3) Assert that all callers of ext4_flush_unwritten_io hold i_mutex to
         prevent an endless wait
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
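The gap between stage (2) and the other stages can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `inode_model`, `submit_io`, `complete_io`, `flush_completed` and `conversions_still_pending` are hypothetical names, and the two integers merely stand in for the unwritten counter and i_completed_io_list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one inode's unwritten-extent I/O bookkeeping. */
struct inode_model {
    int unwritten;   /* I/Os submitted whose conversion is not finished */
    int completed;   /* I/Os already on the completed list (stage 2) */
};

/* Stage 1: the I/O is submitted and counted as outstanding. */
static void submit_io(struct inode_model *ino)
{
    ino->unwritten++;
}

/* Stage 2: the I/O completed and was added to the completed list. */
static void complete_io(struct inode_model *ino)
{
    ino->completed++;
}

/* Old behaviour: convert only what is already on the completed list. */
static void flush_completed(struct inode_model *ino)
{
    ino->unwritten -= ino->completed;   /* stage 3 for those I/Os */
    ino->completed = 0;
}

/* A caller that needs all conversions done must keep waiting while true. */
static bool conversions_still_pending(const struct inode_model *ino)
{
    return ino->unwritten > 0;
}
```

In this model, flushing right after only one of two submitted I/Os has completed leaves `conversions_still_pending()` true, which is the window the commit message describes.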
  5. 29 Sep, 2012 3 commits
    • ext4: completed_io locking cleanup · 28a535f9
      Dmitry Monakhov authored
      The current unwritten extent conversion state machine is very fuzzy.
      - For an unknown reason it performs conversion under i_mutex. What for?
        My diagnosis:
        We already protect the extent tree with i_data_sem, and truncate and punch_hole
        should wait for DIO, so the only data we have to protect is the end_io->flags
        modification. Only flush_completed_IO and end_io_work modify these
        flags, and we can serialize them via i_completed_io_lock.

        Currently all these games with mutex_trylock result in the following deadlock:
         truncate:                          kworker:
          ext4_setattr                       ext4_end_io_work
          mutex_lock(i_mutex)
          inode_dio_wait(inode)  ->BLOCK
                                   DEADLOCK<- mutex_trylock()
                                              inode_dio_done()
        #TEST_CASE1_BEGIN
        MNT=/mnt_scrach
        unlink $MNT/file
        fallocate -l $((1024*1024*1024)) $MNT/file
        aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
        sleep 2
        truncate -s 0 $MNT/file
        #TEST_CASE1_END
      
      Or use xfstests test 286: https://github.com/dmonakhov/xfstests/blob/devel/286

      This patch makes the state machine simple and clean:

      (1) xxx_end_io schedules the final extent conversion simply by calling
          ext4_add_complete_io(), which appends it to ei->i_completed_io_list
          NOTE1: because of (2A), work should be queued only if
          ->i_completed_io_list was empty; otherwise the work is already scheduled.

      (2) ext4_flush_completed_IO is responsible for handling all pending
          end_io from ei->i_completed_io_list
          The flushing sequence consists of the following stages:
          A) LOCKED: Atomically drain completed_io_list to local_list
          B) Perform extents conversion
          C) LOCKED: move converted io's to to_free list for final deletion
             This logic depends on the context we were called from.
          D) Final end_io context destruction
          NOTE1: i_mutex is no longer required because end_io->flags modification
          is protected by ei->ext4_complete_io_lock
      
      Full list of changes:
      - Move all completion end_io related routines to page-io.c in order to improve
        logic locality
      - Move open-coded logic from various xx_end_xx routines to ext4_add_complete_io()
      - Remove EXT4_IO_END_FSYNC
      - Improve SMP scalability by removing the useless i_mutex, which does not
        protect io->flags anymore.
      - Reduce lock contention on i_completed_io_lock by optimizing the list walk.
      - Rename ext4_end_io_nolock to ext4_end_io and make it static
      - Check flush completion status in ext4_ext_punch_hole(), because it is
        not a good idea to punch blocks from a corrupted inode.
      
      Changes since V3 (in response to Jan's comments):
        Fall back to the active flush_completed_IO() approach in order to prevent
        performance issues with nolocked DIO reads.
      Changes since V2:
        Fix use-after-free caused by a race between truncate and end_io_work
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
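Stages (2A) and (2B), together with the "queue only if the list was empty" rule from (1), can be sketched in userspace C. This is a hedged illustration, not the kernel implementation: `inode_model`, `io_end`, `add_complete_io` and `flush_completed_io` are hypothetical names, a pthread mutex stands in for i_completed_io_lock, and the conversion itself is reduced to setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct io_end {
    struct io_end *next;
    bool converted;
};

struct inode_model {
    pthread_mutex_t lock;       /* stands in for i_completed_io_lock */
    struct io_end *completed;   /* stands in for i_completed_io_list */
};

/*
 * Add a completed I/O to the list. Returns true when the list was empty,
 * which is the only case where the caller needs to queue the worker.
 * This sketch pushes at the head, so list order is not preserved.
 */
static bool add_complete_io(struct inode_model *ino, struct io_end *io)
{
    pthread_mutex_lock(&ino->lock);
    bool was_empty = (ino->completed == NULL);
    io->next = ino->completed;
    ino->completed = io;
    pthread_mutex_unlock(&ino->lock);
    return was_empty;
}

/* Returns the number of I/Os converted by this call. */
static int flush_completed_io(struct inode_model *ino)
{
    /* Stage A (locked): detach the whole list into a local pointer. */
    pthread_mutex_lock(&ino->lock);
    struct io_end *local = ino->completed;
    ino->completed = NULL;
    pthread_mutex_unlock(&ino->lock);

    /* Stage B (unlocked): convert each entry of the private list. */
    int n = 0;
    for (struct io_end *io = local; io != NULL; io = io->next) {
        io->converted = true;   /* placeholder for extent conversion */
        n++;
    }
    return n;
}
```

Draining the list in one locked step keeps the lock hold time short, and the conversion work runs on a list no other thread can see.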
    • ext4: fix unwritten counter leakage · 82e54229
      Dmitry Monakhov authored
      ext4_set_io_unwritten_flag() will increment the i_unwritten counter, so
      once we mark an end_io with EXT4_END_IO_UNWRITTEN we have to revert it
      on the error path.

       - add missing error checks to prevent counter leakage
       - ext4_end_io_nolock() will clear EXT4_END_IO_UNWRITTEN flag to signal
         that conversion finished.
       - add BUG_ON to ext4_free_end_io() to prevent similar leakage in future.
      
      Visible effect of this bug is that unaligned aio_stress may deadlock
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
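The balancing rule described above (increment when the flag is set, revert on the error path, assert on free) can be sketched as follows. All names here (`inode_model`, `io_end`, `IO_END_UNWRITTEN`, `set_io_unwritten`, `clear_io_unwritten`, `submit`, `free_io_end`) are illustrative assumptions for a userspace model, and `assert` plays the role that BUG_ON plays in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IO_END_UNWRITTEN 0x1u   /* illustrative flag value */

struct inode_model {
    int unwritten;              /* models the i_unwritten counter */
};

struct io_end {
    struct inode_model *inode;
    unsigned int flags;
};

/* Setting the flag and bumping the counter always happen together. */
static void set_io_unwritten(struct io_end *io)
{
    if (!(io->flags & IO_END_UNWRITTEN)) {
        io->flags |= IO_END_UNWRITTEN;
        io->inode->unwritten++;
    }
}

/* Clearing the flag signals that conversion finished (or was aborted). */
static void clear_io_unwritten(struct io_end *io)
{
    if (io->flags & IO_END_UNWRITTEN) {
        io->flags &= ~IO_END_UNWRITTEN;
        io->inode->unwritten--;
    }
}

/* Returns 0 on success, -1 on failure; the counter stays balanced on failure. */
static int submit(struct io_end *io, bool fail)
{
    set_io_unwritten(io);
    if (fail) {
        clear_io_unwritten(io);   /* error path reverts the increment */
        return -1;
    }
    return 0;
}

/* Freeing an io_end that still carries the flag would leak the counter. */
static void free_io_end(struct io_end *io)
{
    assert(!(io->flags & IO_END_UNWRITTEN));
    io->inode = NULL;
}
```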
    • ext4: give i_aiodio_unwritten a more appropriate name · e27f41e1
      Dmitry Monakhov authored
      The AIO/DIO prefix is wrong because the counter accounts for unwritten
      extents, which may also be scheduled from buffered write endio
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  6. 30 Mar, 2012 1 commit
    • Revert "ext4: don't release page refs in ext4_end_bio()" · 6268b325
      Linus Torvalds authored
      This reverts commit b43d17f3.
      
      Dave Jones reports that it causes lockups on his laptop, and his debug
      output showed a lot of processes hung waiting for page_writeback (or
      more commonly - processes hung waiting for a lock that was held during
      that writeback wait).
      
      The page_writeback hint made Ted suggest that Dave look at this commit,
      and Dave verified that reverting it makes his problems go away.
      
      Ted says:
       "That commit fixes a race which is seen when you write into fallocated
        (and hence uninitialized) disk blocks under *very* heavy memory
        pressure.  Furthermore, although theoretically it could trigger under
        normal direct I/O writes, it only seems to trigger if you are issuing
        a huge number of AIO writes, such that a just-written page can get
        evicted from memory, and then read back into memory, before the
        workqueue has a chance to update the extent tree.
      
        This race has been around for a little over a year, and no one noticed
        until two months ago; it only happens under fairly exotic conditions,
        and in fact even after trying very hard to create a simple repro under
        lab conditions, we could only reproduce the problem and confirm the
        fix on production servers running MySQL on very fast PCIe-attached
        flash devices.
      
        Given that Dave was able to hit this problem pretty quickly, if we
        confirm that this commit is at fault, the only reasonable thing to do
        is to revert it IMO."
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 05 Mar, 2012 2 commits
    • ext4: don't release page refs in ext4_end_bio() · b43d17f3
      Curt Wohlgemuth authored
      We can clear PageWriteback on each page when the IO
      completes, but we can't release the references on the page
      until we convert any uninitialized extents.
      
      Without this patch, the use of the dioread_nolock mount
      option can break buffered writes, because extents may
      not be converted by the time a subsequent buffered read
      comes in; if the page is not in the page cache, a read
      will return zeros if the extent is still uninitialized.
      
      I tested this with a (temporary) patch that adds a call
      to msleep(1000) at the start of ext4_end_io_work(), to delay
      processing of each DIO-unwritten work queue item.  With this
      msleep(), a simple workload of
      
        fallocate
        write
        fadvise
        read
      
      will fail without this patch, succeeds with it.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix race between sync and completed io work · 491caa43
      Jeff Moyer authored
      The following command line will leave the aio-stress process unkillable
      on an ext4 file system (in my case, mounted on /mnt/test):
      
      aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
      
      This is using the aio-stress program from the xfstests test suite.
      That particular command line tells aio-stress to do random writes to
      20 files from 20 threads (one thread per file).  The files are NOT
      preallocated, so you will get writes to random offsets within the
      file, thus creating holes and extending i_size.  It also opens the
      file with O_DIRECT and O_SYNC.
      
      On to the problem.  When an I/O requires unwritten extent conversion,
      it is queued onto the completed_io_list for the ext4 inode.  Two code
      paths will pull work items from this list.  The first is the
      ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
      which is called via the fsync path (and O_SYNC handling, as well).
      There are two issues I've found in these code paths.  First, if the
      fsync path beats the work routine to a particular I/O, the work
      routine will free the io_end structure!  It does not take into account
      the fact that the io_end may still be in use by the fsync path.  I've
      fixed this issue by adding yet another IO_END flag, indicating that
      the io_end is being processed by the fsync path.
      
      The second problem is that the work routine will make an assignment to
      io->flag outside of the lock.  I have witnessed this result in a hang
      at umount.  Moving the flag setting inside the lock resolved that
      problem.
      
      The problem was introduced by commit b82e384c ("ext4: optimize
      locking for end_io extent conversion"), which first appeared in 3.2.
      As such, the fix should be backported to that release (probably along
      with the unwritten extent conversion race fix).
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      CC: stable@kernel.org
  8. 21 Feb, 2012 1 commit
    • ext4: fix race between unwritten extent conversion and truncate · 266991b1
      Jeff Moyer authored
      The following comment in ext4_end_io_dio caught my attention:
      
      	/* XXX: probably should move into the real I/O completion handler */
              inode_dio_done(inode);
      
      The truncate code takes i_mutex, then calls inode_dio_wait.  Because the
      ext4 code path above will end up dropping the mutex before it is
      reacquired by the worker thread that does the extent conversion, it
      seems to me that the truncate can happen out of order.  Jan Kara
      mentioned that this might result in error messages in the system logs,
      but that should be the extent of the "damage."
      
      The fix is pretty straight-forward: don't call inode_dio_done until the
      extent conversion is complete.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  9. 09 Jan, 2012 1 commit
  10. 14 Dec, 2011 1 commit
  11. 01 Nov, 2011 1 commit
  12. 31 Oct, 2011 3 commits
    • ext4: optimize locking for end_io extent conversion · b82e384c
      Theodore Ts'o authored
      Now that we are doing the locking correctly, we need to grab the
      i_completed_io_lock() twice per end_io.  We can clean this up by
      removing the structure from the i_completed_io_list, and use this as
      the locking mechanism to prevent ext4_flush_completed_IO() racing
      against ext4_end_io_work(), instead of clearing the
      EXT4_IO_END_UNWRITTEN in io->flag.
      
      In addition, if the ext4_convert_unwritten_extents() returns an error,
      we no longer keep the end_io structure on the linked list.  This
      doesn't help, because it tends to lock up the file system and wedges
      the system.  That's one way to call attention to the problem, but it
      doesn't help the overall robustness of the system.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove unnecessary call to waitqueue_active() · 4e298021
      Theodore Ts'o authored
      The usage of waitqueue_active() is not necessary, and introduces (I
      believe) a hard-to-hit race.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Use correct locking for ext4_end_io_nolock() · d73d5046
      Tao Ma authored
      We must hold i_completed_io_lock when manipulating anything on the
      i_completed_io_list linked list.  This includes io->lock, which we
      were checking in ext4_end_io_nolock().
      
      So move this check to ext4_end_io_work().  This also has the bonus of
      avoiding extra work if it is already done without needing to take the
      mutex.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. 31 Aug, 2011 1 commit
    • ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining · 8c0bec21
      Jiaying Zhang authored
      The i_mutex lock and flush_completed_IO() added by commit 2581fdc8
      in ext4_evict_inode() cause lockdep to complain about potential
      deadlock in several places.  In most/all of these LOCKDEP complaints
      it looks like it's a false positive, since many of the potential
      circular locking cases can't take place by the time the
      ext4_evict_inode() is called; but since at the very least it may mask
      real problems, we need to address this.
      
      This change removes the flush_completed_IO() and i_mutex lock in
      ext4_evict_inode().  Instead, we take a different approach to resolve
      the software lockup that commit 2581fdc8 intends to fix.  Rather
      than having the ext4-dio-unwritten thread wait to grab the i_mutex
      lock of an inode, we use mutex_trylock() instead, and simply requeue
      the work item if we fail to grab the inode's i_mutex lock.
      
      This should speed up work queue processing in general and also
      prevents the following deadlock scenario: During page fault,
      shrink_icache_memory is called that in turn evicts another inode B.
      Inode B has some pending io_end work so it calls ext4_ioend_wait()
      that waits for inode B's i_ioend_count to become zero.  However, inode
      B's ioend work was queued behind some of inode A's ioend work on the
      same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
      thread on that cpu is processing inode A's ioend work, it tries to
      grab inode A's i_mutex lock.  Since the i_mutex lock of inode A was
      already held before the page fault happened, we enter a deadlock.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
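The "try the lock, otherwise requeue" idea can be sketched with POSIX threads. This is an assumption-laden userspace analogue, not the kernel code: `inode_model`, `work_result` and `end_io_work` are hypothetical names, `pthread_mutex_trylock()` stands in for mutex_trylock(), and requeueing is represented only by a return value the caller would act on.

```c
#include <assert.h>
#include <pthread.h>

enum work_result { WORK_DONE, WORK_REQUEUE };

struct inode_model {
    pthread_mutex_t i_mutex;   /* stands in for the inode's i_mutex */
    int converted;             /* number of work items processed */
};

/*
 * Process one work item without ever blocking on the inode lock.
 * If the lock is busy, report WORK_REQUEUE so the worker thread can
 * move on to items that belong to other inodes.
 */
static enum work_result end_io_work(struct inode_model *ino)
{
    if (pthread_mutex_trylock(&ino->i_mutex) != 0)
        return WORK_REQUEUE;

    ino->converted++;          /* placeholder for extent conversion */
    pthread_mutex_unlock(&ino->i_mutex);
    return WORK_DONE;
}
```

Because the worker never sleeps on one inode's lock, work for another inode queued behind it on the same queue is not held up.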
  14. 14 Aug, 2011 1 commit
  15. 30 Jun, 2011 1 commit
  16. 01 May, 2011 1 commit
    • ext4: don't set PageUptodate in ext4_end_bio() · 39db00f1
      Curt Wohlgemuth authored
      In the bio completion routine, we should not be setting
      PageUptodate at all -- it's set at sys_write() time, and is
      unaffected by success/failure of the write to disk.
      
      This can cause a page corruption bug when the file system's
      block size is less than the architecture's VM page size.
      
      If we have only written a single block, we might end up
      setting the page's PageUptodate flag, indicating that the page
      is completely read into memory, which may not be true.
      This could cause subsequent reads to get bad data.
      
      This commit also takes the opportunity to clean up error
      handling in ext4_end_bio(), and remove some extraneous code:
      
         - fixes ext4_end_bio() to set AS_EIO in the
           page->mapping->flags on error, which was left out by
           mistake.  This is needed so that fsync() will
           return an error if there was an I/O error.
         - remove the clear_buffer_dirty() call on unmapped
           buffers for each page.
         - consolidate page/buffer error handling in a single
           section.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Jim Meyering <jim@meyering.net>
      Reported-by: Hugh Dickins <hughd@google.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
  17. 10 Mar, 2011 1 commit
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  18. 01 Mar, 2011 1 commit
    • ext4: optimize ext4_bio_write_page() when no extent conversion is needed · b6168443
      Theodore Ts'o authored
      If no extent conversion is required, wake up any processes waiting for
      the page's writeback to be complete and free the ext4_io_end structure
      directly in ext4_end_bio() instead of dropping it on the linked list
      (which requires taking a spinlock to queue and dequeue the io_end
      structure), and waiting for the workqueue to do this work.
      
      This removes an extra scheduling delay before process waiting for an
      fsync() to complete gets woken up, and it also reduces the CPU
      overhead for a random write workload.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 28 Feb, 2011 1 commit
    • ext4: don't leave PageWriteback set after memory failure · a54aa761
      Theodore Ts'o authored
      In ext4_bio_write_page(), if the memory allocation for the struct
      ext4_io_page fails, it returns with the page's PageWriteback flag set.
      This will end up causing the page not to skip writeback in
      WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
      umount) the writeback daemon will get stuck forever on the
      wait_on_page_writeback() function in write_cache_pages_da().
      
      Or, if journalling is enabled and the file gets deleted, the
      journal thread can get stuck in the journal_finish_inode_data_buffers()
      call to filemap_fdatawait().
      
      Another place where things can get hung up is in
      truncate_inode_pages(), called out of ext4_evict_inode().
      
      Fix this by not setting PageWriteback until after we have successfully
      allocated the struct ext4_io_page.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
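The ordering fix (allocate first, set the writeback state only after the allocation succeeded) can be shown in a small userspace sketch. Everything here is a hypothetical model: `page_model`, `bio_write_page`, `end_bio` and the `fail_alloc` parameter are made-up names, and a boolean stands in for the PageWriteback flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct page_model {
    bool writeback;   /* models the PageWriteback flag */
    void *io_page;    /* models the per-page I/O tracking structure */
};

/*
 * Start writeback for a page. fail_alloc simulates an allocation failure.
 * Returns 0 on success, -1 on failure. On failure the writeback flag is
 * never set, so nobody can end up waiting on it forever.
 */
static int bio_write_page(struct page_model *page, bool fail_alloc)
{
    void *io_page = fail_alloc ? NULL : malloc(16);

    if (io_page == NULL)
        return -1;            /* flag untouched on the error path */

    page->io_page = io_page;
    page->writeback = true;   /* set only after allocation succeeded */
    return 0;
}

/* I/O completion: release the tracking structure and clear the flag. */
static void end_bio(struct page_model *page)
{
    free(page->io_page);
    page->io_page = NULL;
    page->writeback = false;
}
```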
  20. 22 Feb, 2011 1 commit
  21. 12 Feb, 2011 1 commit
    • ext4: serialize unaligned asynchronous DIO · e9e3bcec
      Eric Sandeen authored
      ext4 has a data corruption case when doing non-block-aligned
      asynchronous direct IO into a sparse file, as demonstrated
      by xfstest 240.
      
      The root cause is that while ext4 preallocates space in the
      hole, mappings of that space still look "new" and 
      dio_zero_block() will zero out the unwritten portions.  When
      more than one AIO thread is going, they both find this "new"
      block and race to zero out their portion; this is uncoordinated
      and causes data corruption.
      
      Dave Chinner fixed this for xfs by simply serializing all
      unaligned asynchronous direct IO.  I've done the same here.
      The difference is that we only wait on conversions, not all IO.
      This is a very big hammer, and I'm not very pleased with
      stuffing this into ext4_file_write().  But since ext4 is
      DIO_LOCKING, we need to serialize it at this high level.
      
      I tried to move this into ext4_ext_direct_IO, but by then
      we have the i_mutex already, and we will wait on the
      work queue to do conversions - which must also take the
      i_mutex.  So that won't work.
      
      This was originally exposed by qemu-kvm installing to
      a raw disk image with a normal sector-63 alignment.  I've
      tested a backport of this patch with qemu, and it does
      avoid the corruption.  It is also quite a lot slower
      (14 min for package installs, vs. 8 min for well-aligned)
      but I'll take slow correctness over fast corruption any day.
      
      Mingming suggested that we can track outstanding
      conversions, and wait on those so that non-sparse
      files won't be affected, and I've implemented that here;
      unaligned AIO to nonsparse files won't take a perf hit.
      
      [tytso@mit.edu: Keep the mutex as a hashed array instead
       of bloating the ext4 inode]
      
      [tytso@mit.edu: Fix up namespace issues so that global
       variables are protected with an "ext4_" prefix.]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
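Two ideas from this message lend themselves to a short sketch: deciding whether an I/O is block-aligned, and picking a mutex from a shared hashed array rather than embedding one in every inode. The code below is an illustration under assumptions, not the patch itself: `is_unaligned`, `mutex_slot` and `NR_AIO_MUTEXES` are hypothetical names, the table size is arbitrary, and the slot is derived from an inode number by simple modulo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_AIO_MUTEXES 64u   /* illustrative table size */

/*
 * True when the I/O does not both start and end on a block boundary.
 * blocksize is assumed to be a power of two.
 */
static bool is_unaligned(uint64_t offset, uint64_t len, uint32_t blocksize)
{
    uint64_t mask = (uint64_t)blocksize - 1;

    return (offset & mask) != 0 || ((offset + len) & mask) != 0;
}

/*
 * Map an inode number to one slot of a shared mutex table. Different
 * inodes may share a slot; that only costs extra serialization, never
 * correctness.
 */
static unsigned int mutex_slot(uint64_t ino)
{
    return (unsigned int)(ino % NR_AIO_MUTEXES);
}
```

With 512-byte sectors, a partition starting at sector 63 begins at byte offset 32256, which is not a multiple of 4096, so `is_unaligned(32256, 512, 4096)` is true; this matches the sector-63 case mentioned above.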
  22. 08 Feb, 2011 1 commit
    • ext4: Fix data corruption with multi-block writepages support · d50bdd5a
      Curt Wohlgemuth authored
      This fixes a corruption problem with the multi-block
      writepages submittal change for ext4, from commit
      bd2d0210 ("ext4: use bio
      layer instead of buffer layer in mpage_da_submit_io").
      
      (Note that this corruption is not present in 2.6.37 on
      ext4, because the corruption was detected after the
      feature was merged in 2.6.37-rc1, and so it was turned
      off by adding a non-default mount option,
      mblk_io_submit.  With this commit, which hopefully
      fixes the last of the bugs with this feature, we'll be
      able to turn on this performance feature by default in
      2.6.38, and remove the mblk_io_submit option.)
      
      The ext4 code path to bundle multiple pages for
      writeback in ext4_bio_write_page() had a bug: we should
      be clearing buffer head dirty flags *before* we submit
      the bio, not in the completion routine.
      
      The patch below was tested on 2.6.37 under KVM with the
      postgresql script which was submitted by Jon Nelson as
      documented in commit 1449032b.
      
      Without the patch, I'd hit the corruption problem about
      50-70% of the time.  With the patch, I executed the
      script > 100 times with no corruption seen.
      
      I also fixed a bug to make sure ext4_end_bio() doesn't
      dereference the bio after the bio_put() call.
      Reported-by: Jon Nelson <jnelson@jamponi.net>
      Reported-by: Matthias Bayer <jackdachef@gmail.com>
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  23. 11 Jan, 2011 1 commit
  24. 20 Dec, 2010 1 commit
  25. 18 Nov, 2010 1 commit
  26. 09 Nov, 2010 2 commits
    • ext4: fix potential race when freeing ext4_io_page structures · 83668e71
      Theodore Ts'o authored
      Use an atomic_t and make sure we don't free the structure while we
      might still be submitting I/O for that page.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
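The reference-counting pattern named here can be sketched with C11 atomics as a userspace analogue of atomic_t. The names `io_page_model`, `io_page_get` and `io_page_put` are hypothetical, and the `freed` field only records where a real implementation would release the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct io_page_model {
    atomic_int count;   /* references held by submitters and completions */
    bool freed;         /* marks the point where the structure would be freed */
};

/* Take a reference before handing the structure to another I/O. */
static void io_page_get(struct io_page_model *p)
{
    atomic_fetch_add(&p->count, 1);
}

/*
 * Drop a reference. Returns true when this call dropped the last one,
 * which is the only moment at which freeing is safe.
 */
static bool io_page_put(struct io_page_model *p)
{
    if (atomic_fetch_sub(&p->count, 1) == 1) {
        p->freed = true;
        return true;
    }
    return false;
}
```

As long as the submitter holds its own reference while it is still issuing I/O, a completion that runs early cannot bring the count to zero.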
    • ext4: handle writeback of inodes which are being freed · f7ad6d2e
      Theodore Ts'o authored
      The following BUG can occur when an inode which is getting freed
      still has dirty pages outstanding, and it gets deleted (in this
      case because it was the target of a rename).  In ordered mode, we need to
      make sure the data pages are written just in case we crash before the
      rename (or unlink) is committed.  If the inode is being freed then
      when we try to igrab the inode, we end up tripping the BUG_ON at
      fs/ext4/page-io.c:146.
      
      To solve this problem, we need to keep track of the number of io
      callbacks which are pending, and avoid destroying the inode until they
      have all been completed.  That way we don't have to bump the inode
      count to keep the inode from being destroyed; an approach which
      doesn't work because the count could have already been dropped down to
      zero before the inode writeback has started (at which point we're not
      allowed to bump the count back up to 1, since it's already started
      getting freed).
      
      Thanks to Dave Chinner for suggesting this approach, which is also
      used by XFS.
      
        kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
        Call Trace:
         [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
         [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
         [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
         [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
         [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
         [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
         [<ffffffff81087910>] do_writepages+0x1c/0x25
         [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
         [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
         [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
         [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
         [<ffffffff810c14a3>] evict+0x22/0x92
         [<ffffffff810c1a3d>] iput+0x212/0x249
         [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
         [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
         [<ffffffff810be613>] dput+0x13a/0x147
         [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
         [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
         [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
         [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
         [<ffffffff810b99c6>] sys_rename+0x16/0x18
         [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
      Reported-by: Nick Bowler <nbowler@elliptictech.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Tested-by: Nick Bowler <nbowler@elliptictech.com>
  27. 28 Oct, 2010 2 commits