1. 20 8月, 2011 1 次提交
    • J
      ext4: flush any pending end_io requests before DIO reads w/dioread_nolock · dccaf33f
      Jiaying Zhang 提交于
      There is a race between ext4 buffer write and direct_IO read with
      dioread_nolock mount option enabled. The problem is that we clear
      PageWriteback flag during end_io time but will do
      uninitialized-to-initialized extent conversion later with dioread_nolock.
      If an O_direct read request comes in during this period, ext4 will return
      zero instead of the recently written data.
      
      This patch checks whether there are any pending uninitialized-to-initialized
      extent conversion requests before doing O_direct read to close the race.
      Note that this is just a bandaid fix. The fundamental issue is that we
      clear PageWriteback flag before we really complete an IO, which is
      problem-prone. To fix the fundamental issue, we may need to implement an
      extent tree cache that we can use to look up pending to-be-converted extents.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      dccaf33f
  2. 14 8月, 2011 3 次提交
    • T
      ext4: fix nomblk_io_submit option so it correctly converts uninit blocks · 9dd75f1f
      Theodore Ts'o 提交于
      Bug discovered by Jan Kara:
      
      Finally, commit 1449032b returned back
      the old IO submission code but apparently it forgot to return the old
      handling of uninitialized buffers so we unconditionnaly call
      block_write_full_page() without specifying end_io function. So AFAICS
      we never convert unwritten extents to written in some cases. For
      example when I mount the fs as: mount -t ext4 -o
      nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
              int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
              char buf[1024];
              memset(buf, 'a', sizeof(buf));
              fallocate(fd, 0, 0, 16384);
              write(fd, buf, sizeof(buf));
      
      I get a file full of zeros (after remounting the filesystem so that
      pagecache is dropped) instead of seeing the first KB contain 'a's.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      9dd75f1f
    • T
      ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. · 32c80b32
      Tao Ma 提交于
      EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
      should be done simultaneously since ext4_end_io_nolock always clear
      the flag and decrease the counter in the same time.
      
      We don't increase i_aiodio_unwritten when setting
      EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
      to wait forever.
      
      Part of the patch came from Eric in his e-mail, but it doesn't fix the
      problem met by Michael actually.
      
      http://marc.info/?l=linux-ext4&m=131316851417460&w=2
      
      Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      32c80b32
    • J
      ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8
      Jiaying Zhang 提交于
      Flush inode's i_completed_io_list before calling ext4_io_wait to
      prevent the following deadlock scenario: A page fault happens while
      some process is writing inode A. During page fault,
      shrink_icache_memory is called that in turn evicts another inode
      B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
      that waits for inode B's i_ioend_count to become zero. However, inode
      B's ioend work was queued behind some of inode A's ioend work on the
      same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
      thread on that cpu is processing inode A's ioend work, it tries to
      grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
      still hold before the page fault happened, we enter a deadlock.
      
      Also moves ext4_flush_completed_IO and ext4_ioend_wait from
      ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion,
      ext4_evict_inode() is called before ext4_destroy_inode() and in
      ext4_evict_inode(), we may call ext4_truncate() without holding
      i_mutex lock. As a result, there is a race between flush_completed_IO
      that is called from ext4_ext_truncate() and ext4_end_io_work, which
      may cause corruption on an io_end structure. This change moves
      ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
      to ext4_evict_inode() to resolve the race between ext4_truncate() and
      ext4_end_io_work during inode deletion.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      2581fdc8
  3. 13 8月, 2011 1 次提交
    • C
      ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508
      Curt Wohlgemuth 提交于
      ext4_should_writeback_data() had an incorrect sequence of
      tests to determine if it should return 0 or 1: in
      particular, even in no-journal mode, 0 was being returned
      for a non-regular-file inode.
      
      This meant that, in non-journal mode, we would use
      ext4_journalled_aops for directories, symlinks, and other
      non-regular files.  However, calling journalled aop
      callbacks when there is no valid handle, can cause problems.
      
      This would cause a kernel crash with Jan Kara's commit
      2d859db3 ("ext4: fix data corruption in inodes with
      journalled data"), because we now dereference 'handle' in
      ext4_journalled_write_end().
      
      I also added BUG_ONs to check for a valid handle in the
      obviously journal-only aops callbacks.
      
      I tested this running xfstests with a scratch device in
      these modes:
      
         - no-journal
         - data=ordered
         - data=writeback
         - data=journal
      
      All work fine; the data=journal run has many failures and a
      crash in xfstests 074, but this is no different from a
      vanilla kernel.
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      441c8508
  4. 12 8月, 2011 1 次提交
  5. 04 8月, 2011 1 次提交
  6. 02 8月, 2011 3 次提交
  7. 01 8月, 2011 5 次提交
  8. 31 7月, 2011 2 次提交
    • D
      ext4: add missing kfree() on error return path in add_new_gdb() · c49bafa3
      Dan Carpenter 提交于
      We added some more error handling in b4097142 "ext4: add error
      checking to calls to ext4_handle_dirty_metadata()".  But we need to
      call kfree() as well to avoid a memory leak.
      Signed-off-by: NDan Carpenter <error27@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c49bafa3
    • T
      ext4: fix races in ext4_sync_parent() · d59729f4
      Theodore Ts'o 提交于
      Fix problems if fsync() races against a rename of a parent directory
      as pointed out by Al Viro in his own inimitable way:
      
      >While we are at it, could somebody please explain what the hell is ext4
      >doing in
      >static int ext4_sync_parent(struct inode *inode)
      >{
      >        struct writeback_control wbc;
      >        struct dentry *dentry = NULL;
      >        int ret = 0;
      >
      >        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
      >                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
      >                dentry = list_entry(inode->i_dentry.next,
      >                                    struct dentry, d_alias);
      >                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
      >                        break;
      >                inode = dentry->d_parent->d_inode;
      >                ret = sync_mapping_buffers(inode->i_mapping);
      >                ...
      >Note that dentry obviously can't be NULL there.  dentry->d_parent is never
      >NULL.  And dentry->d_parent would better not be negative, for crying out
      >loud!  What's worse, there's no guarantees that dentry->d_parent will
      >remain our parent over that sync_mapping_buffers() *and* that inode won't
      >just be freed under us (after rename() and memory pressure leading to
      >eviction of what used to be our dentry->d_parent)......
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d59729f4
  9. 28 7月, 2011 5 次提交
  10. 27 7月, 2011 8 次提交
  11. 26 7月, 2011 5 次提交
  12. 24 7月, 2011 5 次提交