1. 23 7月, 2012 2 次提交
  2. 10 7月, 2012 1 次提交
    • Z
      ext4: add a new nolock flag in ext4_map_blocks · 729f52c6
      Zheng Liu 提交于
      EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that we don't need
      to acquire i_data_sem lock in ext4_map_blocks.  Meanwhile, it changes
      ext4_get_block() to not start a new journal because when we do a
      overwrite dio, there is no any metadata that needs to be modified.
      
      We define a new function called ext4_get_block_write_nolock, which is
      used in dio overwrite nolock.  In this function, it doesn't try to
      acquire i_data_sem lock and doesn't start a new journal as it does a
      lookup.
      
      CC: Tao Ma <tm@tao.ma>
      CC: Eric Sandeen <sandeen@redhat.com>
      CC: Robin Dong <hao.bigrat@gmail.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      729f52c6
  3. 01 6月, 2012 1 次提交
  4. 16 5月, 2012 1 次提交
  5. 30 4月, 2012 3 次提交
  6. 22 3月, 2012 3 次提交
  7. 20 3月, 2012 3 次提交
  8. 05 3月, 2012 1 次提交
    • J
      ext4: clean up the flags passed to __blockdev_direct_IO · 93ef8541
      Jeff Moyer 提交于
      For extent-based files, you can perform DIO to holes, as mentioned in
      the comments in ext4_ext_direct_IO.  However, that function passes
      DIO_SKIP_HOLES to __blockdev_direct_IO, which is *really* confusing to
      the uninitiated reader.  The key, here, is that the get_block function
      passed in, ext4_get_block_write, completely ignores the create flag
      that is passed to it (the create flag is passed in from the direct I/O
      code, which uses the DIO_SKIP_HOLES flag to determine whether or not
      it should be cleared).
      
      This is a long-winded way of saying that the DIO_SKIP_HOLES flag is
      ultimately ignored.  So let's remove it.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      93ef8541
  9. 03 3月, 2012 1 次提交
  10. 21 2月, 2012 2 次提交
    • J
      ext4: fix race between unwritten extent conversion and truncate · 266991b1
      Jeff Moyer 提交于
      The following comment in ext4_end_io_dio caught my attention:
      
      	/* XXX: probably should move into the real I/O completion handler */
              inode_dio_done(inode);
      
      The truncate code takes i_mutex, then calls inode_dio_wait.  Because the
      ext4 code path above will end up dropping the mutex before it is
      reacquired by the worker thread that does the extent conversion, it
      seems to me that the truncate can happen out of order.  Jan Kara
      mentioned that this might result in error messages in the system logs,
      but that should be the extent of the "damage."
      
      The fix is pretty straight-forward: don't call inode_dio_done until the
      extent conversion is complete.
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      266991b1
    • L
      ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc · 3d2b1582
      Lukas Czerner 提交于
      Ext4 does not support data journalling with delayed allocation enabled.
      We even do not allow to mount the file system with delayed allocation
      and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
      so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
      system mounted with delayed allocation (default) and that's where
      problem arises. The easies way to reproduce this problem is with the
      following set of commands:
      
       mkfs.ext4 /dev/sdd
       mount /dev/sdd /mnt/test1
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
       chattr +j /mnt/test1/file
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
       chattr -j /mnt/test1/file
      
      Additionally it can be reproduced quite reliably with xfstests 272 and
      269. In fact the above reproducer is a part of test 272.
      
      To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
      the file system is mounted with delayed allocation. This can be easily
      done by fixing ext4_should_*_data() functions do ignore data journal
      flag when delalloc is set (suggested by Ted). We also have to set the
      appropriate address space operations for the inode (again, ignoring data
      journal flag if delalloc enabled).
      
      Additionally this commit introduces ext4_inode_journal_mode() function
      because ext4_should_*_data() has already had a lot of common code and
      this change is putting it all into one function so it is easier to
      read.
      
      Successfully tested with xfstests in following configurations:
      
      delalloc + data=ordered
      delalloc + data=writeback
      data=journal
      nodelalloc + data=ordered
      nodelalloc + data=writeback
      nodelalloc + data=journal
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      3d2b1582
  11. 09 1月, 2012 1 次提交
  12. 05 1月, 2012 1 次提交
  13. 29 12月, 2011 4 次提交
  14. 14 12月, 2011 3 次提交
    • Y
      ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() · 093e6e36
      Yongqiang Yang 提交于
      If a page has been read into memory and never been written, it has no
      buffers, but we should handle the page in truncate or punch hole.
      
      VFS code of writing operations has handled holes correctly, so this
      patch removes the code handling holes in writing operations.
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      093e6e36
    • Y
      ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize · 13a79a47
      Yongqiang Yang 提交于
      If there is an unwritten but clean buffer in a page and there is a
      dirty buffer after the buffer, then mpage_submit_io does not write the
      dirty buffer out.  As a result, da_writepages loops forever.
      
      This patch fixes the problem by checking dirty flag.
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      13a79a47
    • A
      ext4: avoid hangs in ext4_da_should_update_i_disksize() · ea51d132
      Andrea Arcangeli 提交于
      If the pte mapping in generic_perform_write() is unmapped between
      iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
      "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
      it with delayed allocations enabled. This skips the i_disksize
      enlargement logic if copied is zero and no new data was appeneded to
      the inode.
      
       gdb> bt
       #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
       08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
       ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
       #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
       os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
       #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
       xffff88001e26be40) at mm/filemap.c:2600
       #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
       zed out>, pos=<value optimized out>) at mm/filemap.c:2632
       #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
       t fs/ext4/file.c:136
       #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
       ppos=0xffff88001e26bf48) at fs/read_write.c:406
       #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
       000, pos=0xffff88001e26bf48) at fs/read_write.c:435
       #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
       4000) at fs/read_write.c:487
       #10 <signal handler called>
       #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
       #12 0x0000000000000000 in ?? ()
       gdb> print offset
       $22 = 0xffffffffffffffff
       gdb> print idx
       $23 = 0xffffffff
       gdb> print inode->i_blkbits
       $24 = 0xc
       gdb> up
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       2512                    if (ext4_da_should_update_i_disksize(page, end)) {
       gdb> print start
       $25 = 0x0
       gdb> print end
       $26 = 0xffffffffffffffff
       gdb> print pos
       $27 = 0x108000
       gdb> print new_i_size
       $28 = 0x108000
       gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
       $29 = 0xd9000
       gdb> down
       2467            for (i = 0; i < idx; i++)
       gdb> print i
       $30 = 0xd44acbee
      
      This is 100% reproducible with some autonuma development code tuned in
      a very aggressive manner (not normal way even for knumad) which does
      "exotic" changes to the ptes. It wouldn't normally trigger but I don't
      see why it can't happen normally if the page is added to swap cache in
      between the two faults leading to "copied" being zero (which then
      hangs in ext4). So it should be fixed. Especially possible with lumpy
      reclaim (albeit disabled if compaction is enabled) as that would
      ignore the young bits in the ptes.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      ea51d132
  15. 12 12月, 2011 1 次提交
  16. 06 12月, 2011 1 次提交
  17. 02 12月, 2011 1 次提交
  18. 25 11月, 2011 1 次提交
    • T
      ext4: fix racy use-after-free in ext4_end_io_dio() · 4c81f045
      Tejun Heo 提交于
      ext4_end_io_dio() queues io_end->work and then clears iocb->private;
      however, io_end->work calls aio_complete() which frees the iocb
      object.  If that slab object gets reallocated, then ext4_end_io_dio()
      can end up clearing someone else's iocb->private, this use-after-free
      can cause a leak of a struct ext4_io_end_t structure.
      
      Detected and tested with slab poisoning.
      
      [ Note: Can also reproduce using 12 fio's against 12 file systems with the
        following configuration file:
      
        [global]
        direct=1
        ioengine=libaio
        iodepth=1
        bs=4k
        ba=4k
        size=128m
      
        [create]
        filename=${TESTDIR}
        rw=write
      
        -- tytso ]
      
      Google-Bug-Id: 5354697
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NKent Overstreet <koverstreet@google.com>
      Tested-by: NKent Overstreet <koverstreet@google.com>
      Cc: stable@kernel.org
      4c81f045
  19. 08 11月, 2011 1 次提交
  20. 02 11月, 2011 1 次提交
  21. 01 11月, 2011 5 次提交
  22. 31 10月, 2011 1 次提交
    • C
      writeback: Add a 'reason' to wb_writeback_work · 0e175a18
      Curt Wohlgemuth 提交于
      This creates a new 'reason' field in a wb_writeback_work
      structure, which unambiguously identifies who initiates
      writeback activity.  A 'wb_reason' enumeration has been
      added to writeback.h, to enumerate the possible reasons.
      
      The 'writeback_work_class' and tracepoint event class and
      'writeback_queue_io' tracepoints are updated to include the
      symbolic 'reason' in all trace events.
      
      And the 'writeback_inodes_sbXXX' family of routines has had
      a wb_stats parameter added to them, so callers can specify
      why writeback is being started.
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      0e175a18
  23. 26 10月, 2011 1 次提交