1. 21 Feb, 2012 1 commit
    • ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc · 3d2b1582
      Lukas Czerner committed
      Ext4 does not support data journalling with delayed allocation enabled.
      We do not even allow mounting the file system with both delayed
      allocation and data journalling enabled. However, the flag can be set via
      FS_IOC_SETFLAGS, so we can hit an inode with EXT4_INODE_JOURNAL_DATA set
      even on a file system mounted with delayed allocation (the default), and
      that is where the problem arises. The easiest way to reproduce this
      problem is with the following set of commands:
      
       mkfs.ext4 /dev/sdd
       mount /dev/sdd /mnt/test1
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
       chattr +j /mnt/test1/file
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
       chattr -j /mnt/test1/file
      
      Additionally it can be reproduced quite reliably with xfstests 272 and
      269. In fact the above reproducer is a part of test 272.
      
      To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
      the file system is mounted with delayed allocation. This can be easily
      done by fixing the ext4_should_*_data() functions to ignore the data
      journal flag when delalloc is set (suggested by Ted). We also have to set
      the appropriate address space operations for the inode (again, ignoring
      the data journal flag if delalloc is enabled).
      
      Additionally, this commit introduces the ext4_inode_journal_mode()
      function, because the ext4_should_*_data() functions already shared a lot
      of common code; this change puts it all into one function so it is
      easier to read.
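
The decision described above can be sketched in userspace C. This is only an illustration under stated assumptions: the types and names (`mount_sketch`, `inode_sketch`, `journal_mode_for`) are hypothetical stand-ins, not the kernel's structures, and the sketch ignores non-regular files. The "no journal means writeback" branch is likewise an assumption made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three data modes. */
enum journal_mode_sketch { MODE_JOURNAL, MODE_ORDERED, MODE_WRITEBACK };

/* Hypothetical mount state: data= option, delalloc, and whether a journal exists. */
struct mount_sketch {
    enum journal_mode_sketch data_mode;
    bool delalloc;
    bool has_journal;
};

/* Hypothetical per-inode state: the "chattr +j" flag. */
struct inode_sketch {
    bool journal_data_flag;
};

/* Single place that decides the effective mode for an inode. */
static enum journal_mode_sketch
journal_mode_for(const struct mount_sketch *m, const struct inode_sketch *i)
{
    if (!m->has_journal)
        return MODE_WRITEBACK;          /* assumption: no journal, nothing to order */
    if (m->data_mode == MODE_JOURNAL)
        return MODE_JOURNAL;            /* mount-wide data=journal */
    if (i->journal_data_flag && !m->delalloc)
        return MODE_JOURNAL;            /* per-inode flag honoured only without delalloc */
    return m->data_mode;                /* otherwise follow the mount option */
}
```

With this shape, each "should journal / order / write back data" predicate reduces to comparing the returned mode against one constant, which is the readability gain the commit message describes.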
      
      Successfully tested with xfstests in following configurations:
      
      delalloc + data=ordered
      delalloc + data=writeback
      data=journal
      nodelalloc + data=ordered
      nodelalloc + data=writeback
      nodelalloc + data=journal
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  2. 09 Jan, 2012 1 commit
  3. 05 Jan, 2012 1 commit
  4. 29 Dec, 2011 4 commits
  5. 14 Dec, 2011 3 commits
    • ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() · 093e6e36
      Yongqiang Yang committed
      If a page has been read into memory but never written, it has no
      buffers, but we should still handle the page in truncate or punch hole.
      
      The VFS write path already handles holes correctly, so this patch
      removes the hole-handling code from the write operations.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize · 13a79a47
      Yongqiang Yang committed
      If there is an unwritten but clean buffer in a page and there is a
      dirty buffer after the buffer, then mpage_submit_io does not write the
      dirty buffer out.  As a result, da_writepages loops forever.
      
      This patch fixes the problem by checking the dirty flag.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: avoid hangs in ext4_da_should_update_i_disksize() · ea51d132
      Andrea Arcangeli committed
      If the pte mapping in generic_perform_write() is unmapped between
      iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
      "copied" parameter to ->end_write can be zero. ext4 couldn't cope with
      this when delayed allocation was enabled. This patch skips the i_disksize
      enlargement logic if copied is zero and no new data was appended to
      the inode.
      
       gdb> bt
       #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
       08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
       ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
       #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
       os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
       #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
       xffff88001e26be40) at mm/filemap.c:2600
       #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
       zed out>, pos=<value optimized out>) at mm/filemap.c:2632
       #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
       t fs/ext4/file.c:136
       #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
       ppos=0xffff88001e26bf48) at fs/read_write.c:406
       #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
       000, pos=0xffff88001e26bf48) at fs/read_write.c:435
       #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
       4000) at fs/read_write.c:487
       #10 <signal handler called>
       #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
       #12 0x0000000000000000 in ?? ()
       gdb> print offset
       $22 = 0xffffffffffffffff
       gdb> print idx
       $23 = 0xffffffff
       gdb> print inode->i_blkbits
       $24 = 0xc
       gdb> up
       #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
       xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
       2512                    if (ext4_da_should_update_i_disksize(page, end)) {
       gdb> print start
       $25 = 0x0
       gdb> print end
       $26 = 0xffffffffffffffff
       gdb> print pos
       $27 = 0x108000
       gdb> print new_i_size
       $28 = 0x108000
       gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
       $29 = 0xd9000
       gdb> down
       2467            for (i = 0; i < idx; i++)
       gdb> print i
       $30 = 0xd44acbee
      
      This is 100% reproducible with some autonuma development code tuned in
      a very aggressive manner (not the normal way, even for knumad) which makes
      "exotic" changes to the ptes. It wouldn't normally trigger, but I don't
      see why it can't happen normally if the page is added to swap cache in
      between the two faults, leading to "copied" being zero (which then
      hangs in ext4). So it should be fixed. It is especially possible with
      lumpy reclaim (albeit disabled if compaction is enabled), as that would
      ignore the young bits in the ptes.
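
The gdb values above (start = 0, copied = 0, end = 0xffffffffffffffff, idx = 0xffffffff) are consistent with an unsigned wrap-around when the last touched offset is computed as start + copied - 1. The following userspace sketch reproduces that arithmetic and shows the kind of guard the message describes. It assumes 4 KiB pages and 64-bit offsets; all helper names are hypothetical and this is not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL  /* assumption: 4 KiB pages */

/* Offset of the last byte written within the page, computed without a guard. */
static uint64_t last_offset_in_page(uint64_t pos, uint64_t copied)
{
    uint64_t start = pos & (SKETCH_PAGE_SIZE - 1);
    return start + copied - 1;    /* wraps around when start == 0 and copied == 0 */
}

/* Index of the buffer containing 'offset', truncated to 32 bits as in the trace. */
static uint32_t buffer_index(uint64_t offset, unsigned blkbits)
{
    return (uint32_t)(offset >> blkbits);
}

/* Guard: only consider enlarging the on-disk size if something was copied. */
static bool should_check_disksize(uint64_t copied, uint64_t new_i_size,
                                  uint64_t i_disksize)
{
    return copied != 0 && new_i_size > i_disksize;
}
```

Walking a page's buffer list for an index that large never terminates in practice, which matches the hang; with the guard, the zero-length case never reaches that loop.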
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  6. 12 Dec, 2011 1 commit
  7. 06 Dec, 2011 1 commit
  8. 02 Dec, 2011 1 commit
  9. 25 Nov, 2011 1 commit
    • ext4: fix racy use-after-free in ext4_end_io_dio() · 4c81f045
      Tejun Heo committed
      ext4_end_io_dio() queues io_end->work and then clears iocb->private;
      however, io_end->work calls aio_complete(), which frees the iocb
      object.  If that slab object gets reallocated, then ext4_end_io_dio()
      can end up clearing someone else's iocb->private. This use-after-free
      can cause a leak of a struct ext4_io_end_t structure.
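
One way to avoid this class of bug is to make the store to iocb->private the last access to the iocb before the work is queued. The sketch below illustrates only that ordering, in userspace C; the struct and function names are hypothetical, and `recording_queue_work` is a test double that stands in for a real work queue.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the iocb and the io_end structure. */
struct iocb_sketch { void *private_data; };
struct io_end_sketch { struct iocb_sketch *iocb; };

typedef void (*queue_work_fn)(struct io_end_sketch *io_end);

/* Completion path: finish with the iocb first, then hand off the work. */
static void end_io_dio_sketch(struct iocb_sketch *iocb, queue_work_fn queue_work)
{
    struct io_end_sketch *io_end = iocb->private_data;

    iocb->private_data = NULL;  /* last access to iocb in this function */
    queue_work(io_end);         /* from here on the worker may free the iocb */
}

/* Test double: records whether private_data was already cleared at queue time. */
static int private_cleared_at_queue = -1;

static void recording_queue_work(struct io_end_sketch *io_end)
{
    private_cleared_at_queue = (io_end->iocb->private_data == NULL);
}
```

The general rule is that once ownership of an object may pass to another context, the handing-off code should not touch it again.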
      
      Detected and tested with slab poisoning.
      
      [ Note: Can also reproduce using 12 fio's against 12 file systems with the
        following configuration file:
      
        [global]
        direct=1
        ioengine=libaio
        iodepth=1
        bs=4k
        ba=4k
        size=128m
      
        [create]
        filename=${TESTDIR}
        rw=write
      
        -- tytso ]
      
      Google-Bug-Id: 5354697
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Kent Overstreet <koverstreet@google.com>
      Tested-by: Kent Overstreet <koverstreet@google.com>
      Cc: stable@kernel.org
  10. 08 Nov, 2011 1 commit
  11. 02 Nov, 2011 1 commit
  12. 01 Nov, 2011 5 commits
  13. 31 Oct, 2011 1 commit
    • writeback: Add a 'reason' to wb_writeback_work · 0e175a18
      Curt Wohlgemuth committed
      This creates a new 'reason' field in a wb_writeback_work
      structure, which unambiguously identifies who initiates
      writeback activity.  A 'wb_reason' enumeration has been
      added to writeback.h, to enumerate the possible reasons.
      
      The 'writeback_work_class' tracepoint event class and
      'writeback_queue_io' tracepoints are updated to include the
      symbolic 'reason' in all trace events.
      
      And the 'writeback_inodes_sbXXX' family of routines has had
      a wb_stats parameter added to them, so callers can specify
      why writeback is being started.
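
The pattern described above, an enumeration carried in the work item plus a symbolic name for tracing, can be sketched as follows. The enumerator names, the struct layout, and the name strings are illustrative assumptions, not the kernel's actual definitions.

```c
#include <stddef.h>

/* Hypothetical subset of reasons; the real enumeration lives in writeback.h. */
enum wb_reason_sketch {
    REASON_BACKGROUND,
    REASON_SYNC,
    REASON_PERIODIC,
    REASON_MAX
};

/* Hypothetical work item: each request records who asked for writeback. */
struct wb_work_sketch {
    long nr_pages;
    enum wb_reason_sketch reason;
};

static const char *const reason_names[REASON_MAX] = {
    [REASON_BACKGROUND] = "background",
    [REASON_SYNC]       = "sync",
    [REASON_PERIODIC]   = "periodic",
};

/* Map a reason to the symbolic string a trace event would print. */
static const char *reason_name(enum wb_reason_sketch reason)
{
    return ((unsigned)reason < REASON_MAX) ? reason_names[reason] : "unknown";
}
```

Keeping the reason in the work item, rather than inferring it from other fields, is what lets every trace event report the initiator unambiguously.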
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  14. 26 Oct, 2011 1 commit
  15. 25 Oct, 2011 1 commit
    • ext4: update EOFBLOCKS flag on fallocate properly · a4e5d88b
      Dmitry Monakhov committed
      EOFBLOCK_FL should be updated if fallocate is called w/o
      FALLOCATE_FL_KEEP_SIZE.  Currently this happens only if a new extent
      was allocated.
      
      TESTCASE:
      fallocate test_file -n -l4096
      fallocate test_file -l4096
      The last fallocate command updates the size but keeps EOFBLOCK_FL set,
      and fsck will complain about that.
      
      Also remove the ping-pong in ext4_fallocate() in the case of new extents,
      where ext4_ext_map_blocks() clears the EOFBLOCKS bit and
      ext4_falloc_update_inode() later restores it.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  16. 21 Oct, 2011 2 commits
  17. 18 Oct, 2011 1 commit
  18. 09 Oct, 2011 1 commit
  19. 10 Sep, 2011 8 commits
    • ext4: attempt to fix race in bigalloc code path · 5356f261
      Aditya Kali committed
      Currently, there exists a race between delayed allocated writes and
      the writeback when the bigalloc feature is in use. The race arose because
      we wanted to determine which blocks in a cluster are under delayed
      allocation and we were using the buffer_delayed(bh) check for it. But the
      writeback codepath clears this bit without any synchronization, which
      resulted in a race and an ext4 warning similar to:
      
      EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
      		reserved data blocks
      
      The race existed in two places:
      (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
          the writeback code path.
      (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
          buffer_delayed(bh) is set).
      
      To fix (1), this patch introduces a new buffer_head state bit -
      BH_Da_Mapped.  This bit is set under the protection of
      EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
      allocated blocks during writeout. We can now reliably check
      for this bit inside ext4_find_delalloc_range() to determine whether
      the reservation for the blocks has already been claimed or not.
      
      To fix (2), it was necessary to set buffer_delay(bh) under the
      protection of i_data_sem.  So, I extracted the very beginning of
      ext4_map_blocks into a new function - ext4_da_map_blocks() - and
      performed the required setting of the bh_delay bit and the quota
      reservation under the protection of i_data_sem.  These two fixes make
      the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
      thus removing the race.
      
      Tested: I was able to reproduce the problem by running 'dd' and
      'fsync' in parallel. Also, xfstests sometimes used to reproduce this
      race. After the fix both my test and xfstests were successful and no
      race (warning message) was observed.
      
      Google-Bug-Id: 4997027
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: add some tracepoints in ext4/extents.c · d8990240
      Aditya Kali committed
      This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
      ext4/inode.c.
      
      Tested: Built and ran the kernel and verified that these tracepoints work.
      Also ran xfstests.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() · df55c99d
      Theodore Ts'o committed
      Rename the function so it is more clear what is going on.  Also rename
      the various variables so it's clearer what's happening.
      
      Also fix a missing blocks to cluster conversion when reading the
      number of reserved blocks for root.
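
A missing blocks-to-clusters conversion is easy to picture once the arithmetic is written down. The sketch below assumes a power-of-two cluster size expressed as a shift count (`cluster_bits`, a hypothetical parameter name); the helper names are illustrative and are not the kernel's macros.

```c
#include <stdint.h>

/* Whole clusters contained in 'blocks' (rounds down). */
static uint64_t blocks_to_clusters(uint64_t blocks, unsigned cluster_bits)
{
    return blocks >> cluster_bits;
}

/* Clusters needed to hold 'blocks' (rounds up). */
static uint64_t blocks_to_clusters_round_up(uint64_t blocks, unsigned cluster_bits)
{
    uint64_t blocks_per_cluster = 1ULL << cluster_bits;
    return (blocks + blocks_per_cluster - 1) >> cluster_bits;
}

/* Blocks covered by 'clusters'. */
static uint64_t clusters_to_blocks(uint64_t clusters, unsigned cluster_bits)
{
    return clusters << cluster_bits;
}
```

Comparing a count in blocks against a counter kept in clusters without one of these conversions is off by a factor of the cluster size, which is why naming functions and variables after their unit helps.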
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters() · e7d5f315
      Theodore Ts'o committed
      This function really claims a number of free clusters, not blocks, so
      rename it so it's clearer what's going on.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters() · 5dee5437
      Theodore Ts'o committed
      This function really counts the free clusters reported in the block
      group descriptors, so rename it to reduce confusion.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Fix bigalloc quota accounting and i_blocks value · 7b415bf6
      Aditya Kali committed
      With bigalloc changes, the i_blocks value was not correctly set (it was still
      set to number of blocks being used, but in case of bigalloc, we want i_blocks
      to represent the number of clusters being used). Since the quota subsystem sets
      the i_blocks value, this patch fixes the quota accounting and makes sure that
      the i_blocks value is set correctly.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: convert s_{dirty,free}blocks_counter to s_{dirty,free}clusters_counter · 57042651
      Theodore Ts'o committed
      Convert the percpu counters s_dirtyblocks_counter and
      s_freeblocks_counter in struct ext4_super_info to be
      s_dirtyclusters_counter and s_freeclusters_counter.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: enforce bigalloc restrictions (e.g., no online resizing, etc.) · bab08ab9
      Theodore Ts'o committed
      At least initially if the bigalloc feature is enabled, we will not
      support non-extent mapped inodes, online resizing, online defrag, or
      the FITRIM ioctl.  This simplifies the initial implementation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  20. 07 Sep, 2011 1 commit
    • ext4: fix partial page writes · 02fac129
      Allison Henderson committed
      While running extended fsx tests to verify the preceding patches,
      a similar bug was also found in the write operation.
      
      Whenever a write operation begins or ends in a hole,
      or extends EOF, the partial page contained in the hole
      or beyond EOF needs to be zeroed out.
      
      To correct this, the new ext4_discard_partial_page_buffers_no_lock
      routine is used to zero out the partial page, but only for buffer
      heads that are already unmapped.
      Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 06 Sep, 2011 1 commit
    • ext4: only call ext4_jbd2_file_inode when an inode has been extended · decbd919
      Theodore Ts'o committed
      In delayed allocation mode, it's important to only call
      ext4_jbd2_file_inode when the file has been extended.  This is
      necessary to avoid a race which first got introduced in commit
      678aaf48, but which was made much more common with the introduction
      of the "punch hole" functionality.  (Especially when dioread_nolock
      was enabled; that is when I could reliably reproduce this problem with
      xfstests #74.)
      
      The race is this: If while trying to writeback a delayed allocation
      inode, there is a need to map delalloc blocks, and we run out of space
      in the journal, *and* at the same time the inode is already on the
      committing transaction's t_inode_list (because for example while doing
      the punch hole operation, ext4_jbd2_file_inode() is called), then the
      commit operation will wait for the inode to finish all of its pending
      writebacks by calling filemap_fdatawait(), but since that inode has
      one or more pages with the PageWriteback flag set, the commit
      operation will wait forever, and so the writeback of the inode can
      never take place, and the kjournald thread and the writeback thread
      end up waiting for each other --- forever.
      
      It's important at this point to recall why an inode is placed on the
      t_inode_list; it is to provide the data=ordered guarantees that we
      don't end up exposing stale data.  In the case where we are truncating
      or punching a hole in the inode, there is no possibility that stale
      data could be exposed in the first place, so we don't need to put the
      inode on the t_inode_list!
      
      The right long-term fix is to get rid of data=ordered mode altogether,
      and only update the extent tree or indirect blocks after the data has
      been written.  Until then, this change will also avoid some
      unnecessary waiting in the commit operation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Allison Henderson <achender@linux.vnet.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
  22. 03 Sep, 2011 1 commit
    • ext4: Add new ext4_discard_partial_page_buffers routines · 4e96b2db
      Allison Henderson committed
      This patch adds two new routines: ext4_discard_partial_page_buffers
      and ext4_discard_partial_page_buffers_no_lock.
      
      The ext4_discard_partial_page_buffers routine is a wrapper
      function for ext4_discard_partial_page_buffers_no_lock.
      The wrapper function locks the page and passes it to
      ext4_discard_partial_page_buffers_no_lock.
      Calling functions that already have the page locked can call
      ext4_discard_partial_page_buffers_no_lock directly.
      
      The ext4_discard_partial_page_buffers_no_lock function
      zeros a specified range in a page, and unmaps the
      corresponding buffer heads.  Only block-aligned regions of the
      page will have their buffer heads unmapped.  Regions that are not
      block-aligned will be mapped if needed so that they can be updated
      with the partial zero out.  This function is meant to
      be used to update a page and its buffer heads to be zeroed
      and unmapped when the corresponding blocks have been released
      or will be released.
      
      This routine is used in the following scenarios:
      * A hole is punched and the non page aligned regions
        of the head and tail of the hole need to be discarded
      
      * The file is truncated and the partial page beyond EOF needs
        to be discarded
      
      * The end of a hole is in the same page as EOF.  After the
        page is flushed, the partial page beyond EOF needs to be
        discarded.
      
      * A write operation begins or ends inside a hole and the partial
        page appearing before or after the write needs to be discarded
      
      * A write operation extends EOF and the partial page beyond EOF
        needs to be discarded
      
      This function takes a flag EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
      which is used when a write operation begins or ends in a hole.
      When the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED flag is used, only
      buffer heads that are already unmapped will have the corresponding
      regions of the page zeroed.
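
The zero-and-unmap behaviour described above can be modelled in userspace to make the block-aligned versus partial-block distinction concrete. This is a simplified sketch under stated assumptions: a 4 KiB page, a block size of at least 512 bytes, hypothetical types and names throughout, and no modelling of the step where a partial block is mapped before being updated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_BLOCKS 8u            /* assumes blocksize >= 512 */
#define SKETCH_ZERO_UNMAPPED_ONLY 0x1u  /* stand-in for the ZERO_UNMAPPED flag */

/* Hypothetical page model: raw bytes plus one "mapped" bit per block. */
struct page_sketch {
    unsigned char data[SKETCH_PAGE_SIZE];
    bool mapped[SKETCH_MAX_BLOCKS];
};

static void discard_partial_page(struct page_sketch *pg, size_t from, size_t len,
                                 size_t blocksize, unsigned flags)
{
    size_t end = from + len;
    if (end > SKETCH_PAGE_SIZE)
        end = SKETCH_PAGE_SIZE;

    for (size_t pos = from; pos < end; ) {
        size_t blk = pos / blocksize;
        size_t blk_start = blk * blocksize;
        size_t blk_end = blk_start + blocksize;
        size_t stop = end < blk_end ? end : blk_end;
        bool whole_block = (pos == blk_start && stop == blk_end);

        /* With the flag set, leave buffers that are still mapped untouched. */
        if (!((flags & SKETCH_ZERO_UNMAPPED_ONLY) && pg->mapped[blk])) {
            memset(pg->data + pos, 0, stop - pos);
            if (whole_block)
                pg->mapped[blk] = false;  /* only fully covered blocks are unmapped */
        }
        pos = stop;
    }
}
```

For example, with 1 KiB blocks, discarding bytes 512 through 2559 zeroes the tail of block 0 and the head of block 2 but unmaps only block 1, the one block the range covers completely.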
      Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  23. 31 Aug, 2011 1 commit