1. 13 Apr, 2014 1 commit
    • ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches · 847c6c42
      Zheng Liu committed
      This commit tries to fix some byte order issues that were found by a
      sparse check.
      
      $ make M=fs/ext4 C=2 CF=-D__CHECK_ENDIAN__
      ...
        CHECK   fs/ext4/extents.c
      fs/ext4/extents.c:5232:41: warning: restricted __le32 degrades to integer
      fs/ext4/extents.c:5236:52: warning: bad assignment (-=) to restricted __le32
      fs/ext4/extents.c:5258:45: warning: bad assignment (-=) to restricted __le32
      fs/ext4/extents.c:5303:28: warning: restricted __le32 degrades to integer
      fs/ext4/extents.c:5318:18: warning: incorrect type in assignment (different base types)
      fs/ext4/extents.c:5318:18:    expected unsigned int [unsigned] [usertype] ex_start
      fs/ext4/extents.c:5318:18:    got restricted __le32 [usertype] ee_block
      fs/ext4/extents.c:5319:24: warning: restricted __le32 degrades to integer
      fs/ext4/extents.c:5334:31: warning: incorrect type in assignment (different base types)
      ...
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
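      The sparse warnings above come from doing arithmetic directly on
      little-endian on-disk fields. A minimal userspace sketch of the usual
      pattern (convert to CPU order, compute, convert back) follows. The type
      le32_t, the *_sketch helpers, and struct extent_sketch are hypothetical
      stand-ins for the kernel's __le32, le32_to_cpu()/cpu_to_le32(), and the
      on-disk extent structure; this is not the code from the commit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __le32: bytes are stored little-endian. */
typedef uint32_t le32_t;

/* Decode a little-endian 32-bit value into host (CPU) byte order. */
static uint32_t le32_to_cpu_sketch(le32_t v)
{
    unsigned char b[4];

    memcpy(b, &v, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Encode a host-order 32-bit value as little-endian bytes. */
static le32_t cpu_to_le32_sketch(uint32_t x)
{
    unsigned char b[4];
    le32_t v;

    b[0] = (unsigned char)(x & 0xff);
    b[1] = (unsigned char)((x >> 8) & 0xff);
    b[2] = (unsigned char)((x >> 16) & 0xff);
    b[3] = (unsigned char)((x >> 24) & 0xff);
    memcpy(&v, b, sizeof(v));
    return v;
}

/* Hypothetical extent with a little-endian starting logical block. */
struct extent_sketch {
    le32_t ee_block;
};

/*
 * Move an extent's start to the left by 'shift' blocks. Writing
 * "ex->ee_block -= shift" would operate on the raw little-endian
 * bytes, which is the kind of bug sparse reports as
 * "bad assignment (-=) to restricted __le32".
 */
static void shift_extent_left(struct extent_sketch *ex, uint32_t shift)
{
    uint32_t start = le32_to_cpu_sketch(ex->ee_block);

    ex->ee_block = cpu_to_le32_sketch(start - shift);
}
```

      With this pattern, an extent starting at logical block 1000 that is
      shifted left by 24 blocks is stored as the bytes D0 03 00 00 on any
      host, whatever its native byte order.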
  2. 12 Apr, 2014 3 commits
  3. 11 Apr, 2014 1 commit
    • ext4: fix COLLAPSE_RANGE test failure in data journalling mode · 1ce01c4a
      Namjae Jeon committed
      When mounting ext4 with the data=journal option, xfstests shared/002
      and shared/004 currently fail because the checksum computed for the
      test file does not match the checksum computed in other journal modes.
      In data=journal mode, a call to filemap_write_and_wait_range will not
      flush anything to disk because buffers are not marked dirty in
      write_end. In collapse range this call is followed by a call to
      truncate_pagecache_range. As a result, when the checksum is computed,
      a portion of the file is re-read from disk, which replaces valid data
      with NULL bytes and causes the difference in checksum.
      
      Calling ext4_force_commit before filemap_write_and_wait_range solves
      the issue, as it marks the buffers dirty during the commit transaction
      so that they can later be synced by filemap_write_and_wait_range.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 02 Apr, 2014 1 commit
    • ext4: fix premature freeing of partial clusters split across leaf blocks · ad6599ab
      Eric Whitney committed
      Xfstests generic/311 and shared/298 fail when run on a bigalloc file
      system.  Kernel error messages produced during the tests report that
      blocks to be freed are already on the to-be-freed list.  When e2fsck
      is run at the end of the tests, it typically reports bad i_blocks and
      bad free blocks counts.
      
      The bug that causes these failures is located in ext4_ext_rm_leaf().
      Code at the end of the function frees a partial cluster if it's not
      shared with an extent remaining in the leaf.  However, if all the
      extents in the leaf have been removed, the code dereferences an
      invalid extent pointer (off the front of the leaf) when the check for
      sharing is made.  This generally has the effect of unconditionally
      freeing the partial cluster, which leads to the observed failures
      when the partial cluster is shared with the last extent in the next
      leaf.
      
      Fix this by attempting to free the cluster only if extents remain in
      the leaf.  Any remaining partial cluster will be freed if possible
      when the next leaf is processed or when leaf removal is complete.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  5. 01 Apr, 2014 1 commit
  6. 19 Mar, 2014 3 commits
    • ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate · b8a86845
      Lukas Czerner committed
      Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
      functionality as the xfs ioctl XFS_IOC_ZERO_RANGE.
      
      It can be used to convert a range of a file to zeros, preferably without
      issuing data IO. Blocks should be preallocated for the regions that span
      holes in the file, and the entire range is preferably converted to
      unwritten extents.
      
      This can also be used to preallocate blocks past EOF in the same way as
      with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
      size to remain the same.
      
      Also add appropriate tracepoints.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
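      From userspace, the flag described above is passed to fallocate(2). The
      following is a small sketch, not part of the commit: the wrapper name
      zero_range_keep_size is hypothetical, and it assumes a Linux system
      whose headers define FALLOC_FL_ZERO_RANGE. File systems that do not
      implement the flag are expected to fail the call with EOPNOTSUPP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes fallocate(2) in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/*
 * Zero the byte range [offset, offset + len) of the file behind fd
 * without changing the file size. Returns 0 on success or a negative
 * errno value on failure (for example -EOPNOTSUPP when the file
 * system does not support the flag, or -EBADF for a bad descriptor).
 */
static int zero_range_keep_size(int fd, off_t offset, off_t len)
{
    int mode = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;

    if (fallocate(fd, mode, offset, len) == -1)
        return -errno;
    return 0;
}
```

      Dropping FALLOC_FL_KEEP_SIZE from the mode would allow the call to
      extend the file when the range reaches past EOF.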
    • ext4: refactor ext4_fallocate code · 0e8b6879
      Lukas Czerner committed
      Move block allocation out of ext4_fallocate into a separate function
      called ext4_alloc_file_blocks(). This will allow us to use the same
      allocation code for other allocation operations such as zero range,
      which is introduced in the next patch.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Update inode i_size after the preallocation · f282ac19
      Lukas Czerner committed
      Currently in ext4_fallocate we update the inode size and c_time and sync
      the file with every partial allocation, which is entirely unnecessary. It
      is true that if a crash happens in the middle of truncate we might end
      up with an unchanged i_size or c_time, which I do not think is really a
      problem - it does not mean file system corruption in any way. Note that
      xfs does things the same way, i.e. it updates all of the mentioned fields
      after the allocation is done.
      
      This commit moves all the updates after the allocation is done. In
      addition we also need to change m_time, as not only the inode has been
      changed but data regions might have changed as well (unwritten extents).
      However, m_time will only be updated when i_size has changed.
      
      Also we do not need to be paranoid about changing the c_time only if the
      actual allocation has happened; we can change it even if we try to
      allocate only to find out that there are blocks already allocated. It's
      not really a big deal and it will save us some additional complexity.
      
      Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG
      section.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ---
      v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag
      
       fs/ext4/extents.c | 96 ++++++++++++++++++++++++-------------------------------
       1 file changed, 42 insertions(+), 54 deletions(-)
  7. 14 Mar, 2014 2 commits
    • ext4: fix partial cluster handling for bigalloc file systems · c0634493
      Eric Whitney committed
      Commit 9cb00419, which enables hole punching for bigalloc file
      systems, exposed a bug introduced by commit 6ae06ff5 in an earlier
      release.  When run on a bigalloc file system, xfstests generic/013, 068,
      075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors
      or cause kernel error messages indicating that previously freed blocks
      are being freed again.
      
      The latter commit optimizes the selection of the starting extent in
      ext4_ext_rm_leaf() when hole punching by beginning with the extent
      supplied in the path argument rather than with the last extent in the
      leaf node (as is still done when truncating).  However, the code in
      rm_leaf that initially sets partial_cluster to track cluster sharing on
      extent boundaries is only guaranteed to run if rm_leaf starts with the
      last node in the leaf.  Consequently, partial_cluster is not correctly
      initialized when hole punching, and a cluster on the boundary of a
      punched region that should be retained may instead be deallocated.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents · 31cf0f2c
      Eric Whitney committed
      Code deallocating the extent path referenced by an argument to
      ext4_ext_handle_uninitialized_extents was made redundant with identical
      code in its one caller, ext4_ext_map_blocks, by commit 37794732.
      Allocating and deallocating the path in the same function also makes
      the code clearer.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  8. 24 Feb, 2014 1 commit
  9. 22 Feb, 2014 1 commit
  10. 21 Feb, 2014 1 commit
  11. 20 Feb, 2014 1 commit
  12. 12 Feb, 2014 1 commit
    • ext4: fix xfstest generic/299 block validity failures · 15cc1767
      Eric Whitney committed
      Commit a115f749 (ext4: remove wait for unwritten extent conversion from
      ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents().
      It can be triggered by xfstest generic/299 when run on a test file
      system created without a journal.  This test continuously fallocates and
      truncates files to which random dio/aio writes are simultaneously
      performed by a separate process.  The test completes successfully, but
      if the test filesystem is mounted with the block_validity option, a
      warning message stating that a logical block has been mapped to an
      illegal physical block is posted in the kernel log.
      
      The bug occurs when an extent is being converted to the written state
      by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents()
      discovers a mapping for an existing uninitialized extent. Although it
      sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to
      the discovered physical block number.  Because map->m_pblk is not
      otherwise initialized or set by this function or its callers, its
      uninitialized value is returned to ext4_map_blocks(), where it is
      stored as a bogus mapping in the extent status tree.
      
      Since map->m_pblk can accidentally contain illegal values that are
      larger than the physical size of the file system, calls to
      check_block_validity() in ext4_map_blocks() that are enabled if the
      block_validity mount option is used can fail, resulting in the logged
      warning message.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org  # 3.11+
  13. 07 Jan, 2014 2 commits
  14. 20 Dec, 2013 1 commit
  15. 04 Dec, 2013 1 commit
    • ext4: check for overlapping extents in ext4_valid_extent_entries() · 5946d089
      Eryu Guan committed
      A corrupted ext4 file system may have out of order leaf extents, e.g.
      
      extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
      extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
                   ^^^^ overlap with previous extent
      
      Reading such an extent could hit BUG_ON() in ext4_es_cache_extent().
      
      	BUG_ON(end < lblk);
      
      The problem is that __read_extent_tree_block() tries to cache holes as
      well but assumes 'lblk' is greater than 'prev' and passes underflowed
      length to ext4_es_cache_extent(). Fix it by checking for overlapping
      extents in ext4_valid_extent_entries().
      
      I hit this when fuzz testing ext4, and am able to reproduce it by
      modifying the on-disk extent by hand.
      
      Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
      make sure the value does not overflow.
      
      Ran xfstests on patched ext4 and saw no regressions.
      
      Cc: Lukáš Czerner <lczerner@redhat.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
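      The two checks described above (no wrap-around in ee_block + len - 1,
      and no overlap between consecutive leaf extents) can be modelled in
      plain C. This is a simplified sketch under stated assumptions, not the
      kernel code: struct extent_sketch and all function names are
      hypothetical, and values are assumed to be already converted to CPU
      byte order. The helper hole_len_unchecked shows how an unsigned
      subtraction produces the underflowed length when extents overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one leaf extent. */
struct extent_sketch {
    uint32_t lblk; /* first logical block */
    uint32_t len;  /* number of blocks covered */
};

/* Valid if the extent is non-empty and its last block does not wrap. */
static bool extent_valid(const struct extent_sketch *ex)
{
    uint32_t last;

    if (ex->len == 0)
        return false;
    last = ex->lblk + ex->len - 1; /* wraps modulo 2^32 on overflow */
    return last >= ex->lblk;
}

/* Valid if every extent is valid and starts after the previous one ends. */
static bool extent_entries_valid(const struct extent_sketch *ex, size_t n)
{
    uint32_t prev_last = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!extent_valid(&ex[i]))
            return false;
        if (i > 0 && ex[i].lblk <= prev_last)
            return false; /* out of order or overlapping */
        prev_last = ex[i].lblk + ex[i].len - 1;
    }
    return true;
}

/*
 * Length of the hole between the block following the previous extent
 * and the start of the next extent, computed without any validation.
 * If the extents overlap, the unsigned subtraction wraps around.
 */
static uint32_t hole_len_unchecked(uint32_t prev_next, uint32_t lblk)
{
    return lblk - prev_next;
}
```

      For two extents of length 1024 starting at blocks 0 and 1000, the
      entry check fails, while the unchecked hole length between them is
      1000 - 1024 wrapped to 4294967272.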
  16. 08 Nov, 2013 1 commit
  17. 04 Nov, 2013 1 commit
  18. 29 Aug, 2013 3 commits
  19. 17 Aug, 2013 5 commits
    • ext4: fix warning in ext4_da_update_reserve_space() · 7d734532
      Jan Kara committed
      The reaim workfile.dbase test easily triggers a warning in
      ext4_da_update_reserve_space():
      
      EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365:
      ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1
      blocks with reserved 9 data blocks)
      
      The problem is that (one of the) tests creates a file and then randomly
      writes to it with O_SYNC. That results in writing back pages of the file
      in random order, so we create extents for written blocks, say 0, 2, 4, 6,
      8 - this last allocation also allocates a new block for extents. Then we
      write out block 1, so we have extents 0-2, 4, 6, 8, and we release the
      indirect extent block because the extents fit in the inode again. Then we
      write out block 10 and need to allocate the indirect extent block again,
      which triggers the warning because we don't have the reservation
      anymore.
      
      Fix the problem by giving back freed metadata blocks resulting from
      extent merging into inode's reservation pool.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext4: add support for extent pre-caching · 7869a4a6
      Theodore Ts'o committed
      Add a new fiemap flag which forces all of the extents in an inode
      to be cached in the extent_status tree.  This is critically important
      when using AIO to a preallocated file, since if we need to read in
      blocks from the extent tree, the io_submit(2) system call becomes
      synchronous, and the AIO is no longer "A", which is bad.
      
      In addition, for most files which have an external leaf tree block,
      the cost of caching the information in the extent status tree will be
      less than caching the entire 4k block in the buffer cache.  So it is
      generally a win to keep the extent information cached.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
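      The commit message does not name the new flag. The sketch below assumes
      it is FIEMAP_FLAG_CACHE as defined in <linux/fiemap.h>, and shows how
      an application might ask for pre-caching before starting AIO on a
      preallocated file. The function names are hypothetical, and setting
      fm_extent_count to 0 (so no extent records are copied back) is an
      illustrative choice rather than something the commit prescribes.

```c
#include <errno.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill in a FIEMAP request that covers the whole file and asks for caching. */
static void fiemap_precache_request(struct fiemap *fm)
{
    memset(fm, 0, sizeof(*fm));
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET; /* map to the end of the file */
    fm->fm_flags = FIEMAP_FLAG_CACHE;  /* assumed name of the new flag */
    fm->fm_extent_count = 0;           /* do not return extent records */
}

/*
 * Ask the file system to cache the extents of the file behind fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int precache_extents(int fd)
{
    struct fiemap fm;

    fiemap_precache_request(&fm);
    if (ioctl(fd, FS_IOC_FIEMAP, &fm) == -1)
        return -errno;
    return 0;
}
```

      A caller would typically invoke precache_extents() once after opening
      the file and before the first io_submit(2), and treat a failure as
      non-fatal since the flag is only a performance hint.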
    • ext4: cache all of an extent tree's leaf block upon reading · 107a7bd3
      Theodore Ts'o committed
      When we read in an extent tree leaf block from disk, arrange to have
      all of its entries cached.  In nearly all cases the in-memory
      representation will be more compact than the on-disk representation in
      the buffer cache, and it allows us to get the information without
      having to traverse the extent tree for successive extents.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
    • ext4: print the block number of invalid extent tree blocks · c349179b
      Theodore Ts'o committed
      When we find an invalid extent tree block, report the block number of
      the bad block for debugging purposes.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
    • ext4: refactor code to read the extent tree block · 7d7ea89e
      Theodore Ts'o committed
      Refactor out the code needed to read the extent tree block into a
      single read_extent_tree_block() function.  In addition to simplifying
      the code, it also makes sure that we call the ext4_ext_load_extent
      tracepoint whenever we need to read an extent tree block from disk.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
  20. 30 Jul, 2013 1 commit
  21. 16 Jul, 2013 2 commits
    • ext4: call ext4_es_lru_add() after handling cache miss · 63b99968
      Theodore Ts'o committed
      If there are no items in the extent status tree, ext4_es_lru_add() is
      a no-op.  So it is not sufficient to call ext4_es_lru_add() before we
      try to lookup an entry in the extent status tree.  We also need to
      call it at the end of ext4_ext_map_blocks(), after items have been
      added to the extent status tree.
      
      This could lead to inodes that have extent status trees but which
      are not in the LRU list, which means they won't get considered for
      eviction by the es_shrinker.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
    • ext4: yield during large unlinks · 76828c88
      Theodore Ts'o committed
      During large unlink operations on files with extents, we can use a lot
      of CPU time.  This adds a cond_resched() call when starting to examine
      the next level of a multi-level extent tree.  Multi-level extent trees
      are rare in the first place, and this should rarely be executed.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  22. 15 Jul, 2013 2 commits
    • ext4: simplify calculation of blocks to free on error · c8e15130
      Theodore Ts'o committed
      In ext4_ext_map_blocks(), if we have successfully allocated the data
      blocks, but then run into trouble inserting the extent into the extent
      tree, most likely due to an ENOSPC condition, determine the arguments
      to ext4_free_blocks() in a simpler way which is easier to prove to be
      correct.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix error handling in ext4_ext_truncate() · 8acd5e9b
      Theodore Ts'o committed
      Previously ext4_ext_truncate() was ignoring potential error returns
      from ext4_es_remove_extent() and ext4_ext_remove_space().  This can
      lead to the on-disk extent tree and the extent status tree cache
      getting out of sync, which is particularly bad, and can lead to file
      system corruption and potential data loss.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  23. 01 Jul, 2013 3 commits
  24. 13 Jun, 2013 1 commit