1. 13 5月, 2016 1 次提交
    • J
      ext4: refactor direct IO code · 914f82a3
      Jan Kara 提交于
      Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
      and ext4_ind_direct_IO(). However the extent based function calls into
      the indirect based one for some cases and for example it is not able to
      handle file extending. Previously it was not also properly handling
      retries in case of ENOSPC errors. With DAX things would get even more
      contrieved so just refactor the direct IO code and instead of indirect /
      extent split do the split to read vs writes.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      914f82a3
  2. 10 3月, 2016 1 次提交
    • J
      ext4: return hole from ext4_map_blocks() · facab4d9
      Jan Kara 提交于
      Currently, ext4_map_blocks() just returns 0 when it finds a hole and
      allocation is not requested. However we have all the information
      available to tell how large the hole actually is and there are callers
      of ext4_map_blocks() which would save some block-by-block hole iteration
      if they knew this information. So fill in struct ext4_map_blocks even
      for holes with the information we have. We keep returning 0 for holes to
      maintain backward compatibility of the function.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      facab4d9
  3. 09 3月, 2016 1 次提交
  4. 18 10月, 2015 2 次提交
  5. 09 9月, 2015 1 次提交
    • M
      dax: move DAX-related functions to a new header · c94c2acf
      Matthew Wilcox 提交于
      In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
      return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
      defined in linux/mm.h.  Given that we don't want to include <linux/mm.h>
      in <linux/fs.h>, the easiest solution is to move the DAX-related
      functions to a new header, <linux/dax.h>.  We could also have moved
      VM_FAULT_* definitions to a new header, or a different header that isn't
      quite such a boil-the-ocean header as <linux/mm.h>, but this felt like
      the best option.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c94c2acf
  6. 22 6月, 2015 1 次提交
  7. 21 6月, 2015 1 次提交
    • T
      ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae
      Theodore Ts'o 提交于
      In order to prevent quota block tracking to be inaccurate when
      ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
      file can now use the reserved block (since the quota file is arguably
      file system metadata), and ext4_quota_write() now uses
      ext4_should_retry_alloc() to retry the block allocation after a commit
      has completed and released some blocks for allocation.
      
      This fixes failures of xfstests generic/270:
      
      Quota error (device vdc): write_blk: dquota write failed
      Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c5e298ae
  8. 25 4月, 2015 1 次提交
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
  9. 12 4月, 2015 3 次提交
  10. 26 3月, 2015 1 次提交
  11. 17 2月, 2015 1 次提交
  12. 15 2月, 2015 1 次提交
    • O
      ext4: fix indirect punch hole corruption · 6f30b7e3
      Omar Sandoval 提交于
      Commit 4f579ae7 (ext4: fix punch hole on files with indirect
      mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
      mapping. However, there are bugs in several corner cases. This fixes 5
      distinct bugs:
      
      1. When there is at least one entire level of indirection between the
      start and end of the punch range and the end of the punch range is the
      first block of its level, we can't return early; we have to free the
      intervening levels.
      
      2. When the end is at a higher level of indirection than the start and
      ext4_find_shared returns a top branch for the end, we still need to free
      the rest of the shared branch it returns; we can't decrement partial2.
      
      3. When a punch happens within one level of indirection, we need to
      converge on an indirect block that contains the start and end. However,
      because the branches returned from ext4_find_shared do not necessarily
      start at the same level (e.g., the partial2 chain will be shallower if
      the last block occurs at the beginning of an indirect group), the walk
      of the two chains can end up "missing" each other and freeing a bunch of
      extra blocks in the process. This mismatch can be handled by first
      making sure that the chains are at the same level, then walking them
      together until they converge.
      
      4. When the punch happens within one level of indirection and
      ext4_find_shared returns a top branch for the start, we must free it,
      but only if the end does not occur within that branch.
      
      5. When the punch happens within one level of indirection and
      ext4_find_shared returns a top branch for the end, then we shouldn't
      free the block referenced by the end of the returned chain (this mirrors
      the different levels case).
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      6f30b7e3
  13. 05 9月, 2014 2 次提交
    • T
      ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVED · e3cf5d5d
      Theodore Ts'o 提交于
      The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
      because it was too hard to make sure the mballoc and get_block flags
      could be reliably passed down through all of the codepaths that end up
      calling ext4_mb_new_blocks().
      
      Since then, we have mb_flags passed down through most of the code
      paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
      as it used to.
      
      This commit plumbs in the last of what is required, and then adds a
      WARN_ON check to make sure we haven't missed anything.  If this passes
      a full regression test run, we can then drop
      EXT4_STATE_DELALLOC_RESERVED.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      e3cf5d5d
    • T
      ext4: pass allocation_request struct to ext4_(alloc,splice)_branch · a5211002
      Theodore Ts'o 提交于
      Instead of initializing the allocation_request structure in
      ext4_alloc_branch(), set it up in ext4_ind_map_blocks(), and then pass
      it to ext4_alloc_branch() and ext4_splice_branch().
      
      This allows ext4_ind_map_blocks to pass flags in the allocation
      request structure without having to add Yet Another argument to
      ext4_alloc_branch().
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      a5211002
  14. 15 7月, 2014 1 次提交
    • L
      ext4: fix punch hole on files with indirect mapping · 4f579ae7
      Lukas Czerner 提交于
      Currently punch hole code on files with direct/indirect mapping has some
      problems which may lead to a data loss. For example (from Jan Kara):
      
      fallocate -n -p 10240000 4096
      
      will punch the range 10240000 - 12632064 instead of the range 1024000 -
      10244096.
      
      Also the code is a bit weird and it's not using infrastructure provided
      by indirect.c, but rather creating it's own way.
      
      This patch fixes the issues as well as making the operation to run 4
      times faster from my testing (punching out 60GB file). It uses similar
      approach used in ext4_ind_truncate() which takes advantage of
      ext4_free_branches() function.
      
      Also rename the ext4_free_hole_blocks() to something more sensible, like
      the equivalent we have for extent mapped files. Call it
      ext4_ind_remove_space().
      
      This has been tested mostly with fsx and some xfstests which are testing
      punch hole but does not require unwritten extents which are not
      supported with direct/indirect mapping. Not problems showed up even with
      1024k block size.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4f579ae7
  15. 27 6月, 2014 2 次提交
    • J
      ext4: Fix hole punching for files with indirect blocks · a93cd4cf
      Jan Kara 提交于
      Hole punching code for files with indirect blocks wrongly computed
      number of blocks which need to be cleared when traversing the indirect
      block tree. That could result in punching more blocks than actually
      requested and thus effectively cause a data loss. For example:
      
      fallocate -n -p 10240000 4096
      
      will punch the range 10240000 - 12632064 instead of the range 1024000 -
      10244096. Fix the calculation.
      
      CC: stable@vger.kernel.org
      Fixes: 8bad6fc8Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      a93cd4cf
    • J
      ext4: Fix block zeroing when punching holes in indirect block files · 77ea2a4b
      Jan Kara 提交于
      free_holes_block() passed local variable as a block pointer
      to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local
      variable instead of proper place in inode / indirect block. We later
      zero out proper place in inode / indirect block but don't dirty the
      inode / buffer again which can lead to subtle issues (some changes e.g.
      to inode can be lost).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      77ea2a4b
  16. 16 6月, 2014 1 次提交
    • J
      ext4: Fix buffer double free in ext4_alloc_branch() · c5c7b8dd
      Jan Kara 提交于
      Error recovery in ext4_alloc_branch() calls ext4_forget() even for
      buffer corresponding to indirect block it did not allocate. This leads
      to brelse() being called twice for that buffer (once from ext4_forget()
      and once from cleanup in ext4_ind_map_blocks()) leading to buffer use
      count misaccounting. Eventually (but often much later because there
      are other users of the buffer) we will see messages like:
      VFS: brelse: Trying to free free buffer
      
      Another manifestation of this problem is an error:
      JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);
      inconsistent data on disk
      
      The fix is easy - don't forget buffer we did not allocate. Also add an
      explanatory comment because the indexing at ext4_alloc_branch() is
      somewhat subtle.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c5c7b8dd
  17. 07 5月, 2014 3 次提交
  18. 29 8月, 2013 1 次提交
    • Z
      ext4: isolate ext4_extents.h file · d7b2a00c
      Zheng Liu 提交于
      After applied the commit (4a092d73), we have reduced the number of
      source files that need to #include ext4_extents.h.  But we can do
      better.
      
      This commit defines ext4_zeroout_es() in extents.c and move
      EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
      indirect.c and ioctl.c.  Meanwhile we just need to include this file in
      extent_status.c when ES_AGGRESSIVE_TEST is defined.  Otherwise, this
      commit removes a duplicated declaration in trace/events/ext4.h.
      
      After applied this patch, we just need to include ext4_extents.h file
      in {super,migrate,move_extents,extents}.c, and it is easy for us to
      define a new extent disk layout.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d7b2a00c
  19. 01 7月, 2013 1 次提交
  20. 12 6月, 2013 1 次提交
    • T
      ext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarily · 981250ca
      Theodore Ts'o 提交于
      Commit 18888cf0: "ext4: speed up truncate/unlink by not using
      bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in
      the most important codepath for file systems using extents, but a
      similar optimization also can be done for file systems using indirect
      blocks, and for the two special cases in the ext4 extents code.
      
      Cc: Andrey Sidorov <qrxd43@motorola.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      981250ca
  21. 05 6月, 2013 2 次提交
  22. 08 5月, 2013 1 次提交
  23. 04 4月, 2013 4 次提交
    • T
      ext4: refactor truncate code · 819c4920
      Theodore Ts'o 提交于
      Move common code in ext4_ind_truncate() and ext4_ext_truncate() into
      ext4_truncate().  This saves over 60 lines of code.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      819c4920
    • T
      ext4: refactor punch hole code · 26a4c0c6
      Theodore Ts'o 提交于
      Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole()
      into ext4_punch_hole().  This saves over 150 lines of code.
      
      This also fixes a potential bug when the punch_hole() code is racing
      against indirect-to-extents or extents-to-indirect migation.  We are
      currently using i_mutex to protect against changes to the inode flag;
      specifically, the append-only, immutable, and extents inode flags.  So
      we need to take i_mutex before deciding whether to use the
      extents-specific or indirect-specific punch_hole code.
      
      Also, there was a missing call to ext4_inode_block_unlocked_dio() in
      the indirect punch codepath.  This was added in commit 02d262df
      to block DIO readers racing against the punch operation in the
      codepath for extent-mapped inodes, but it was missing for
      indirect-block mapped inodes.  One of the advantages of refactoring
      the code is that it makes such oversights much less likely.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      26a4c0c6
    • T
      ext4: fold ext4_alloc_blocks() in ext4_alloc_branch() · 781f143e
      Theodore Ts'o 提交于
      The older code was far more complicated than it needed to be because
      of how we spliced in the ext4's new multiblock allocator into ext3's
      indirect block code.  By folding ext4_alloc_blocks() into
      ext4_alloc_branch(), we make the code far more understable, shave off
      over 130 lines of code and half a kilobyte of compiled object code.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      781f143e
    • Z
      ext4: fix big-endian bugs which could cause fs corruptions · 8cde7ad1
      Zheng Liu 提交于
      When an extent was zeroed out, we forgot to do convert from cpu to le16.
      It could make us hit a BUG_ON when we try to write dirty pages out.  So
      fix it.
      
      [ Also fix a bug found by Dmitry Monakhov where we were missing
        le32_to_cpu() calls in the new indirect punch hole code.
      
        There are a number of other big endian warnings found by static code
        analyzers, but we'll wait for the next merge window to fix them all
        up.  These fixes are designed to be Obviously Correct by code
        inspection, and easy to demonstrate that it won't make any
        difference (and hence, won't introduce any bugs) on little endian
        architectures such as x86.  --tytso ]
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Reported-by: NChristian Kujau <lists@nerdbynature.de>
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      8cde7ad1
  24. 28 2月, 2013 1 次提交
  25. 09 2月, 2013 1 次提交
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o 提交于
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  26. 02 2月, 2013 1 次提交
  27. 28 1月, 2013 1 次提交
    • Z
      ext4: add punching hole support for non-extent-mapped files · 8bad6fc8
      Zheng Liu 提交于
      This patch add supports for indirect file support punching hole.  It
      is almost the same as ext4_ext_punch_hole.  First, we invalidate all
      pages between this hole, and then we try to deallocate all blocks of
      this hole.
      
      A recursive function is used to handle deallocation of blocks.  In
      this function, it iterates over the entries in inode's i_blocks or
      indirect blocks, and try to free the block for each one of them.
      
      After applying this patch, xfstest #255 will not pass w/o extent because
      indirect-based file doesn't support unwritten extents.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8bad6fc8
  28. 13 1月, 2013 1 次提交
  29. 29 11月, 2012 1 次提交
    • T
      ext4: rationalize ext4_extents.h inclusion · 4a092d73
      Theodore Ts'o 提交于
      Previously, ext4_extents.h was being included at the end of ext4.h,
      which was bad for a number of reasons: (a) it was not being included
      in the expected place, and (b) it caused the header to be included
      multiple times.  There were #ifdef's to prevent this from causing any
      problems, but it still was unnecessary.
      
      By moving the function declarations that were in ext4_extents.h to
      ext4.h, which is standard practice for where the function declarations
      for the rest of ext4.h can be found, we can remove ext4_extents.h from
      being included in ext4.h at all, and then we can only include
      ext4_extents.h where it is needed in ext4's source files.
      
      It should be possible to move a few more things into ext4.h, and
      further reduce the number of source files that need to #include
      ext4_extents.h, but that's a cleanup for another day.
      Reported-by: NSachin Kamat <sachin.kamat@linaro.org>
      Reported-by: NWei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4a092d73