1. 18 Sep, 2014 1 commit
    • jbd2: avoid pointless scanning of checkpoint lists · cc97f1a7
      Jan Kara committed
      Yuanhan has reported that when he is running fsync(2) heavy workload
      creating new files over ramdisk, significant amount of time is spent in
      __jbd2_journal_clean_checkpoint_list() trying to clean old transactions
      (but they cannot be cleaned up because flusher hasn't yet checkpointed
      those buffers). The workload can be generated by:
        fs_mark -d /fs/ram0/1 -D 2 -N 2560 -n 1000000 -L 1 -S 1 -s 4096
      
      Reduce the amount of scanning by stopping the scan of the transaction list
      once we find a transaction that cannot be checkpointed. Note that this
      way of cleaning is still enough to keep freeing space in the journal
      after fully checkpointed transactions.
      Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
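The early-exit idea in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions, not the jbd2 code: `struct txn`, its fields, and `clean_checkpoint_list()` are hypothetical stand-ins, and the real function works on kernel journal structures under locks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction on the checkpoint list. */
struct txn {
    int pending_buffers;   /* buffers the flusher has not written back yet */
    bool freed;            /* set once the transaction has been cleaned */
    struct txn *next;      /* next (newer) transaction, NULL at the end */
};

/*
 * Walk the list from the oldest transaction and release every one whose
 * buffers are all checkpointed. Stop at the first transaction that still
 * has pending buffers instead of scanning the rest of the list.
 * Returns the number of transactions released.
 */
static int clean_checkpoint_list(struct txn *oldest)
{
    int released = 0;

    for (struct txn *t = oldest; t != NULL; t = t->next) {
        assert(t->pending_buffers >= 0);
        if (t->pending_buffers > 0)
            break;              /* cannot be cleaned yet: stop scanning */
        t->freed = true;
        released++;
    }
    return released;
}
```

With a list whose second transaction still has pending buffers, only the first one is released and the third is never visited, even if it happens to be clean.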
  2. 17 Sep, 2014 4 commits
    • ext4: explicitly inform user about orphan list cleanup · 84474976
      Dmitry Monakhov committed
      Production fs likely compiled/mounted w/o jbd debugging, so orphan
      list clearing will be silent.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: jbd2_log_wait_for_space improve error detetcion · 1245799f
      Dmitry Monakhov committed
      If EIO happens after we have dropped j_state_lock, we won't notice
      that the journal has been aborted.  So it is reasonable to move this
      check after we have grabbed the j_checkpoint_mutex and re-grabbed the
      j_state_lock.  This patch helps to prevent a false positive complaint
      after EIO.
      
      #DMESG:
      __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
      __jbd2_log_wait_for_space: no way to get more journal space in ram1-8
      ------------[ cut here ]------------
      WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de574 #139
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
       0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
       ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
      Call Trace:
       [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
       [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
       [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
       [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
       [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
       [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff812025bb>] ext4_create+0x8b/0x150
       [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
       [<ffffffff8118097b>] do_last+0x7db/0xcf0
       [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
       [<ffffffff811845d2>] path_openat+0x242/0x590
       [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
       [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
       [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
       [<ffffffff81172f20>] do_sys_open+0x170/0x220
       [<ffffffff8117300e>] SyS_open+0x1e/0x20
       [<ffffffff811715d6>] SyS_creat+0x16/0x20
       [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
      ---[ end trace cd71c831f82059db ]---
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: free bh when descriptor block checksum fails · 064d8389
      Darrick J. Wong committed
      Free the buffer head if the journal descriptor block fails checksum
      verification.
      
      This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
      verify error in do_one_pass".
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Cc: stable@vger.kernel.org
    • ext4: check EA value offset when loading · a0626e75
      Darrick J. Wong committed
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
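The kind of check described above can be sketched as follows. This is a simplified illustration, not the ext4 patch: `struct ea_entry`, `ea_values_valid()`, and the layout (entry table at the start of a region, values stored after it) are assumptions made for the example, and the upper-bound check is an extra that the commit message does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extended-attribute entry. */
struct ea_entry {
    uint16_t value_offs;   /* offset of the value from the region start */
    uint32_t value_size;   /* size of the value in bytes */
};

/*
 * Return true if every non-empty value starts at or after the end of the
 * entry table and ends inside the region.
 */
static bool ea_values_valid(const struct ea_entry *entries, size_t count,
                            uint64_t entries_end, uint64_t region_size)
{
    assert(entries_end <= region_size);

    for (size_t i = 0; i < count; i++) {
        const struct ea_entry *e = &entries[i];

        if (e->value_size == 0)
            continue;                   /* an empty value cannot collide */
        if (e->value_offs < entries_end)
            return false;               /* value overlaps the entry table */
        if ((uint64_t)e->value_offs + e->value_size > region_size)
            return false;               /* value runs past the region */
    }
    return true;
}
```

An entry with `value_offs = 0` and `value_size > 0`, the case named in the commit message, is rejected by the first comparison.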
  3. 11 Sep, 2014 6 commits
  4. 05 Sep, 2014 9 commits
  5. 02 Sep, 2014 18 commits
    • ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu committed
      This commit adds some statistics to the extent status tree shrinker.  The
      purpose of adding these is to collect more details when we
      encounter a stall caused by the extent status tree shrinker.  Here we count
      the following statistics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: improve extents status tree trace point · e963bb1d
      Zheng Liu committed
      This commit improves the trace point of extents status tree.  We rename
      trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
      in ext4_es_scan() and we can not identify them from the result.
      
      Further this commit fixes a variable name in trace point in order to
      keep consistency with others.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix comments about get_blocks · d91bd2c1
      Seunghun Lee committed
      get_blocks is renamed to get_block.
      Signed-off-by: Seunghun Lee <waydi1@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: enable block_validity by default · 45f1a9c3
      Darrick J. Wong committed
      Enable by default the block_validity feature, which checks for
      collisions between newly allocated blocks and critical system
      metadata.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() · 88fe1acb
      Theodore Ts'o committed
      __wait_cp_io() is only called by jbd2_log_do_checkpoint().  Fold it in
      to make it a bit easier to understand.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() · be1158cc
      Theodore Ts'o committed
      __process_buffer() is only called by jbd2_log_do_checkpoint(), and it
      had a very complex locking protocol where it would be called with the
      j_list_lock, and sometimes exit with the lock held (if the return code
      was 0), or release the lock.
      
      This was confusing both to humans and to smatch (which erroneously
      complained that the lock was taken twice).
      
      Folding __process_buffer() to the caller allows us to simplify the
      control flow, making the resulting function easier to read and reason
      about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
      bytes (over 4% of the text size).
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: rename ext4_ext_find_extent() to ext4_find_extent() · ed8a1a76
      Theodore Ts'o committed
      Make the function name less redundant.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: reuse path object in ext4_move_extents() · 3bdf14b4
      Theodore Ts'o committed
      Reuse the path object in ext4_move_extents() so we don't unnecessarily
      free and reallocate it.
      
      Also clean up the get_ext_path() wrapper so that it has the same
      semantics of freeing the path object on error as ext4_ext_find_extent().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: reuse path object in ext4_ext_shift_extents() · ee4bd0d9
      Theodore Ts'o committed
      Now that the semantics of ext4_ext_find_extent() are much cleaner,
      it's safe and more efficient to reuse the path object across the
      multiple calls to ext4_ext_find_extent() in ext4_ext_shift_extents().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: teach ext4_ext_find_extent() to realloc path if necessary · 10809df8
      Theodore Ts'o committed
      This adds additional safety in case for some reason we end up reusing a
      path structure which isn't big enough for the current depth of the inode.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: allow a NULL argument to ext4_ext_drop_refs() · b7ea89ad
      Theodore Ts'o committed
      Teach ext4_ext_drop_refs() to accept a NULL argument, much like
      kfree().  This allows us to drop a lot of checks to make sure path is
      non-NULL before calling ext4_ext_drop_refs().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
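The kfree()-like convention described above can be sketched in user-space C. This is a minimal illustration, not the ext4 function: `struct buf`, `struct path_level`, `PATH_DEPTH`, and `drop_refs()` are hypothetical, and the fixed depth and plain reference counter are assumptions standing in for the kernel's path depth and buffer-head reference handling.

```c
#include <assert.h>
#include <stddef.h>

#define PATH_DEPTH 4            /* hypothetical fixed depth for this sketch */

struct buf { int refcount; };   /* stand-in for a referenced buffer */
struct path_level { struct buf *bh; };

/*
 * Release the buffer held at each level of the path. A NULL path is
 * accepted and ignored, so callers do not need their own NULL check.
 */
static void drop_refs(struct path_level *path)
{
    if (path == NULL)
        return;

    for (int i = 0; i < PATH_DEPTH; i++) {
        if (path[i].bh == NULL)
            continue;
        assert(path[i].bh->refcount > 0);
        path[i].bh->refcount--;
        path[i].bh = NULL;      /* the level no longer holds a reference */
    }
}
```

Callers can then write `drop_refs(path);` unconditionally on every exit path, whether or not a path was ever allocated.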
    • ext4: call ext4_ext_drop_refs() from ext4_ext_find_extent() · 523f431c
      Theodore Ts'o committed
      In nearly all of the calls to ext4_ext_find_extent() where the caller
      is trying to recycle the path object, ext4_ext_drop_refs() gets called
      to release the buffer heads before the path object gets overwritten.
      To simplify things for the callers, and to avoid the possibility of a
      memory leak, make ext4_ext_find_extent() responsible for dropping the
      buffers.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code · dfe50809
      Theodore Ts'o committed
      Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(),
      ext4_split_extent(), ext4_convert_unwritten_extents_endio().
      
      This requires fixing all of their callers to potentially
      ext4_ext_find_extent() to free the struct ext4_ext_path object in case
      of an error, and there are interlocking dependencies all the way up to
      ext4_ext_map_blocks(), ext4_swap_extents(), and
      ext4_ext_remove_space().
      
      Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it
      is no longer necessary.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: drop EXT4_EX_NOFREE_ON_ERR in convert_initialized_extent() · 4f224b8b
      Theodore Ts'o committed
      Transfer responsibility of freeing struct ext4_ext_path on error to
      ext4_ext_find_extent().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: collapse ext4_convert_initialized_extents() · e8b83d93
      Theodore Ts'o committed
      The function ext4_convert_initialized_extents() is only called by a
      single function --- ext4_ext_convert_initalized_extents().  Inline the
      code and get rid of the unnecessary bits in order to simplify the code.
      
      Rename ext4_ext_convert_initalized_extents() to
      convert_initalized_extents() since it's a static function that is
      actually only used in a single caller, ext4_ext_map_blocks().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: teach ext4_ext_find_extent() to free path on error · 705912ca
      Theodore Ts'o committed
      Right now, there are places where it is all too easy to leak memory
      on an error path, via a usage like this:
      
      	struct ext4_ext_path *path = NULL;
      
      	while (...) {
      		...
      		path = ext4_ext_find_extent(inode, block, path, 0);
      		if (IS_ERR(path)) {
      			/* oops, if path was non-NULL before the call to
      			   ext4_ext_find_extent, we've leaked it!  :-(  */
      			...
      			return PTR_ERR(path);
      		}
      		...
      	}
      
      Unfortunately, there are some code paths where we are doing the following
      instead:
      
      	path = ext4_ext_find_extent(inode, block, orig_path, 0);
      
      and where it's important that we _not_ free orig_path in the case
      where ext4_ext_find_extent() returns an error.
      
      So change the function signature of ext4_ext_find_extent() so that it
      takes a struct ext4_ext_path ** for its third argument, and by
      default, on an error, it will free the struct ext4_ext_path, and then
      zero out the struct ext4_ext_path * pointer.  In order to avoid
      causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
      ext4_ext_find_extent() to use the original behavior of forcing the
      caller to deal with freeing the original path pointer on the error
      case.
      
      The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
      allows for a gentle transition and makes the patches easier to verify.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
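The ownership rule this commit describes (pass the path by address, free it and NULL the caller's pointer on error unless a flag opts out) can be sketched in user-space C. This is an illustration under assumptions, not the ext4 code: `struct path`, `find_extent()`, and `NOFREE_ON_ERR` are hypothetical stand-ins, a negative block number simulates a failed lookup, and the sketch returns an `int` error code for simplicity, which the commit message does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NOFREE_ON_ERR 0x1   /* hypothetical stand-in for EXT4_EX_NOFREE_ON_ERR */

struct path { int depth; };  /* hypothetical, simplified path object */

/*
 * Look up `block`, reusing *ppath when the caller passes one in.
 * Returns 0 on success with *ppath valid, or a negative errno value.
 * On a lookup error, *ppath is freed and set to NULL unless
 * NOFREE_ON_ERR is set, in which case the caller keeps ownership.
 */
static int find_extent(long block, struct path **ppath, int flags)
{
    assert(ppath != NULL);

    if (*ppath == NULL) {
        *ppath = calloc(1, sizeof(**ppath));
        if (*ppath == NULL)
            return -ENOMEM;
    }

    if (block < 0) {                    /* simulated lookup failure */
        if (!(flags & NOFREE_ON_ERR)) {
            free(*ppath);
            *ppath = NULL;              /* caller cannot leak or reuse it */
        }
        return -EIO;
    }

    (*ppath)->depth = 1;
    return 0;
}
```

In a loop that recycles one path object, the caller can simply return the error code: by default there is nothing left to free, which is the leak the commit message sets out to remove.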
    • ext4: fix accidental flag aliasing in ext4_map_blocks flags · bd30d702
      Theodore Ts'o committed
      Commit b8a86845 introduced an accidental flag aliasing between
      EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.
      
      Fortunately, this didn't introduce any untoward side effects --- we
      got lucky.  Nevertheless, fix this and leave a warning to hopefully
      avoid this from happening in the future.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
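Flag aliasing of this kind happens when two flags that travel in the same word are given the same bit. The commit message says a warning was left in the source; the sketch below shows a different technique, a compile-time check, purely as an illustration. All names and bit values here are hypothetical and do not match the ext4 constants.

```c
#include <assert.h>

/* Hypothetical bit assignments, chosen only for this sketch. */
#define GET_BLOCKS_CREATE            0x0001u
#define GET_BLOCKS_CONVERT_UNWRITTEN 0x0100u
#define GET_BLOCKS_ALL (GET_BLOCKS_CREATE | GET_BLOCKS_CONVERT_UNWRITTEN)

#define EX_NOCACHE 0x40000000u
#define EX_ALL     (EX_NOCACHE)

/* The build fails if the two flag groups ever share a bit. */
_Static_assert((GET_BLOCKS_ALL & EX_ALL) == 0, "flag groups share a bit");

/* Return nonzero if two flag values have any bit in common. */
static int flags_alias(unsigned int a, unsigned int b)
{
    return (a & b) != 0;
}
```

Had `EX_NOCACHE` been defined as `0x0100u` in this sketch, the `_Static_assert` would have stopped the build instead of letting one flag silently imply the other.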
    • ext4: fix ZERO_RANGE bug hidden by flag aliasing · 713e8dde
      Theodore Ts'o committed
      We accidentally aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN
      flags, which apparently was hiding a bug that was unmasked when this
      flag aliasing issue was addressed (see the subsequent commit).  The
      reproduction case was:
      
         fsx -N 10000 -l 500000 -r 4096 -t 4096 -w 4096 -Z -R -W /vdb/junk
      
      ... which would cause fsx to report corruption in the data file.
      
      The fix we have is a bit of an overkill, but I'd much rather be
      conservative for now, and we can optimize ZERO_RANGE_FL handling
      later.  The fact that we need to zap the extent_status cache for the
      inode is unfortunate, but correctness is far more important than
      performance.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
  6. 01 Sep, 2014 1 commit
  7. 31 Aug, 2014 1 commit
    • ext4: refactor ext4_move_extents code base · fcf6b1b7
      Dmitry Monakhov committed
      ext4_move_extents is too complex to review. It duplicates almost
      every function available in the rest of the codebase. It has a useless
      artificial restriction, orig_offset == donor_offset. But in fact the logic
      of ext4_move_extents is very simple:
      
      Iterate extents one by one (similar to ext4_fill_fiemap_extents)
         ->Iterate each page covered extent (similar to generic_perform_write)
           ->swap extents for covered by page (can be shared with IOC_MOVE_DATA)
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>