1. 01 7月, 2013 4 次提交
    • J
      ext4: reduce object size when !CONFIG_PRINTK · e7c96e8e
      Joe Perches 提交于
      Reduce the object size ~10% could be useful for embedded systems.
      
      Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and
      arguments, passing " " to functions when !CONFIG_PRINTK and still
      verifying format and arguments with no_printk.
      
      $ size fs/ext4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       239375	    610	    888	 240873	  3ace9	fs/ext4/built-in.o.new
       264167	    738	    888	 265793	  40e41	fs/ext4/built-in.o.old
      
          $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
          # CONFIG_PRINTK is not set
          CONFIG_EXT4_FS=y
          CONFIG_EXT4_USE_FOR_EXT23=y
          CONFIG_EXT4_FS_POSIX_ACL=y
          # CONFIG_EXT4_FS_SECURITY is not set
          # CONFIG_EXT4_DEBUG is not set
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e7c96e8e
    • Z
      ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77
      Zheng Liu 提交于
      Now we maintain an proper in-order LRU list in ext4 to reclaim entries
      from extent status tree when we are under heavy memory pressure.  For
      keeping this order, a spin lock is used to protect this list.  But this
      lock burns a lot of CPU time.  We can use the following steps to trigger
      it.
      
        % cd /dev/shm
        % dd if=/dev/zero of=ext4-img bs=1M count=2k
        % mkfs.ext4 ext4-img
        % mount -t ext4 -o loop ext4-img /mnt
        % cd /mnt
        % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
        % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
        % perf record -a -g
        % perf report
      
      This commit tries to fix this problem.  Now a new member called
      i_touch_when is added into ext4_inode_info to record the last access
      time for an inode.  Meanwhile we never need to keep a proper in-order
      LRU list.  So this can avoid to burns some CPU time.  When we try to
      reclaim some entries from extent status tree, we use list_sort() to get
      a proper in-order list.  Then we traverse this list to discard some
      entries.  In ext4_sb_info, we use s_es_last_sorted to record the last
      time of sorting this list.  When we traverse the list, we skip the inode
      that is newer than this time, and move this inode to the tail of LRU
      list.  When the head of the list is newer than s_es_last_sorted, we will
      sort the LRU list again.
      
      In this commit, we break the loop if s_extent_cache_cnt == 0 because
      that means that all extents in extent status tree have been reclaimed.
      
      Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
      changed to save a local variable in these functions.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d3922a77
    • A
      ext4: implement error handling of ext4_mb_new_preallocation() · 2c00ef3e
      Alexey Khoroshilov 提交于
      If memory allocation in ext4_mb_new_group_pa() is failed,
      it returns error code, ext4_mb_new_preallocation() propages it,
      but ext4_mb_new_blocks() ignores it.
      
      An observed result was:
      
      - allocation fail means ext4_mb_new_group_pa() does not update
        ext4_allocation_context;
      
      - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len =
        ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead
        of number of blocks requested (1);
      
      - that activates update cycle in ext4_splice_branch():
          for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here
            *(where->p + i) = cpu_to_le32(current_block++);
      
      - it iterates 511 times and corrupts a chunk of memory including inode
        structure;
      
      - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty();
      
      - system hangs with 'scheduling while atomic' BUG.
      
      The patch implements a check for ext4_mb_new_preallocation() error
      code and handles its failure as if ext4_mb_regular_allocator() fails.
      
      Found by Linux File System Verification project (linuxtesting.org).
      
      [ Patch restructed by tytso to make the flow of control easier to follow. ]
      Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2c00ef3e
    • M
      ext4: fix corruption when online resizing a fs with 1K block size · 6ca792ed
      Maarten ter Huurne 提交于
      Subtracting the number of the first data block places the superblock
      backups one block too early, corrupting the file system. When the block
      size is larger than 1K, the first data block is 0, so the subtraction
      has no effect and no corruption occurs.
      Signed-off-by: NMaarten ter Huurne <maarten@treewalker.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org
      6ca792ed
  2. 17 6月, 2013 1 次提交
  3. 13 6月, 2013 3 次提交
  4. 12 6月, 2013 2 次提交
    • T
      ext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarily · 981250ca
      Theodore Ts'o 提交于
      Commit 18888cf0: "ext4: speed up truncate/unlink by not using
      bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in
      the most important codepath for file systems using extents, but a
      similar optimization also can be done for file systems using indirect
      blocks, and for the two special cases in the ext4 extents code.
      
      Cc: Andrey Sidorov <qrxd43@motorola.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      981250ca
    • T
      ext4: add cond_resched() to ext4_free_blocks() & ext4_mb_regular_allocator() · 2ed5724d
      Theodore Ts'o 提交于
      For a file systems with a very large number of block groups, if all of
      the block group bitmaps are in memory and the file system is
      relatively badly fragmented, it's possible ext4_mb_regular_allocator()
      to take a long time trying to find a good match.  This is especially
      true if the tuning parameter mb_max_to_scan has been sent to a very
      large number.  So add a cond_resched() to avoid soft lockup warnings
      and to provide better system responsiveness.
      
      For ext4_free_blocks(), if we are deleting a large range of blocks,
      and data=journal is enabled so that EXT4_FREE_BLOCKS_FORGET is passed,
      the loop to call sb_find_get_block() and to call ext4_forget() can
      take over 10-15 milliseocnds or more.  So it's better to add a
      cond_resched() here a well.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      
      
      2ed5724d
  5. 07 6月, 2013 1 次提交
    • T
      ext4: use ext4_da_writepages() for all modes · 20970ba6
      Theodore Ts'o 提交于
      Rename ext4_da_writepages() to ext4_writepages() and use it for all
      modes.  We still need to iterate over all the pages in the case of
      data=journalling, but in the case of nodelalloc/data=ordered (which is
      what file systems mounted using ext3 backwards compatibility will use)
      this will allow us to use a much more efficient I/O submission path.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      20970ba6
  6. 06 6月, 2013 4 次提交
  7. 05 6月, 2013 18 次提交
  8. 04 6月, 2013 1 次提交
    • J
      ext4: use io_end for multiple bios · 97a851ed
      Jan Kara 提交于
      Change writeback path to create just one io_end structure for the
      extent to which we submit IO and share it among bios writing that
      extent. This prevents needless splitting and joining of unwritten
      extents when they cannot be submitted as a single bio.
      
      Bugs in ENOMEM handling found by Linux File System Verification project
      (linuxtesting.org) and fixed by Alexey Khoroshilov
      <khoroshilov@ispras.ru>.
      
      CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      97a851ed
  9. 01 6月, 2013 4 次提交
  10. 28 5月, 2013 2 次提交
    • P
      ext4: suppress ext4 orphan messages on mount · 566370a2
      Paul Taysom 提交于
      Suppress the messages releating to processing the ext4 orphan list
      ("truncating inode" and "deleting unreferenced inode") unless the
      debug option is on, since otherwise they end up taking up space in the
      log that could be used for more useful information.
      
      Tested by opening several files, unlinking them, then
      crashing the system, rebooting the system and examining
      /var/log/messages.
      
      Addresses the problem described in http://crbug.com/220976Signed-off-by: NPaul Taysom <taysom@chromium.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      566370a2
    • L
      ext4: make punch hole code path work with bigalloc · d23142c6
      Lukas Czerner 提交于
      Currently punch hole is disabled in file systems with bigalloc
      feature enabled. However the recent changes in punch hole patch should
      make it easier to support punching holes on bigalloc enabled file
      systems.
      
      This commit changes partial_cluster handling in ext4_remove_blocks(),
      ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
      partial_cluster is unsigned long long type and it makes sure that we
      will free the partial cluster if all extents has been released from that
      cluster. However it has been specifically designed only for truncate.
      
      With punch hole we can be freeing just some extents in the cluster
      leaving the rest untouched. So we have to make sure that we will notice
      cluster which still has some extents. To do this I've changed
      partial_cluster to be signed long long type. The only scenario where
      this could be a problem is when cluster_size == block size, however in
      that case there would not be any partial clusters so we're safe. For
      bigger clusters the signed type is enough. Now we use the negative value
      in partial_cluster to mark such cluster used, hence we know that we must
      not free it even if all other extents has been freed from such cluster.
      
      This scenario can be described in simple diagram:
      
      |FFF...FF..FF.UUU|
       ^----------^
        punch hole
      
      . - free space
      | - cluster boundary
      F - freed extent
      U - used extent
      
      Also update respective tracepoints to use signed long long type for
      partial_cluster.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d23142c6