1. 17 8月, 2012 8 次提交
    • T
      ext4: check return value of blkdev_issue_flush() · a4a39040
      Theodore Ts'o 提交于
          
      blkdev_issue_flush() can fail; make sure the error gets properly
      propagated.
          
      This is a port of the equivalent ext3 patch from commit 44f4f729.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a4a39040
    • Z
      ext4: make the zero-out chunk size tunable · 67a5da56
      Zheng Liu 提交于
      Currently in ext4 the length of zero-out chunk is set to 7 file system
      blocks.  But if an inode has uninitailized extents from using
      fallocate to preallocate space, and the workload issues many random
      writes, this can cause a fragmented extent tree that will
      unnecessarily grow the extent tree.
      
      So create a new sysfs tunable, extent_max_zeroout_kb, which controls
      the maximum size where blocks will be zeroed out instead of creating a
      new uninitialized extent.  The default of this has been sent to 32kb.
      
      CC: Zach Brown <zab@zabbo.net>
      CC: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      67a5da56
    • T
      ext4: add max_dir_size_kb mount option · df981d03
      Theodore Ts'o 提交于
      Very large directories can cause significant performance problems, or
      perhaps even invoke the OOM killer, if the process is running in a
      highly constrained memory environment (whether it is VM's with a small
      amount of memory or in a small memory cgroup).
      
      So it is useful, in cloud server/data center environments, to be able
      to set a filesystem-wide cap on the maximum size of a directory, to
      ensure that directories never get larger than a sane size.  We do this
      via a new mount option, max_dir_size_kb.  If there is an attempt to
      grow the directory larger than max_dir_size_kb, the system call will
      return ENOSPC instead.
      
      Google-Bug-Id: 6863013
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      
      
      
      df981d03
    • T
      ext4: don't load the block bitmap for block groups which have no space · 01fc48e8
      Theodore Ts'o 提交于
      Add a short circuit check to ext4_mb_group_group() so that we don't
      bother to load the block bitmap for a block group which does not have
      any space available.  (Or which does not have enough space until we
      are in desperation mode, i.e., when cr == 3.)
      
      Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=45741
      Reported-by: mirek@me.com
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      01fc48e8
    • T
      ext4: collapse a single extent tree block into the inode if possible · ecb94f5f
      Theodore Ts'o 提交于
      If an inode has more than 4 extents, but then later some of the
      extents are merged together, we can optimize the file system by moving
      the extents up into the inode, and discarding the extent tree block.
      This is important, because if there are a large number of inodes with
      an external extent tree blocks where the contents could fit in the
      inode, this can significantly increase the fsck time of the file
      system.
      
      Google-Bug-Id: 6801242
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ecb94f5f
    • T
      ext4: fix kernel BUG on large-scale rm -rf commands · 89a4e48f
      Theodore Ts'o 提交于
      Commit 968dee77: "ext4: fix hole punch failure when depth is greater
      than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
      crashes when users ran run "rm -rf" on large directory hierarchy on
      ext4 filesystems on RAID devices:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      
          Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
          Call Trace:
           [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110
           [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
           [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0
           [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
           [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
           [<ffffffff811a1312>] evict+0xa2/0x1a0
           [<ffffffff811a1513>] iput+0x103/0x1f0
           [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
           [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40
           [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
           [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
          Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00
      
          RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
      
      This could be reproduced as follows:
      
      The problem in commit 968dee77 was that caused the variable 'i' to
      be left uninitialized if the truncate required more space than was
      available in the journal.  This resulted in the function
      ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
      ext4_ext_remove_space() to restart the truncate operation after
      starting a new jbd2 handle.
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Reported-by: NMarti Raudsepp <marti@juffo.org>
      Tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      89a4e48f
    • T
      ext4: fix long mount times on very big file systems · 0548bbb8
      Theodore Ts'o 提交于
      Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
      ext4_statfs()" introduced a O(n**2) calculation which makes very large
      file systems take forever to mount.  Fix this with an optimization for
      non-bigalloc file systems.  (For bigalloc file systems the overhead
      needs to be set in the the superblock.)
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      0548bbb8
    • T
      ext4: don't call ext4_error while block group is locked · 7a4c5de2
      Theodore Ts'o 提交于
      While in ext4_validate_block_bitmap(), if an block allocation bitmap
      is found to be invalid, we call ext4_error() while the block group is
      still locked.  This causes ext4_commit_super() to call a function
      which might sleep while in an atomic context.
      
      There's no need to keep the block group locked at this point, so hoist
      the ext4_error() call up to ext4_validate_block_bitmap() and release
      the block group spinlock before calling ext4_error().
      
      The reported stack trace can be found at:
      
      	http://article.gmane.org/gmane.comp.file-systems.ext4/33731Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      7a4c5de2
  2. 06 8月, 2012 2 次提交
    • T
      ext4: avoid kmemcheck complaint from reading uninitialized memory · 7e731bc9
      Theodore Ts'o 提交于
      Commit 03179fe9 introduced a kmemcheck complaint in
      ext4_da_get_block_prep() because we save and restore
      ei->i_da_metadata_calc_last_lblock even though it is left
      uninitialized in the case where i_da_metadata_calc_len is zero.
      
      This doesn't hurt anything, but silencing the kmemcheck complaint
      makes it easier for people to find real bugs.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
      (which is marked as a regression).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      7e731bc9
    • T
      ext4: make sure the journal sb is written in ext4_clear_journal_err() · d796c52e
      Theodore Ts'o 提交于
      After we transfer set the EXT4_ERROR_FS bit in the file system
      superblock, it's not enough to call jbd2_journal_clear_err() to clear
      the error indication from journal superblock --- we need to call
      jbd2_journal_update_sb_errno() as well.  Otherwise, when the root file
      system is mounted read-only, the journal is replayed, and the error
      indicator is transferred to the superblock --- but the s_errno field
      in the jbd2 superblock is left set (since although we cleared it in
      memory, we never flushed it out to disk).
      
      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      
      d796c52e
  3. 31 7月, 2012 2 次提交
  4. 23 7月, 2012 18 次提交
  5. 18 7月, 2012 1 次提交
  6. 14 7月, 2012 4 次提交
  7. 10 7月, 2012 5 次提交