1. 18 Jul, 2011 3 commits
  2. 17 Jul, 2011 1 commit
  3. 12 Jul, 2011 4 commits
  4. 11 Jul, 2011 9 commits
    • R
      ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff
      Robin Dong committed
      If eh->eh_entries is smaller than eh->eh_max, the routine will
      go to "repeat" and then go to "has_space" directly,
      since the arguments "depth" and "eh" have not changed.
      
      Therefore, go to "has_space" directly and remove the redundant "repeat" label.
      Signed-off-by: Robin Dong <sanbai@taobao.com>
      ffb505ff
    • T
      ext4: Change the wrong param comment for ext4_trim_all_free · 22612283
      Tao Ma committed
      In the ext4_trim_all_free() comment, there is no longer an @e4b parameter;
      it is now @group.
      Reported-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      22612283
    • T
      ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2
      Tao Ma committed
      In ext4, every time FITRIM is called, we iterate over all the
      groups and trim them one by one. This wastes time if a
      group has already been trimmed and has not changed since the last
      trim.
      
      So this patch adds a new flag in ext4_group_info->bb_state to
      indicate that the group has been trimmed; it is cleared
      when blocks are freed (in release_blocks_on_commit). In addition,
      trim_minlen is added to ext4_sb_info to record the last minlen
      used to trim the volume, so that if the caller provides a smaller
      one, we go on with the trim regardless of the bb_state.
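
The flag logic described above can be modelled roughly as follows. This is a simplified userspace sketch, not the kernel code: the structures, the `GROUP_WAS_TRIMMED` bit and the helper names are all hypothetical stand-ins for the fields the message mentions, and the model records the last minlen on every group trim rather than once per FITRIM call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "was trimmed" bit kept in bb_state. */
#define GROUP_WAS_TRIMMED 0x1u

struct group_info {              /* simplified model of ext4_group_info */
    unsigned int bb_state;
};

struct sb_info {                 /* simplified model of ext4_sb_info */
    uint64_t last_trim_minlen;   /* minlen used by the previous trim */
};

/* A group needs trimming if blocks were freed since its last trim, or if
 * the caller now asks for a smaller minlen than the one used last time. */
static bool should_trim(const struct group_info *g,
                        const struct sb_info *sb, uint64_t minlen)
{
    assert(g != NULL && sb != NULL);
    if (!(g->bb_state & GROUP_WAS_TRIMMED))
        return true;
    return minlen < sb->last_trim_minlen;
}

/* Record that the group was trimmed with the given minlen. */
static void mark_trimmed(struct group_info *g, struct sb_info *sb,
                         uint64_t minlen)
{
    assert(g != NULL && sb != NULL);
    g->bb_state |= GROUP_WAS_TRIMMED;
    sb->last_trim_minlen = minlen;
}

/* Freeing blocks invalidates the "already trimmed" state of the group. */
static void on_blocks_freed(struct group_info *g)
{
    assert(g != NULL);
    g->bb_state &= ~GROUP_WAS_TRIMMED;
}
```

In this model a second trim with the same minlen is skipped, while a trim with a smaller minlen, or any trim after `on_blocks_freed()`, is carried out again.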
      
      A simple test with my intel x25m ssd:
      df -h shows:
      /dev/sdb1              40G   21G   17G  56% /mnt/ext4
      Block size:               4096
      
      run the FITRIM with the following parameter:
      range.start = 0;
      range.len = UINT64_MAX;
      range.minlen = 1048576;
      
      without the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.505s
      user	0m0.000s
      sys	0m1.224s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.359s
      user	0m0.000s
      sys	0m1.178s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.228s
      user	0m0.000s
      sys	0m1.151s
      
      with the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.625s
      user	0m0.000s
      sys	0m1.269s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      
      A big improvement for the 2nd and 3rd run.
      
      Even after I delete some big image files, it is still much
      faster than iterating the whole disk.
      
      [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
      real	0m1.217s
      user	0m0.000s
      sys	0m0.196s
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      3d56b8d2
    • T
      ext4: Add new ext4 trim tracepoints · b3d4c2b1
      Tao Ma committed
      Add ext4_trim_extent and ext4_trim_all_free.
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b3d4c2b1
    • T
      ext4: speed up group trim with the right free block count · 169ddc3e
      Tao Ma committed
      When we trim free blocks in an ext4 group, we need to
      count the free blocks properly and check whether there are
      enough free blocks left for us to trim. The current solution
      only counts free extents that are large enough to trim, which
      isn't appropriate.
      
      Let us see a small example:
      a group has 1.5M free, made up of five extents of 300k each,
      and minblocks is 1M.  With the current solution, we have to iterate
      over the whole group since these 300k extents are never subtracted from
      the 1.5M.  But actually we should exit after we find the first 2
      free extents, since the remaining 3 chunks only sum up to 900K once we
      subtract the first 600K, even though those two can't be trimmed.
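
The example above can be replayed with a small model of the scan loop. This is an illustrative sketch only: `extents_scanned()` and its `count_small` switch are hypothetical, sizes are in arbitrary units (so 1.5M becomes 1500 and 1M becomes 1000), and the real kernel loop walks a block bitmap rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns how many free extents are examined before the scan stops.
 * count_small == false models the old behaviour (only extents that are
 * large enough to trim are subtracted from the free count);
 * count_small == true models the fixed behaviour (every extent is). */
static size_t extents_scanned(const unsigned *extents, size_t n,
                              unsigned minlen, bool count_small)
{
    unsigned free_left = 0;
    size_t i, scanned = 0;

    assert(extents != NULL || n == 0);

    for (i = 0; i < n; i++)
        free_left += extents[i];

    for (i = 0; i < n; i++) {
        scanned++;
        if (extents[i] >= minlen || count_small)
            free_left -= extents[i];
        /* Not enough free space left to ever reach minlen: stop early. */
        if (free_left < minlen)
            break;
    }
    return scanned;
}
```

With five extents of 300 and a minlen of 1000, the old accounting examines all 5 extents, while the fixed accounting stops after 2.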
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      169ddc3e
    • T
      ext4: fix trim length underflow with small trim length · 22f10457
      Tao Ma committed
      In 0f0a25bf, we adjust 'len' by s_first_data_block - start, but
      it can underflow in the case of blocksize=1K, fstrim_range.len=512 and
      fstrim_range.start = 0. In this case, when we run the code:
      len -= first_data_blk - start; len underflows to -1ULL.
      In the end we are still safe, because the later last_group check limits
      the trim to the whole volume, but that isn't what the user really wants.
      
      So this patch fixes it. It also adds a check for 'start', as ext3 does, so that
      we can bail out immediately if the start is invalid.
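
The underflow is ordinary unsigned wrap-around and is easy to reproduce outside the kernel. The sketch below is an illustration under assumptions: both helper names are hypothetical, and the guarded variant shows one possible way to avoid the wrap, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Unchecked adjustment as described in the message: wraps around when
 * len is smaller than first_data_blk - start. */
static uint64_t adjust_len_unchecked(uint64_t start, uint64_t len,
                                     uint64_t first_data_blk)
{
    if (start < first_data_blk)
        len -= first_data_blk - start;
    return len;
}

/* Guarded variant (illustrative only): clamp to zero instead of wrapping. */
static uint64_t adjust_len_checked(uint64_t start, uint64_t len,
                                   uint64_t first_data_blk)
{
    uint64_t requested = len;

    if (start < first_data_blk) {
        uint64_t skip = first_data_blk - start;
        len = (len > skip) ? len - skip : 0;
    }
    /* The adjustment may only ever shrink the requested range. */
    assert(len <= requested);
    return len;
}
```

With a 1K block size the first data block is 1, and a 512-byte request rounds down to 0 blocks, so `adjust_len_unchecked(0, 0, 1)` yields `UINT64_MAX` while the guarded variant yields 0.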
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      22f10457
    • T
      ext4: add tracepoint for ext4_journal_start · 12706394
      Theodore Ts'o committed
      This will help debug who is responsible for starting a jbd2 transaction.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      12706394
    • J
      ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b
      Jiaying Zhang committed
      On a corrupted inode or a disk failure, we may fail after we have already
      allocated some blocks from the inode or taken some blocks from the
      inode's preallocation list, but before we have successfully inserted the
      corresponding extent into the extent tree. In this case, we should free
      any allocated blocks and discard the inode's preallocated blocks
      because the entries in the inode's preallocation list may be in an
      inconsistent state.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      575a1d4b
    • M
      ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov committed
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block(). This looks quite sensible in most cases: the blocks
      to be freed are associated with an inode and were accounted in quota and
      i_blocks some time ago.
      
      However, there is a case where the blocks to free have not yet been
      accounted by the time ext4_free_blocks() is called:
      
      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().
      
      In this scenario, ext4_free_blocks() calls dquot_free_block(), which, in
      turn, decrements i_blocks for blocks which were not accounted yet (due
      to delalloc). After a clean umount, e2fsck reports something like:
      
      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.
      
      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
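
The effect of the flag can be pictured with a tiny accounting model. Everything here is hypothetical: `struct inode_model`, `free_blocks_model()` and the flag value merely stand in for the inode, ext4_free_blocks() and EXT4_FREE_BLOCKS_NO_QUOT_UPDATE, and the freeing of the blocks themselves is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; models EXT4_FREE_BLOCKS_NO_QUOT_UPDATE. */
#define FREE_BLOCKS_NO_QUOT_UPDATE 0x1u

struct inode_model {
    uint64_t i_blocks;   /* blocks currently accounted to the inode */
};

/* Model of the free path: the accounting is only rolled back when the
 * blocks had actually been accounted, i.e. when the flag is not set. */
static void free_blocks_model(struct inode_model *inode, uint64_t count,
                              unsigned int flags)
{
    assert(inode != NULL);
    if (!(flags & FREE_BLOCKS_NO_QUOT_UPDATE)) {
        assert(inode->i_blocks >= count);
        inode->i_blocks -= count;
    }
}
```

In the failure path from step 3, the caller would pass the flag so that i_blocks stays untouched; an ordinary free passes 0 and the count is decremented as before.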
      Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      7132de74
  5. 30 Jun, 2011 1 commit
  6. 28 Jun, 2011 8 commits
  7. 06 Jun, 2011 4 commits
    • L
      ext4: fixed tracepoints cleanup · a9c667f8
      Lukas Czerner committed
      While creating fixed tracepoints for ext3, basically by porting them
      from ext4, I found a lot of useless retyping, wrong type usage, useless
      variable passing and other inconsistencies in the ext4 fixed tracepoint
      code.
      
      This patch cleans up the fixed tracepoint code for ext4 and also simplifies
      some of the tracepoints.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a9c667f8
    • L
      ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap · c03f8aa9
      Lukas Czerner committed
      Currently we do not mark the extent as the last one
      (FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is
      because we simply do not check for it and continue searching for
      the next extent. But by the time we hit the hole at the end of the file, it
      is too late.
      
      This commit adds a check for the allocated block in the subsequent extent
      and, if there are no more extents (block = EXT_MAX_BLOCKS), just flags the
      current one as the last one.
      
      This behaviour was spotted unintentionally by xfstest 252, when the
      test hung because of a wrong loop condition. However, on other
      filesystems (like xfs) it exits anyway, because we notice the last
      extent flag and exit.
      
      With this patch xfstest 252 does not hang anymore. The ext4 fiemap
      implementation still reports a bad extent type in some cases; however,
      this seems to be a different issue.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c03f8aa9
    • L
      ext4: Fix max file size and logical block counting of extent format file · f17722f9
      Lukas Czerner committed
      Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
      in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
      format and filling the tail of the file up to its end. We hit the BUG_ON
      when we write the last block (2^32-1) into the sparse file.
      
      The root cause of the problem lies in the fact that we specifically set
      s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
      format, which is 32 bits long. However, we do not store the start and end
      block numbers, but rather the start block number and the length in blocks.
      This means that in order to cover an extent from 0 to EXT_MAX_BLOCK we need
      EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as well),
      and it does not.
      
      The only way to fix it without changing the meaning of the struct
      ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
      by one fs block so we can cover the whole extent we can get by the
      on-disk extent format.
      
      Also, in many places EXT_MAX_BLOCK is used as a length instead of the maximum
      logical block number that the name suggests; it is all a bit messy. So
      this commit renames it to EXT_MAX_BLOCKS and changes its usage in some
      places to actually be the maximum number of blocks in the extent.
      
      The bug which this commit fixes can be reproduced as follows:
      
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
       sync
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
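
The off-by-one behind this bug is plain arithmetic and can be checked in isolation. The helpers below are hypothetical and only model a 32-bit start block paired with a 32-bit length, as the message describes; they are not taken from the ext4 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of blocks needed to cover logical blocks 0..last_block inclusive. */
static uint64_t blocks_to_cover(uint32_t last_block)
{
    return (uint64_t)last_block + 1;
}

/* Whether a block count can be stored in a 32-bit length field. */
static bool fits_in_32_bits(uint64_t count)
{
    return count <= UINT32_MAX;
}

/* The two cases from the message, written as checks. */
static void check_limits(void)
{
    /* Last block 2^32-1: the length would be 2^32, which does not fit. */
    assert(!fits_in_32_bits(blocks_to_cover(UINT32_MAX)));
    /* One block lower: the length is 2^32-1, which fits. */
    assert(fits_in_32_bits(blocks_to_cover(UINT32_MAX - 1)));
}
```

This is why lowering the maximum file size by one filesystem block is enough: the highest logical block becomes 2^32-2 and the largest possible length becomes 2^32-1.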
      Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f17722f9
    • Y
      ext4: correct comments for ext4_free_blocks() · 5def1360
      Yongqiang Yang committed
      metadata is no longer a parameter of ext4_free_blocks().
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      5def1360
  8. 27 May, 2011 2 commits
    • C
      fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig committed
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      aa385729
    • D
      ext4: add cleancache support · 7abc52c2
      Dan Magenheimer committed
      This seventh patch of eight in this cleancache series "opts-in"
      cleancache for ext4.  Filesystems must explicitly enable cleancache
      by calling cleancache_init_fs anytime an instance of the filesystem
      is mounted. For ext4, all other cleancache hooks are in
      the VFS layer including the matching cleancache_flush_fs
      hook which must be called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v6-v8: no changes]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      7abc52c2
  9. 26 May, 2011 5 commits
  10. 25 May, 2011 3 commits
    • V
      ext4: do not normalize block requests from fallocate() · 556b27ab
      Vivek Haldar committed
      Currently, an fallocate request of size slightly larger than a power of
      2 is turned into two block requests, each a power of 2, with the extra
      blocks pre-allocated for future use. When an application calls
      fallocate, it already has an idea about how large the file may grow, so
      there is usually little benefit in reserving extra blocks on the
      preallocation list. Skipping the normalization reduces disk fragmentation.
      
      Tested: fsstress. Also verified manually that fallocate'd files are
      contiguously laid out with this change (whereas without it they begin at
      power-of-2 boundaries, leaving blocks in between). CPU usage of
      fallocate is not appreciably higher.  In a tight fallocate loop, CPU
      usage hovers between 5%-8% with this change, and 5%-7% without it.
      
      Using a simulated file system aging program which fills the file system to
      70%, the percentage of free extents larger than 8MB (as measured by
      e2freefrag) increased from 38.8% without this change to 69.4% with
      this change.
      Signed-off-by: Vivek Haldar <haldar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      556b27ab
    • A
      ext4: enable "punch hole" functionality · a4bb6b64
      Allison Henderson committed
      This patch adds new routines: "ext4_punch_hole", "ext4_ext_punch_hole"
      and "ext4_ext_check_cache".
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the
      ext4_ext_punch_hole routine.
      
      The ext4_ext_punch_hole routine first completes all outstanding writes
      with the associated pages, and then releases them.  The data that is not
      block aligned is zeroed, and all blocks in between are punched out.
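
The split between zeroed and punched ranges can be worked out from the byte range and the block size alone. The sketch below is a standalone illustration: `struct hole_plan` and `plan_hole()` are hypothetical names, and the code ignores overflow of `offset + length`.

```c
#include <assert.h>
#include <stdint.h>

struct hole_plan {
    uint64_t first_block;      /* first whole block to punch out */
    uint64_t nr_blocks;        /* number of whole blocks to punch out */
    uint64_t head_zero_bytes;  /* partial-block bytes zeroed at the start */
    uint64_t tail_zero_bytes;  /* partial-block bytes zeroed at the end */
};

/* Split the byte range [offset, offset + length) into an unaligned head
 * and tail, which are zeroed, and the whole blocks in between, which are
 * punched out. bs is the block size in bytes. */
static struct hole_plan plan_hole(uint64_t offset, uint64_t length,
                                  uint64_t bs)
{
    struct hole_plan p = { 0, 0, 0, 0 };
    uint64_t end, first, stop;

    assert(bs > 0);
    end = offset + length;
    first = (offset + bs - 1) / bs;   /* round start up to a block */
    stop = end / bs;                  /* round end down to a block */

    p.first_block = first;
    if (first > stop) {
        /* The range lies inside a single block: only zeroing is needed. */
        p.head_zero_bytes = length;
        return p;
    }
    p.nr_blocks = stop - first;
    p.head_zero_bytes = first * bs - offset;
    p.tail_zero_bytes = end - stop * bs;
    return p;
}
```

For a 4096-byte block size, a hole at offset 1000 with length 10000 zeroes 3096 bytes at the start, punches out 1 whole block (block 1), and zeroes 2808 bytes at the end.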
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
      except that it accepts an ext4_ext_cache parameter instead of an ext4_extent
      parameter.  This routine is used by ext4_ext_punch_hole to check
      whether a block is in a hole that has been cached.  The ext4_ext_cache
      parameter is necessary because the members of the ext4_extent structure
      are not large enough to hold a 32 bit value.  The existing
      ext4_ext_in_cache routine has become a wrapper to this new function.
      
      [ext4 punch hole patch series 5/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
      a4bb6b64
    • A
      ext4: add "punch hole" flag to ext4_map_blocks() · e861304b
      Allison Henderson committed
      This patch adds a new flag to ext4_map_blocks() that specifies that the
      given range of blocks should be punched out.  Extents are first
      converted to uninitialized extents before they are punched
      out. Because punching a hole may require that the extent be split, it
      is possible that the splitting may need more blocks than are
      available.  To deal with this, the use of reserved blocks is enabled to
      allow the split to proceed.
      
      The routine then returns the number of blocks successfully
      punched out.
      
      [ext4 punch hole patch series 4/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
      e861304b