1. 24 7月, 2011 1 次提交
  2. 18 7月, 2011 1 次提交
  3. 12 7月, 2011 2 次提交
  4. 11 7月, 2011 6 次提交
    • T
      ext4: Change the wrong param comment for ext4_trim_all_free · 22612283
      Tao Ma 提交于
      at ext4_trim_all_free() comment, there is no longer an @e4b parameter,
      instead it is @group.
      Reported-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      22612283
    • T
      ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2
      Tao Ma 提交于
      In ext4, when FITRIM is called every time, we iterate all the
      groups and do trim one by one. It is a bit time wasting if the
      group has been trimmed and there is no change since the last
      trim.
      
      So this patch adds a new flag in ext4_group_info->bb_state to
      indicate that the group has been trimmed, and it will be cleared
      if some blocks is freed(in release_blocks_on_commit). Another
      trim_minlen is added in ext4_sb_info to record the last minlen
      we use to trim the volume, so that if the caller provide a small
      one, we will go on the trim regardless of the bb_state.
      
      A simple test with my intel x25m ssd:
      df -h shows:
      /dev/sdb1              40G   21G   17G  56% /mnt/ext4
      Block size:               4096
      
      run the FITRIM with the following parameter:
      range.start = 0;
      range.len = UINT64_MAX;
      range.minlen = 1048576;
      
      without the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.505s
      user	0m0.000s
      sys	0m1.224s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.359s
      user	0m0.000s
      sys	0m1.178s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.228s
      user	0m0.000s
      sys	0m1.151s
      
      with the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.625s
      user	0m0.000s
      sys	0m1.269s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      
      A big improvement for the 2nd and 3rd run.
      
      Even after I delete some big image files, it is still much
      faster than iterating the whole disk.
      
      [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
      real	0m1.217s
      user	0m0.000s
      sys	0m0.196s
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: NAndreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3d56b8d2
    • T
      ext4: Add new ext4 trim tracepoints · b3d4c2b1
      Tao Ma 提交于
      Add ext4_trim_extent and ext4_trim_all_free.
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b3d4c2b1
    • T
      ext4: speed up group trim with the right free block count · 169ddc3e
      Tao Ma 提交于
      When we trim some free blocks in a group of ext4, we need to 
      calculate the free blocks properly and check whether there are
      enough freed blocks left for us to trim. Current solution will
      only calculate free spaces if they are large for a trim which
      isn't appropriate.
      
      Let us see a small example:
      a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
      And minblocks is 1M.  With current solution, we have to iterate
      the whole group since these 300k will never be subtracted from
      1.5M.  But actually we should exit after we find the first 2
      free spaces since the left 3 chunks only sum up to 900K if we
      subtract the first 600K although they can't be trimed.
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      169ddc3e
    • T
      ext4: fix trim length underflow with small trim length · 22f10457
      Tao Ma 提交于
      In 0f0a25bf, we adjust 'len' with s_first_data_block - start, but
      it could underflow in case blocksize=1K, fstrim_range.len=512 and
      fstrim_range.start = 0. In this case, when we run the code:
      len -= first_data_blk - start; len will be underflow to -1ULL.
      In the end, although we are safe that last_group check later will limit
      the trim to the whole volume, but that isn't what the user really want.
      
      So this patch fix it. It also adds the check for 'start' like ext3 so that
      we can break immediately if the start is invalid.
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      22f10457
    • M
      ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov 提交于
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block This looks quite sensible in the most cases: blocks
      to be freed are associated with inode and were accounted in quota and
      i_blocks some time ago.
      
      However, there is a case when blocks to free were not accounted by the
      time calling ext4_free_blocks() yet:
      
      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().
      
      In this scenario, ext4_free_blocks() calls dquot_free_block() who, in
      turn, decrements i_blocks for blocks which were not accounted yet (due
      to delalloc) After clean umount, e2fsck reports something like:
      
      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.
      
      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
      Signed-off-by: NMaxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      7132de74
  5. 28 6月, 2011 1 次提交
  6. 06 6月, 2011 2 次提交
  7. 25 5月, 2011 3 次提交
    • A
      ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson 提交于
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      55f020db
    • L
      ext4: protect bb_first_free in ext4_trim_all_free() with group lock · 28739eea
      Lukas Czerner 提交于
      We should protect reading bd_info->bb_first_free with the group lock
      because otherwise we might miss some free blocks. This is not a big deal
      at all, but the change to do right thing is really simple, so lets do
      that.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      28739eea
    • L
      ext4: only load buddy bitmap in ext4_trim_fs() when it is needed · 78944086
      Lukas Czerner 提交于
      Currently we are loading buddy ext4_mb_load_buddy() for every block
      group we are going through in ext4_trim_fs() in many cases just to find
      out that there is not enough space to be bothered with. As Amir Goldstein
      suggested we can use bb_free information directly from ext4_group_info.
      
      This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and rather
      get the ext4_group_info via ext4_get_group_info() and use the bb_free
      information directly from that. This avoids unnecessary call to load
      buddy in the case the group does not have enough free space to trim.
      Loading buddy is now moved to ext4_trim_all_free().
      
      Tested by me with xfstests 251.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      78944086
  8. 10 5月, 2011 4 次提交
  9. 09 5月, 2011 2 次提交
    • T
      ext4: remove unneeded ext4_journal_get_undo_access · 2cd05cc3
      Theodore Ts'o 提交于
      The block allocation code used to use jbd2_journal_get_undo_access as
      a way to make changes that wouldn't show up until the commit took
      place.  The new multi-block allocation code has a its own way of
      preventing newly freed blocks from getting reused until the commit
      takes place (it avoids updating the buddy bitmaps until the commit is
      done), so we don't need to use jbd2_journal_get_undo_access(), which
      has extra overhead compared to jbd2_journal_get_write_access().
      
      There was one last vestigal use of ext4_journal_get_undo_access() in
      ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
      and then remove the ext4_journal_get_undo_access() support.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2cd05cc3
    • A
      ext4: move ext4_add_groupblocks() to mballoc.c · 2846e820
      Amir Goldstein 提交于
      In preparation for the next patch, the function ext4_add_groupblocks()
      is moved to mballoc.c, where it could use some static functions.
      Signed-off-by: NAmir Goldstein <amir73il@users.sf.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2846e820
  10. 01 5月, 2011 1 次提交
    • T
      ext4: ignore errors when issuing discards · d9f34504
      Theodore Ts'o 提交于
      This is an effective revert of commit a30eec2a: "ext4: stop issuing
      discards if not supported by device".  The problem is that there are
      some devices that may return errors in response to a discard request
      some times but not others.  (One example would be a hybrid dm device
      which concatenates an SSD and an HDD device).
      
      By this logic, I also removed the error checking from ext4's FITRIM
      code; so that an error from a discard will not stop the FITRIM from
      trying to trim the rest of the file system.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d9f34504
  11. 17 4月, 2011 1 次提交
  12. 31 3月, 2011 1 次提交
  13. 24 3月, 2011 1 次提交
    • T
      ext4: fix a BUG in mb_mark_used during trim. · 0ba08517
      Tao Ma 提交于
      In a bs=4096 volume, if we call FITRIM with the following parameter as
      fstrim_range(start = 102400, len = 134144000, minlen = 10240),
      we will trigger this BUG_ON:
      
      	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
      
      Mar  4 00:55:52 boyu-tm kernel: ------------[ cut here ]------------
      Mar  4 00:55:52 boyu-tm kernel: kernel BUG at fs/ext4/mballoc.c:1506!
      Mar  4 01:21:09 boyu-tm kernel: Code: d4 00 00 00 00 49 89 fe 8b 56 0c 44 8b 7e 04 89 55 c4 48 8b 4f 28 89 d6 44 01 fe 48 63 d6 48 8b 41 18 48 c1 e0 03 48 39 c2 76 04 <0f> 0b eb fe 48 8b 55 b0 8b 47 34 3b 42 08 74 04 0f 0b eb fe 48
      Mar  4 01:21:09 boyu-tm kernel: RIP  [<ffffffffa053eb42>] mb_mark_used+0x47/0x26c [ext4]
      Mar  4 01:21:09 boyu-tm kernel:  RSP <ffff880121e45c38>
      Mar  4 01:21:09 boyu-tm kernel: ---[ end trace 9f461696f6a9dcf2 ]---
      
      Fix this bug by doing the accounting correctly.
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0ba08517
  14. 22 3月, 2011 1 次提交
  15. 28 2月, 2011 1 次提交
  16. 25 2月, 2011 4 次提交
    • C
      ext4: mballoc: don't replace the current preallocation group unnecessarily · 5a54b2f1
      Coly Li 提交于
      In ext4_mb_check_group_pa(), the current preallocation space is
      replaced with a new preallocation space when the two have the same
      distance from the goal block.
      
      This doesn't actually gain us anything, so change things so that the
      function only switches to the new preallocation group if its distance
      from the goal block is strictly smaller than the current preallocaiton
      group's distance from the goal block.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5a54b2f1
    • C
      mballoc: add comments to ext4_mb_mark_free_simple() · 7c786059
      Coly Li 提交于
      This patch adds comments to ext4_mb_mark_free_simple to make it more
      understandable.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      7c786059
    • C
      ext4: remove unncessary call mb_find_buddy() in debugging code · 235772da
      Coly Li 提交于
      In __mb_check_buddy(), look at the code below:
        591         fstart = -1;
        592         buddy = mb_find_buddy(e4b, 0, &max);
        593         for (i = 0; i < max; i++) {
        594                 if (!mb_test_bit(i, buddy)) {
        595                         MB_CHECK_ASSERT(i >= e4b->bd_info->bb_first_free);
        596                         if (fstart == -1) {
        597                                 fragments++;
        598                                 fstart = i;
        599                         }
        600                         continue;
        601                 }
        602                 fstart = -1;
        603                 /* check used bits only */
        604                 for (j = 0; j < e4b->bd_blkbits + 1; j++) {
        605                         buddy2 = mb_find_buddy(e4b, j, &max2);
        606                         k = i >> j;
        607                         MB_CHECK_ASSERT(k < max2);
        608                         MB_CHECK_ASSERT(mb_test_bit(k, buddy2));
        609                 }
        610         }
        611         MB_CHECK_ASSERT(!EXT4_MB_GRP_NEED_INIT(e4b->bd_info));
        612         MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
        613
        614         grp = ext4_get_group_info(sb, e4b->bd_group);
        615         buddy = mb_find_buddy(e4b, 0, &max);
      
      On line 592, buddy is fetched by mb_find_buddy() with order 0, between
      line 593 to line 615, buddy is not changed, therefore there is
      no need to fetch buddy again from mb_find_buddy() with order 0 again.
      
      We can safely remove the second mb_find_buddy() on line 615.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      235772da
    • C
      ext4: code cleanup in mb_find_buddy() · 84b775a3
      Coly Li 提交于
      Current code calculate max no matter whether order is zero, it's
      unnecessary. This cleanup patch sets max to "1 << (e4b->bd_blkbits
      + 3)" only when order == 0.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      84b775a3
  17. 24 2月, 2011 1 次提交
  18. 12 2月, 2011 1 次提交
    • E
      ext4: make grpinfo slab cache names static · 2892c15d
      Eric Sandeen 提交于
      In 2.6.37 I was running into oopses with repeated module
      loads & unloads.  I tracked this down to:
      
      fb1813f4 ext4: use dedicated slab caches for group_info structures
      
      (this was in addition to the features advert unload problem)
      
      The kstrdup & subsequent kfree of the cache name was causing
      a double free.  In slub, at least, if I read it right it allocates
      & frees the name itself, slab seems to do something different...
      so in slub I think we were leaking -our- cachep->name, and double
      freeing the one allocated by slub.
      
      After getting lost in slab/slub/slob a bit, I just looked at other
      sized-caches that get allocated.  jbd2, biovec, sgpool all do it
      more or less the way jbd2 does.  Below patch follows the jbd2
      method of dynamically allocating a cache at mount time from
      a list of static names.
      
      (This might also possibly fix a race creating the caches with
      parallel mounts running).
      
      [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
      the original patch]
      
      Cc: stable@kernel.org
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2892c15d
  19. 12 1月, 2011 2 次提交
  20. 11 1月, 2011 4 次提交
    • T
      ext4: remove ext4_mb_return_to_preallocation() · a5196f8c
      Theodore Ts'o 提交于
      This function was never implemented, except for a BUG_ON which was
      tripping when ext4 is run without a journal.  The problem is that
      although the comment asserts that "truncate (which is the only way to
      free block) discards all preallocations", ext4_free_blocks() is also
      called in various error recovery paths when blocks have been
      allocated, but for various reasons, we were not able to use those data
      blocks (for example, because we ran out of memory while trying to
      manipulate the extent tree, or some other similar situation).
      
      In addition to the fact that this function isn't implemented except
      for the incorrect BUG_ON, the single caller of this function,
      ext4_free_blocks(), doesn't use it all if the journal is enabled.
      
      So remove the (stub) function entirely for now.  If we decide it's
      better to add it back, it's only going to be useful with a relatively
      large number of code changes anyway.
      
      Google-Bug-Id: 3236408
      
      Cc: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a5196f8c
    • J
      ext4: fix trimming of a single group · ca6e909f
      Jan Kara 提交于
      When ext4_trim_fs() is called to trim a part of a single group, the
      logic will wrongly set last block of the interval to 'len' instead
      of 'first_block + len'. Thus a shorter interval is possibly trimmed.
      Fix it.
      
      CC: Lukas Czerner <lczerner@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ca6e909f
    • T
      ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097
      Theodore Ts'o 提交于
      Remove the short element i_delalloc_reserved_flag from the
      ext4_inode_info structure and replace it a new bit in i_state_flags.
      Since we have an ext4_inode_info for every ext4 inode cached in the
      inode cache, any savings we can produce here is a very good thing from
      a memory utilization perspective.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f2321097
    • L
      ext4: remove warning message from ext4_issue_discard helper · 93259636
      Lukas Czerner 提交于
      ext4_issue_discard is supposed to be helper for calling discard, however
      in case that underlying device does not support discard it prints out
      the warning message and clears the DISCARD t_mount_opt flag. Since it
      can be (and is) used by others, it should not do anything and let the
      caller to handle the error case.
      
      This commit removes warning message and flag setting from
      ext4_issue_discard and use it just in place where it is really needed
      (release_blocks_on_commit). FITRIM ioctl should not set any flags nor it
      should print out warning messages, so get rid of the warning as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      93259636