1. 25 May 2011 · 1 commit
    • ext4: only load buddy bitmap in ext4_trim_fs() when it is needed · 78944086
      Committed by Lukas Czerner
      Currently we load the buddy bitmap via ext4_mb_load_buddy() for every
      block group we go through in ext4_trim_fs(), in many cases just to find
      out that there is not enough free space to be worth trimming. As Amir
      Goldstein suggested, we can use the bb_free information directly from
      ext4_group_info.
      
      This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and instead
      gets the ext4_group_info via ext4_get_group_info() and uses its bb_free
      information directly. This avoids an unnecessary buddy load when the
      group does not have enough free space to trim. Loading the buddy is now
      done in ext4_trim_all_free().
      
      Tested by me with xfstests 251.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
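A minimal C sketch of the idea, not the ext4 code. It assumes hypothetical, simplified types: a struct group_info holding only a bb_free counter, plus trim_fs()/trim_all_free() stand-ins that count buddy loads instead of reading bitmaps. It shows the cheap bb_free check running before the expensive load.

```c
#include <stddef.h>

/* Hypothetical stand-in for ext4_group_info: only the free-block
 * counter that the commit relies on is modelled. */
struct group_info {
	unsigned int bb_free;	/* free blocks in this group */
};

/* Counts how many times the (expensive) buddy load would have run. */
static unsigned int buddy_loads;

/* Hypothetical stand-in for ext4_trim_all_free(): after the patch this is
 * where the buddy gets loaded.  Here we only count the load and pretend
 * every free block in the group is trimmed. */
static unsigned int trim_all_free(const struct group_info *grp)
{
	buddy_loads++;
	return grp->bb_free;
}

/* Walk the groups, consulting bb_free first so that groups with too
 * little free space never pay for a buddy load.  Returns blocks trimmed. */
static unsigned int trim_fs(const struct group_info *groups, size_t ngroups,
			    unsigned int minlen)
{
	unsigned int trimmed = 0;

	for (size_t i = 0; i < ngroups; i++) {
		if (groups[i].bb_free < minlen)
			continue;	/* cheap check, no buddy load */
		trimmed += trim_all_free(&groups[i]);
	}
	return trimmed;
}
```

Take groups whose bb_free values are 0, 5, 100, 3 and 64, with a minlen of 10. Only the two groups with 100 and 64 free blocks trigger a load.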
  2. 10 May 2011 · 4 commits
  3. 09 May 2011 · 2 commits
    • ext4: remove unneeded ext4_journal_get_undo_access · 2cd05cc3
      Committed by Theodore Ts'o
      The block allocation code used to use jbd2_journal_get_undo_access as
      a way to make changes that wouldn't show up until the commit took
      place.  The new multi-block allocation code has its own way of
      preventing newly freed blocks from getting reused until the commit
      takes place (it avoids updating the buddy bitmaps until the commit is
      done), so we don't need to use jbd2_journal_get_undo_access(), which
      has extra overhead compared to jbd2_journal_get_write_access().
      
      There was one last vestigial use of ext4_journal_get_undo_access() in
      ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
      and then remove the ext4_journal_get_undo_access() support.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: move ext4_add_groupblocks() to mballoc.c · 2846e820
      Committed by Amir Goldstein
      In preparation for the next patch, the function ext4_add_groupblocks()
      is moved to mballoc.c, where it could use some static functions.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 01 May 2011 · 1 commit
    • ext4: ignore errors when issuing discards · d9f34504
      Committed by Theodore Ts'o
      This is an effective revert of commit a30eec2a: "ext4: stop issuing
      discards if not supported by device".  The problem is that there are
      some devices that may return errors in response to a discard request
      sometimes but not others.  (One example would be a hybrid dm device
      which concatenates an SSD and an HDD device).
      
      For the same reason, I also removed the error checking from ext4's FITRIM
      code, so that an error from a discard will not stop FITRIM from
      trying to trim the rest of the file system.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
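A hedged C sketch of the "keep going on discard errors" policy. The extent type and the discard callback are hypothetical. hybrid_discard() is also hypothetical: it rejects discards at or beyond block 1000, loosely modelling the HDD half of a hybrid device. A failed discard is skipped rather than aborting the whole trim.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical free extent to be discarded (in blocks). */
struct extent {
	unsigned long long start;
	unsigned long long len;
};

/* Hypothetical discard hook: returns 0 or a negative errno. */
typedef int (*discard_fn)(const struct extent *ex);

/* Example device that accepts some discards and rejects others. */
static int hybrid_discard(const struct extent *ex)
{
	return ex->start >= 1000 ? -EOPNOTSUPP : 0;
}

/* Issue a discard for every extent.  Errors are deliberately not fatal,
 * so the rest of the range is still trimmed.  Returns the number of
 * blocks whose discard succeeded. */
static unsigned long long trim_extents(const struct extent *ex, size_t n,
				       discard_fn discard)
{
	unsigned long long trimmed = 0;

	for (size_t i = 0; i < n; i++) {
		if (discard(&ex[i]) == 0)
			trimmed += ex[i].len;
		/* on error: ignore it and move on to the next extent */
	}
	return trimmed;
}
```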
  5. 17 April 2011 · 1 commit
  6. 31 March 2011 · 1 commit
  7. 24 March 2011 · 1 commit
    • ext4: fix a BUG in mb_mark_used during trim. · 0ba08517
      Committed by Tao Ma
      On a volume with a 4096-byte block size, calling FITRIM with
      fstrim_range(start = 102400, len = 134144000, minlen = 10240)
      triggers this BUG_ON:
      
      	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
      
      Mar  4 00:55:52 boyu-tm kernel: ------------[ cut here ]------------
      Mar  4 00:55:52 boyu-tm kernel: kernel BUG at fs/ext4/mballoc.c:1506!
      Mar  4 01:21:09 boyu-tm kernel: Code: d4 00 00 00 00 49 89 fe 8b 56 0c 44 8b 7e 04 89 55 c4 48 8b 4f 28 89 d6 44 01 fe 48 63 d6 48 8b 41 18 48 c1 e0 03 48 39 c2 76 04 <0f> 0b eb fe 48 8b 55 b0 8b 47 34 3b 42 08 74 04 0f 0b eb fe 48
      Mar  4 01:21:09 boyu-tm kernel: RIP  [<ffffffffa053eb42>] mb_mark_used+0x47/0x26c [ext4]
      Mar  4 01:21:09 boyu-tm kernel:  RSP <ffff880121e45c38>
      Mar  4 01:21:09 boyu-tm kernel: ---[ end trace 9f461696f6a9dcf2 ]---
      
      Fix this bug by doing the accounting correctly.
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
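The following C sketch shows why those parameters exceed a single group's bitmap. It assumes the range starts in group 0, so block numbers and group-relative offsets coincide, and a 4096-byte block size (blocksize_bits = 12). The helper names are hypothetical. This is only the arithmetic, not the patch's fix, which the commit describes simply as correcting the accounting.

```c
#include <stdint.h>

/* FITRIM parameters converted from bytes to filesystem blocks.  Field
 * names mirror struct fstrim_range; the struct itself is illustrative. */
struct trim_blocks {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};

static struct trim_blocks bytes_to_blocks(uint64_t start, uint64_t len,
					  uint64_t minlen,
					  unsigned int blocksize_bits)
{
	struct trim_blocks b = {
		.start = start >> blocksize_bits,
		.len = len >> blocksize_bits,
		.minlen = minlen >> blocksize_bits,	/* rounds down */
	};
	return b;
}

/* Bits in one block bitmap, i.e. s_blocksize << 3: the BUG_ON's bound. */
static uint64_t group_bits(unsigned int blocksize_bits)
{
	return (uint64_t)1 << (blocksize_bits + 3);
}

/* Non-zero if the group-relative extent [start, start + len) would trip
 * BUG_ON(start + len > (s_blocksize << 3)). */
static int trips_bug_on(uint64_t start, uint64_t len,
			unsigned int blocksize_bits)
{
	return start + len > group_bits(blocksize_bits);
}

/* Clamp a group-relative extent so it ends at the group boundary. */
static uint64_t clamp_to_group(uint64_t start, uint64_t len,
			       unsigned int blocksize_bits)
{
	uint64_t max = group_bits(blocksize_bits);

	if (start >= max)
		return 0;
	return start + len > max ? max - start : len;
}
```

With the reported values, bytes_to_blocks(102400, 134144000, 10240, 12) gives start 25, len 32750 and minlen 2. Since 25 + 32750 = 32775 exceeds the 32768-bit bound, the extent must be clamped to the group, which leaves 32743 blocks.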
  8. 22 March 2011 · 1 commit
  9. 28 February 2011 · 1 commit
  10. 25 February 2011 · 4 commits
    • ext4: mballoc: don't replace the current preallocation group unnecessarily · 5a54b2f1
      Committed by Coly Li
      In ext4_mb_check_group_pa(), the current preallocation space is
      replaced with a new preallocation space when the two have the same
      distance from the goal block.
      
      This doesn't actually gain us anything, so change things so that the
      function only switches to the new preallocation group if its distance
      from the goal block is strictly smaller than the current preallocation
      group's distance from the goal block.
      Signed-off-by: Coly Li <bosong.ly@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
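The tie-breaking rule can be sketched in C as follows. struct prealloc, distance() and pick_pa() are hypothetical simplifications of what ext4_mb_check_group_pa() compares. The current preallocation is replaced only when the candidate is strictly closer to the goal block, so on a tie the current one is kept.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified preallocation: only its start block. */
struct prealloc {
	uint64_t pstart;
};

static uint64_t distance(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/* Keep 'cur' unless 'cand' is strictly closer to 'goal'. */
static const struct prealloc *pick_pa(uint64_t goal,
				      const struct prealloc *cur,
				      const struct prealloc *cand)
{
	if (cur == NULL)
		return cand;
	if (distance(cand->pstart, goal) < distance(cur->pstart, goal))
		return cand;
	return cur;	/* equal distance: no point in switching */
}
```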
    • mballoc: add comments to ext4_mb_mark_free_simple() · 7c786059
      Committed by Coly Li
      This patch adds comments to ext4_mb_mark_free_simple to make it more
      understandable.
      Signed-off-by: Coly Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
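The commit itself only adds comments. As a rough, standalone illustration of the buddy technique that ext4_mb_mark_free_simple() implements, here is a sketch that splits a free run into naturally aligned power-of-two chunks. Each chunk is limited both by the alignment of its start and by the remaining length. All names and the order cap are hypothetical. The real function also clears the matching bits in the buddy bitmaps, which this sketch does not model.

```c
#include <stdint.h>

#define SKETCH_MAX_ORDER 13	/* hypothetical cap on the chunk order */

/* Index of the lowest set bit of x (x must be non-zero). */
static unsigned int lowest_set_bit(uint32_t x)
{
	unsigned int n = 0;

	while (!(x & 1u)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* floor(log2(x)) for non-zero x. */
static unsigned int floor_log2(uint32_t x)
{
	unsigned int n = 0;

	while (x >>= 1)
		n++;
	return n;
}

/* Split the free run [first, first + len) into aligned power-of-two
 * chunks and count chunks per order.  Returns the number of chunks. */
static unsigned int split_free_run(uint32_t first, uint32_t len,
				   unsigned int counters[SKETCH_MAX_ORDER + 1])
{
	unsigned int nchunks = 0;

	while (len > 0) {
		/* largest order permitted by the alignment of 'first' */
		unsigned int align = first ? lowest_set_bit(first)
					   : SKETCH_MAX_ORDER;
		/* largest order that still fits in the remaining length */
		unsigned int fit = floor_log2(len);
		unsigned int order = align < fit ? align : fit;

		if (order > SKETCH_MAX_ORDER)
			order = SKETCH_MAX_ORDER;
		counters[order]++;
		first += 1u << order;
		len -= 1u << order;
		nchunks++;
	}
	return nchunks;
}
```

For example, the run of blocks 3 to 15 (first = 3, len = 13) becomes three chunks: block 3 (order 0), blocks 4 to 7 (order 2) and blocks 8 to 15 (order 3).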
    • ext4: remove unncessary call mb_find_buddy() in debugging code · 235772da
      Committed by Coly Li
      In __mb_check_buddy(), look at the code below:
        591         fstart = -1;
        592         buddy = mb_find_buddy(e4b, 0, &max);
        593         for (i = 0; i < max; i++) {
        594                 if (!mb_test_bit(i, buddy)) {
        595                         MB_CHECK_ASSERT(i >= e4b->bd_info->bb_first_free);
        596                         if (fstart == -1) {
        597                                 fragments++;
        598                                 fstart = i;
        599                         }
        600                         continue;
        601                 }
        602                 fstart = -1;
        603                 /* check used bits only */
        604                 for (j = 0; j < e4b->bd_blkbits + 1; j++) {
        605                         buddy2 = mb_find_buddy(e4b, j, &max2);
        606                         k = i >> j;
        607                         MB_CHECK_ASSERT(k < max2);
        608                         MB_CHECK_ASSERT(mb_test_bit(k, buddy2));
        609                 }
        610         }
        611         MB_CHECK_ASSERT(!EXT4_MB_GRP_NEED_INIT(e4b->bd_info));
        612         MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
        613
        614         grp = ext4_get_group_info(sb, e4b->bd_group);
        615         buddy = mb_find_buddy(e4b, 0, &max);
      
      On line 592, buddy is fetched by mb_find_buddy() with order 0. Between
      lines 593 and 615, buddy is not changed, so there is no need to fetch
      it from mb_find_buddy() with order 0 again.
      
      We can safely remove the second mb_find_buddy() on line 615.
      Signed-off-by: Coly Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
    • ext4: code cleanup in mb_find_buddy() · 84b775a3
      Committed by Coly Li
      The current code calculates max whether or not order is zero, which is
      unnecessary. This cleanup patch sets max to "1 << (e4b->bd_blkbits
      + 3)" only when order == 0.
      Signed-off-by: Coly Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
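A simplified C sketch of the lookup after the cleanup. struct buddy_sketch and find_buddy() are hypothetical. The offsets and maxs tables play the role of the per-order offset and size tables, and they are supplied by the caller here. Only order 0 computes max from the block size (one bit per block in a block-sized bitmap). Higher orders read it from the table.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ext4_buddy. */
struct buddy_sketch {
	unsigned int bd_blkbits;	/* log2(block size in bytes) */
	unsigned char *bd_bitmap;	/* order 0: one bit per block */
	unsigned char *bd_buddy;	/* orders >= 1, packed together */
	const unsigned int *offsets;	/* byte offset of each order */
	const unsigned int *maxs;	/* number of bits at each order */
};

static void *find_buddy(const struct buddy_sketch *b, unsigned int order,
			unsigned int *max)
{
	if (order > b->bd_blkbits + 1) {
		*max = 0;
		return NULL;
	}
	if (order == 0) {
		/* only order 0 needs this calculation */
		*max = 1u << (b->bd_blkbits + 3);
		return b->bd_bitmap;
	}
	*max = b->maxs[order];
	return b->bd_buddy + b->offsets[order];
}
```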
  11. 24 February 2011 · 1 commit
  12. 12 February 2011 · 1 commit
    • ext4: make grpinfo slab cache names static · 2892c15d
      Committed by Eric Sandeen
      In 2.6.37 I was running into oopses with repeated module
      loads & unloads.  I tracked this down to:
      
      fb1813f4 ext4: use dedicated slab caches for group_info structures
      
      (this was in addition to the features advert unload problem)
      
      The kstrdup() and subsequent kfree() of the cache name were causing
      a double free.  In slub, at least, if I read it right, the allocator
      allocates and frees the name itself, while slab seems to do something
      different... so in slub I think we were leaking our own cachep->name
      and double-freeing the one allocated by slub.
      
      After getting lost in slab/slub/slob a bit, I just looked at other
      sized caches that get allocated.  jbd2, biovec, and sgpool all do it
      more or less the way jbd2 does.  This patch follows the jbd2
      method of dynamically allocating a cache at mount time from
      a list of static names.
      
      (This might also possibly fix a race creating the caches with
      parallel mounts running).
      
      [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
      the original patch]
      
      Cc: stable@kernel.org
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
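A user-space C sketch of the "cache created at mount time from a list of static names" pattern. struct cache_sketch stands in for a slab cache. The SKETCH_* constants, the name strings and get_groupinfo_cache() are illustrative assumptions, not the actual ext4 identifiers. Because the names live in static storage, nothing needs to kstrdup() or kfree() them, so the allocator is free to keep the pointer.

```c
#include <stddef.h>

#define SKETCH_MIN_BLOCK_LOG_SIZE 10	/* 1 KiB blocks */
#define SKETCH_NR_GRPINFO_CACHES 8	/* 1 KiB .. 128 KiB */

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_sketch {
	const char *name;
};

/* Static names: never duplicated, never freed. */
static const char *const groupinfo_slab_names[SKETCH_NR_GRPINFO_CACHES] = {
	"ext4_groupinfo_1k",  "ext4_groupinfo_2k",  "ext4_groupinfo_4k",
	"ext4_groupinfo_8k",  "ext4_groupinfo_16k", "ext4_groupinfo_32k",
	"ext4_groupinfo_64k", "ext4_groupinfo_128k",
};

static struct cache_sketch cache_storage[SKETCH_NR_GRPINFO_CACHES];
static struct cache_sketch *caches[SKETCH_NR_GRPINFO_CACHES];

/* Return the cache for this block size, creating it on first use (as a
 * mount would).  Returns NULL if the block size is out of range. */
static struct cache_sketch *get_groupinfo_cache(unsigned int blocksize_bits)
{
	unsigned int idx;

	if (blocksize_bits < SKETCH_MIN_BLOCK_LOG_SIZE)
		return NULL;
	idx = blocksize_bits - SKETCH_MIN_BLOCK_LOG_SIZE;
	if (idx >= SKETCH_NR_GRPINFO_CACHES)	/* stay inside the table */
		return NULL;
	if (caches[idx] == NULL) {
		cache_storage[idx].name = groupinfo_slab_names[idx];
		caches[idx] = &cache_storage[idx];
	}
	return caches[idx];
}
```

A second mount with the same block size reuses the existing cache rather than creating a new one.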
  13. 12 January 2011 · 2 commits
  14. 11 January 2011 · 5 commits
    • ext4: remove ext4_mb_return_to_preallocation() · a5196f8c
      Committed by Theodore Ts'o
      This function was never implemented, except for a BUG_ON which was
      tripping when ext4 is run without a journal.  The problem is that
      although the comment asserts that "truncate (which is the only way to
      free block) discards all preallocations", ext4_free_blocks() is also
      called in various error recovery paths when blocks have been
      allocated, but for various reasons, we were not able to use those data
      blocks (for example, because we ran out of memory while trying to
      manipulate the extent tree, or some other similar situation).
      
      In addition to the fact that this function isn't implemented except
      for the incorrect BUG_ON, the single caller of this function,
      ext4_free_blocks(), doesn't use it at all if the journal is enabled.
      
      So remove the (stub) function entirely for now.  If we decide it's
      better to add it back, it's only going to be useful with a relatively
      large number of code changes anyway.
      
      Google-Bug-Id: 3236408
      
      Cc: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix trimming of a single group · ca6e909f
      Committed by Jan Kara
      When ext4_trim_fs() is called to trim part of a single group, the
      logic wrongly sets the last block of the interval to 'len' instead
      of 'first_block + len', so a shorter interval may be trimmed.
      Fix it.
      
      CC: Lukas Czerner <lczerner@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
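A small C sketch of the arithmetic. The function names are hypothetical, and intervals are treated as half-open [first, last) for simplicity. Using 'len' as the last block shortens the interval by first_block blocks whenever the trim does not start at block 0.

```c
#include <stdint.h>

/* Buggy end computation described in the commit: 'len' is used as if it
 * were the last block, ignoring where the interval starts. */
static uint64_t last_block_buggy(uint64_t first_block, uint64_t len)
{
	(void)first_block;
	return len;
}

/* Fixed end computation: the interval ends 'len' blocks after its start. */
static uint64_t last_block_fixed(uint64_t first_block, uint64_t len)
{
	return first_block + len;
}

/* Blocks covered by the half-open interval [first, last). */
static uint64_t blocks_covered(uint64_t first, uint64_t last)
{
	return last > first ? last - first : 0;
}
```

For a request starting at block 100 with len 500, the buggy version covers only 400 blocks. The fixed version covers all 500.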
    • ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097
      Committed by Theodore Ts'o
      Remove the short element i_delalloc_reserved_flag from the
      ext4_inode_info structure and replace it with a new bit in i_state_flags.
      Since we have an ext4_inode_info for every ext4 inode cached in the
      inode cache, any savings we can make here are a very good thing from
      a memory utilization perspective.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
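A minimal C sketch of folding a per-inode flag into a shared state word. The bit number SKETCH_STATE_DELALLOC_RESERVED and the helper names are hypothetical. The helpers are plain and non-atomic for illustration, whereas kernel code would use atomic bit operations such as set_bit(), clear_bit() and test_bit().

```c
#include <stdbool.h>

/* Hypothetical bit number; the real EXT4_STATE_* values live in ext4.h. */
enum { SKETCH_STATE_DELALLOC_RESERVED = 3 };

struct inode_info_sketch {
	/* One word holds many independent state bits, instead of a
	 * separate short field per state. */
	unsigned long i_state_flags;
};

static void set_state(struct inode_info_sketch *ei, unsigned int bit)
{
	ei->i_state_flags |= 1UL << bit;
}

static void clear_state(struct inode_info_sketch *ei, unsigned int bit)
{
	ei->i_state_flags &= ~(1UL << bit);
}

static bool test_state(const struct inode_info_sketch *ei, unsigned int bit)
{
	return (ei->i_state_flags >> bit) & 1UL;
}
```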
    • ext4: remove warning message from ext4_issue_discard helper · 93259636
      Committed by Lukas Czerner
      ext4_issue_discard is supposed to be a helper for issuing discards;
      however, when the underlying device does not support discard, it prints
      a warning message and clears the DISCARD t_mount_opt flag. Since it
      can be (and is) used by other callers, it should do neither and
      should let the caller handle the error case.
      
      This commit removes the warning message and flag setting from
      ext4_issue_discard and does them only where they are really needed
      (release_blocks_on_commit). The FITRIM ioctl should neither set any
      flags nor print warning messages, so the warning is removed there as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
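A hedged C sketch of the split between the helper and its callers. struct sb_sketch, issue_discard(), release_blocks_on_commit_sketch() and fitrim_sketch() are all hypothetical simplifications. The helper only reports the result. The commit-time path decides to warn and clear the option, while the FITRIM-style path just returns the error untouched.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical superblock state. */
struct sb_sketch {
	bool discard_opt;		/* stands in for the DISCARD option */
	bool device_supports_discard;
	unsigned int warnings;		/* warnings that would be printed */
};

/* Helper: issue the discard and report the result; no policy here. */
static int issue_discard(const struct sb_sketch *sb)
{
	return sb->device_supports_discard ? 0 : -EOPNOTSUPP;
}

/* Commit-time caller: this is where warning and disabling belong. */
static void release_blocks_on_commit_sketch(struct sb_sketch *sb)
{
	if (!sb->discard_opt)
		return;
	if (issue_discard(sb) == -EOPNOTSUPP) {
		sb->warnings++;
		sb->discard_opt = false;
	}
}

/* FITRIM-style caller: pass the error back, touch no flags. */
static int fitrim_sketch(const struct sb_sketch *sb)
{
	return issue_discard(sb);
}
```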
    • ext4: fix possible overflow in ext4_trim_fs() · 4f531501
      Committed by Lukas Czerner
      When determining the last group through ext4_get_group_no_and_offset(),
      the result may be wrong when range->start and range->len are too big,
      because summing the two numbers may overflow.
      
      Fix that by checking range->len and limiting its value to
      ext4_blocks_count(). I tested this commit and got the expected
      result.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
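A C sketch of the overflow and one way to clamp it, with hypothetical function names. Unsigned 64-bit addition wraps, so a huge len makes start + len land below start. The commit limits len to ext4_blocks_count(). This sketch clamps to blocks_count - start, a slightly stricter variant chosen so the sum can never exceed the filesystem size.

```c
#include <stdint.h>

/* Unclamped end block: start + len wraps modulo 2^64 for a huge len
 * (e.g. UINT64_MAX meaning "the whole filesystem"). */
static uint64_t trim_end_unclamped(uint64_t start, uint64_t len)
{
	return start + len;
}

/* Clamped end block: limit len to what is left of the filesystem before
 * adding, so start + len cannot wrap.  Returns 0 if start is already past
 * the end (a real ioctl would reject such a range). */
static uint64_t trim_end_clamped(uint64_t start, uint64_t len,
				 uint64_t blocks_count)
{
	if (start >= blocks_count)
		return 0;
	if (len > blocks_count - start)
		len = blocks_count - start;
	return start + len;
}
```

For example, trim_end_unclamped(100, UINT64_MAX) wraps around to 99. On a 1000-block filesystem, the clamped version returns 1000.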
  15. 20 December 2010 · 1 commit
  16. 16 December 2010 · 1 commit
  17. 09 November 2010 · 1 commit
    • ext4: Don't call sb_issue_discard() in ext4_free_blocks() · b56ff9d3
      Committed by Theodore Ts'o
      Commit 5c521830 (ext4: Support discard requests when running in
      no-journal mode) attempts to add sb_issue_discard() for data blocks
      (in data=writeback mode) and in no-journal mode.  Unfortunately, this
      no longer works, because since commit dd3932ed (block: remove
      BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous
      interface, and there are times when we call ext4_free_blocks() while
      holding a spinlock or otherwise in an atomic context.
      
      For now, I've removed the call to sb_issue_discard() to prevent a
      deadlock or (if spinlock debugging is enabled) failures like this:
      
      BUG: scheduling while atomic: rc.sysinit/1376/0x00000002
      Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1
      Call Trace:
      [<ffffffff810397ce>] __schedule_bug+0x5e/0x70
      [<ffffffff81403110>] schedule+0x950/0xa70
      [<ffffffff81060bad>] ? insert_work+0x7d/0x90
      [<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30
      [<ffffffff81061127>] ? queue_work+0x37/0x60
      [<ffffffff8140377d>] schedule_timeout+0x21d/0x360
      [<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540
      [<ffffffff81402680>] wait_for_common+0xc0/0x150
      [<ffffffff81041490>] ? default_wake_function+0x0/0x10
      [<ffffffff812034bc>] ? submit_bio+0x7c/0x100
      [<ffffffff810680a0>] ? wake_bit_function+0x0/0x40
      [<ffffffff814027b8>] wait_for_completion+0x18/0x20
      [<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210
      [<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60
      [<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120
      [<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70
      [<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30
      [<ffffffff81191618>] ext4_truncate+0x178/0x5d0
      [<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280
      [<ffffffff810d8976>] vmtruncate+0x56/0x70
      [<ffffffff811925cb>] ext4_setattr+0x14b/0x460
      [<ffffffff811319e4>] notify_change+0x194/0x380
      [<ffffffff81117f80>] do_truncate+0x60/0x90
      [<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20
      [<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20
      [<ffffffff81127539>] do_last+0x5d9/0x770
      [<ffffffff811278bd>] do_filp_open+0x1ed/0x680
      [<ffffffff8140644f>] ? page_fault+0x1f/0x30
      [<ffffffff81132bfc>] ? alloc_fd+0xec/0x140
      [<ffffffff81118db1>] do_sys_open+0x61/0x120
      [<ffffffff81118e8b>] sys_open+0x1b/0x20
      [<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b
      
      https://bugzilla.kernel.org/show_bug.cgi?id=22302
      Reported-by: Mathias Burén <mathias.buren@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: jiayingz@google.com
  18. 28 October 2010 · 10 commits
  19. 26 October 2010 · 1 commit
    • fs: do not assign default i_ino in new_inode · 85fe4025
      Committed by Christoph Hellwig
      Instead of always assigning an increasing inode number in new_inode,
      move the call to assign it into those callers that actually need it.
      For now, the set of callers that need it is estimated conservatively:
      the call is added to all filesystems that do not assign an i_ino
      by themselves.  For a few more filesystems we can avoid assigning
      any inode number given that they aren't user visible, and for others
      it could be done lazily when an inode number is actually needed,
      but that's left for later patches.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>