1. 09 5月, 2011 1 次提交
  2. 01 5月, 2011 1 次提交
    • T
      ext4: ignore errors when issuing discards · d9f34504
      Theodore Ts'o 提交于
      This is an effective revert of commit a30eec2a: "ext4: stop issuing
      discards if not supported by device".  The problem is that there are
      some devices that may return errors in response to a discard request
      some times but not others.  (One example would be a hybrid dm device
      which concatenates an SSD and an HDD device).
      
      By this logic, I also removed the error checking from ext4's FITRIM
      code; so that an error from a discard will not stop the FITRIM from
      trying to trim the rest of the file system.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d9f34504
  3. 17 4月, 2011 1 次提交
  4. 31 3月, 2011 1 次提交
  5. 24 3月, 2011 1 次提交
    • T
      ext4: fix a BUG in mb_mark_used during trim. · 0ba08517
      Tao Ma 提交于
      In a bs=4096 volume, if we call FITRIM with the following parameter as
      fstrim_range(start = 102400, len = 134144000, minlen = 10240),
      we will trigger this BUG_ON:
      
      	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
      
      Mar  4 00:55:52 boyu-tm kernel: ------------[ cut here ]------------
      Mar  4 00:55:52 boyu-tm kernel: kernel BUG at fs/ext4/mballoc.c:1506!
      Mar  4 01:21:09 boyu-tm kernel: Code: d4 00 00 00 00 49 89 fe 8b 56 0c 44 8b 7e 04 89 55 c4 48 8b 4f 28 89 d6 44 01 fe 48 63 d6 48 8b 41 18 48 c1 e0 03 48 39 c2 76 04 <0f> 0b eb fe 48 8b 55 b0 8b 47 34 3b 42 08 74 04 0f 0b eb fe 48
      Mar  4 01:21:09 boyu-tm kernel: RIP  [<ffffffffa053eb42>] mb_mark_used+0x47/0x26c [ext4]
      Mar  4 01:21:09 boyu-tm kernel:  RSP <ffff880121e45c38>
      Mar  4 01:21:09 boyu-tm kernel: ---[ end trace 9f461696f6a9dcf2 ]---
      
      Fix this bug by doing the accounting correctly.
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0ba08517
  6. 22 3月, 2011 1 次提交
  7. 28 2月, 2011 1 次提交
  8. 25 2月, 2011 4 次提交
    • C
      ext4: mballoc: don't replace the current preallocation group unnecessarily · 5a54b2f1
      Coly Li 提交于
      In ext4_mb_check_group_pa(), the current preallocation space is
      replaced with a new preallocation space when the two have the same
      distance from the goal block.
      
      This doesn't actually gain us anything, so change things so that the
      function only switches to the new preallocation group if its distance
      from the goal block is strictly smaller than the current preallocaiton
      group's distance from the goal block.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5a54b2f1
    • C
      mballoc: add comments to ext4_mb_mark_free_simple() · 7c786059
      Coly Li 提交于
      This patch adds comments to ext4_mb_mark_free_simple to make it more
      understandable.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      7c786059
    • C
      ext4: remove unncessary call mb_find_buddy() in debugging code · 235772da
      Coly Li 提交于
      In __mb_check_buddy(), look at the code below:
        591         fstart = -1;
        592         buddy = mb_find_buddy(e4b, 0, &max);
        593         for (i = 0; i < max; i++) {
        594                 if (!mb_test_bit(i, buddy)) {
        595                         MB_CHECK_ASSERT(i >= e4b->bd_info->bb_first_free);
        596                         if (fstart == -1) {
        597                                 fragments++;
        598                                 fstart = i;
        599                         }
        600                         continue;
        601                 }
        602                 fstart = -1;
        603                 /* check used bits only */
        604                 for (j = 0; j < e4b->bd_blkbits + 1; j++) {
        605                         buddy2 = mb_find_buddy(e4b, j, &max2);
        606                         k = i >> j;
        607                         MB_CHECK_ASSERT(k < max2);
        608                         MB_CHECK_ASSERT(mb_test_bit(k, buddy2));
        609                 }
        610         }
        611         MB_CHECK_ASSERT(!EXT4_MB_GRP_NEED_INIT(e4b->bd_info));
        612         MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
        613
        614         grp = ext4_get_group_info(sb, e4b->bd_group);
        615         buddy = mb_find_buddy(e4b, 0, &max);
      
      On line 592, buddy is fetched by mb_find_buddy() with order 0, between
      line 593 to line 615, buddy is not changed, therefore there is
      no need to fetch buddy again from mb_find_buddy() with order 0 again.
      
      We can safely remove the second mb_find_buddy() on line 615.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      235772da
    • C
      ext4: code cleanup in mb_find_buddy() · 84b775a3
      Coly Li 提交于
      Current code calculate max no matter whether order is zero, it's
      unnecessary. This cleanup patch sets max to "1 << (e4b->bd_blkbits
      + 3)" only when order == 0.
      Signed-off-by: NColy Li <bosong.ly@taobao.com>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: Theodore Tso <tytso@google.com>
      84b775a3
  9. 24 2月, 2011 1 次提交
  10. 12 2月, 2011 1 次提交
    • E
      ext4: make grpinfo slab cache names static · 2892c15d
      Eric Sandeen 提交于
      In 2.6.37 I was running into oopses with repeated module
      loads & unloads.  I tracked this down to:
      
      fb1813f4 ext4: use dedicated slab caches for group_info structures
      
      (this was in addition to the features advert unload problem)
      
      The kstrdup & subsequent kfree of the cache name was causing
      a double free.  In slub, at least, if I read it right it allocates
      & frees the name itself, slab seems to do something different...
      so in slub I think we were leaking -our- cachep->name, and double
      freeing the one allocated by slub.
      
      After getting lost in slab/slub/slob a bit, I just looked at other
      sized-caches that get allocated.  jbd2, biovec, sgpool all do it
      more or less the way jbd2 does.  Below patch follows the jbd2
      method of dynamically allocating a cache at mount time from
      a list of static names.
      
      (This might also possibly fix a race creating the caches with
      parallel mounts running).
      
      [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
      the original patch]
      
      Cc: stable@kernel.org
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2892c15d
  11. 12 1月, 2011 2 次提交
  12. 11 1月, 2011 5 次提交
    • T
      ext4: remove ext4_mb_return_to_preallocation() · a5196f8c
      Theodore Ts'o 提交于
      This function was never implemented, except for a BUG_ON which was
      tripping when ext4 is run without a journal.  The problem is that
      although the comment asserts that "truncate (which is the only way to
      free block) discards all preallocations", ext4_free_blocks() is also
      called in various error recovery paths when blocks have been
      allocated, but for various reasons, we were not able to use those data
      blocks (for example, because we ran out of memory while trying to
      manipulate the extent tree, or some other similar situation).
      
      In addition to the fact that this function isn't implemented except
      for the incorrect BUG_ON, the single caller of this function,
      ext4_free_blocks(), doesn't use it all if the journal is enabled.
      
      So remove the (stub) function entirely for now.  If we decide it's
      better to add it back, it's only going to be useful with a relatively
      large number of code changes anyway.
      
      Google-Bug-Id: 3236408
      
      Cc: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a5196f8c
    • J
      ext4: fix trimming of a single group · ca6e909f
      Jan Kara 提交于
      When ext4_trim_fs() is called to trim a part of a single group, the
      logic will wrongly set last block of the interval to 'len' instead
      of 'first_block + len'. Thus a shorter interval is possibly trimmed.
      Fix it.
      
      CC: Lukas Czerner <lczerner@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ca6e909f
    • T
      ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097
      Theodore Ts'o 提交于
      Remove the short element i_delalloc_reserved_flag from the
      ext4_inode_info structure and replace it a new bit in i_state_flags.
      Since we have an ext4_inode_info for every ext4 inode cached in the
      inode cache, any savings we can produce here is a very good thing from
      a memory utilization perspective.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f2321097
    • L
      ext4: remove warning message from ext4_issue_discard helper · 93259636
      Lukas Czerner 提交于
      ext4_issue_discard is supposed to be helper for calling discard, however
      in case that underlying device does not support discard it prints out
      the warning message and clears the DISCARD t_mount_opt flag. Since it
      can be (and is) used by others, it should not do anything and let the
      caller to handle the error case.
      
      This commit removes warning message and flag setting from
      ext4_issue_discard and use it just in place where it is really needed
      (release_blocks_on_commit). FITRIM ioctl should not set any flags nor it
      should print out warning messages, so get rid of the warning as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      93259636
    • L
      ext4: fix possible overflow in ext4_trim_fs() · 4f531501
      Lukas Czerner 提交于
      When determining last group through ext4_get_group_no_and_offset() the
      result may be wrong in cases when range->start and range-len are too
      big, because it may overflow when summing up those two numbers.
      
      Fix that by checking range->len and limit its value to
      ext4_blocks_count(). This commit was tested by myself with expected
      result.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      4f531501
  13. 20 12月, 2010 1 次提交
  14. 16 12月, 2010 1 次提交
  15. 09 11月, 2010 1 次提交
    • T
      ext4: Don't call sb_issue_discard() in ext4_free_blocks() · b56ff9d3
      Theodore Ts'o 提交于
      Commit 5c521830 (ext4: Support discard requests when running in
      no-journal mode) attempts to add sb_issue_discard() for data blocks
      (in data=writeback mode) and in no-journal mode.  Unfortunately, this
      no longer works, because in commit dd3932ed (block: remove
      BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous
      interface, and there are times when we call ext4_free_blocks() when we
      are are holding a spinlock, or are otherwise in an atomic context.
      
      For now, I've removed the call to sb_issue_discard() to prevent a
      deadlock or (if spinlock debugging is enabled) failures like this:
      
      BUG: scheduling while atomic: rc.sysinit/1376/0x00000002
      Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1
      Call Trace:
      [<ffffffff810397ce>] __schedule_bug+0x5e/0x70
      [<ffffffff81403110>] schedule+0x950/0xa70
      [<ffffffff81060bad>] ? insert_work+0x7d/0x90
      [<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30
      [<ffffffff81061127>] ? queue_work+0x37/0x60
      [<ffffffff8140377d>] schedule_timeout+0x21d/0x360
      [<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540
      [<ffffffff81402680>] wait_for_common+0xc0/0x150
      [<ffffffff81041490>] ? default_wake_function+0x0/0x10
      [<ffffffff812034bc>] ? submit_bio+0x7c/0x100
      [<ffffffff810680a0>] ? wake_bit_function+0x0/0x40
      [<ffffffff814027b8>] wait_for_completion+0x18/0x20
      [<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210
      [<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60
      [<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120
      [<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70
      [<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30
      [<ffffffff81191618>] ext4_truncate+0x178/0x5d0
      [<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280
      [<ffffffff810d8976>] vmtruncate+0x56/0x70
      [<ffffffff811925cb>] ext4_setattr+0x14b/0x460
      [<ffffffff811319e4>] notify_change+0x194/0x380
      [<ffffffff81117f80>] do_truncate+0x60/0x90
      [<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20
      [<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20
      [<ffffffff81127539>] do_last+0x5d9/0x770
      [<ffffffff811278bd>] do_filp_open+0x1ed/0x680
      [<ffffffff8140644f>] ? page_fault+0x1f/0x30
      [<ffffffff81132bfc>] ? alloc_fd+0xec/0x140
      [<ffffffff81118db1>] do_sys_open+0x61/0x120
      [<ffffffff81118e8b>] sys_open+0x1b/0x20
      [<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b
      
      https://bugzilla.kernel.org/show_bug.cgi?id=22302Reported-by: NMathias Burén <mathias.buren@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: jiayingz@google.com
      b56ff9d3
  16. 28 10月, 2010 10 次提交
  17. 26 10月, 2010 1 次提交
    • C
      fs: do not assign default i_ino in new_inode · 85fe4025
      Christoph Hellwig 提交于
      Instead of always assigning an increasing inode number in new_inode
      move the call to assign it into those callers that actually need it.
      For now callers that need it is estimated conservatively, that is
      the call is added to all filesystems that do not assign an i_ino
      by themselves.  For a few more filesystems we can avoid assigning
      any inode number given that they aren't user visible, and for others
      it could be done lazily when an inode number is actually needed,
      but that's left for later patches.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      85fe4025
  18. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  19. 10 9月, 2010 2 次提交
  20. 06 8月, 2010 1 次提交
    • A
      ext4: Adding error check after calling ext4_mb_regular_allocator() · 6c7a120a
      Aditya Kali 提交于
      If the bitmap block on disk is bad, ext4_mb_load_buddy() returns an
      error. This error is returned to the caller,
      ext4_mb_regular_allocator() and then to ext4_mb_new_blocks().  But
      ext4_mb_new_blocks() did not check for the return value of
      ext4_mb_regular_allocator() and would repeatedly try to load the
      bitmap block. The fix simply catches the return value and exits out of
      the 'repeat' loop after cleanup.
      
      We also take the opportunity to clean up the error handling in
      ext4_mb_new_blocks().
      
      Google-Bug-Id: 2853530
      Signed-off-by: NAditya Kali <adityakali@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6c7a120a
  21. 04 8月, 2010 1 次提交
  22. 27 7月, 2010 1 次提交
    • E
      ext4: don't print scary messages for allocation failures post-abort · e3570639
      Eric Sandeen 提交于
      I often get emails containing the "This should not happen!!" message,
      conveniently trimmed to remove things like:
      
      sd 0:0:0:0: [sda] Unhandled error code
      sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
      sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00
      end_request: I/O error, dev sda, sector 51628400
      Aborting journal on device dm-0-8.
      EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
      EXT4-fs (dm-0): Remounting filesystem read-only
      
      I don't think there is any value to the verbosity if the reason is
      due to a filesystem abort; it just obfuscates the root cause.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e3570639