1. 26 Jul, 2011 1 commit
    • J
      ext4: fix data corruption in inodes with journalled data · 2d859db3
      Jan Kara authored
      When journalling data for an inode (either because it is a symlink or
      because the filesystem is mounted in data=journal mode), ext4_evict_inode()
      can discard unwritten data by calling truncate_inode_pages(). This is
      because we don't mark the buffer / page dirty when journalling data but only
      add the buffer to the running transaction, and thus mm does not know there
      is still unwritten data.

      Fix the problem by carefully tracking the transaction containing the inode's
      data, committing this transaction, and writing uncheckpointed buffers when
      the inode should be reaped.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 24 Jul, 2011 6 commits
  3. 18 Jul, 2011 6 commits
  4. 17 Jul, 2011 1 commit
  5. 12 Jul, 2011 4 commits
  6. 11 Jul, 2011 10 commits
    • R
      ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff
      Robin Dong authored
      If eh->eh_entries is smaller than eh->eh_max, the routine will
      go to the "repeat" label and then go to "has_space" directly,
      since the arguments "depth" and "eh" have not changed.

      Therefore, go to "has_space" directly and remove the redundant "repeat" label.
      Signed-off-by: Robin Dong <sanbai@taobao.com>
    • T
      ext4: Change the wrong param comment for ext4_trim_all_free · 22612283
      Tao Ma authored
      In the ext4_trim_all_free() comment, there is no longer an @e4b parameter;
      it is now @group.
      Reported-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2
      Tao Ma authored
      In ext4, every time FITRIM is called, we iterate over all the
      groups and trim them one by one. This wastes time if a group
      has already been trimmed and has not changed since the last
      trim.

      So this patch adds a new flag in ext4_group_info->bb_state to
      indicate that the group has been trimmed; it is cleared when
      some blocks are freed (in release_blocks_on_commit). A new field,
      trim_minlen, is added to ext4_sb_info to record the last minlen
      used to trim the volume, so that if the caller provides a smaller
      one, we go ahead with the trim regardless of bb_state.
      
      A simple test with my Intel X25-M SSD:
      df -h shows:
      /dev/sdb1              40G   21G   17G  56% /mnt/ext4
      Block size:               4096

      Run FITRIM with the following parameters:
      range.start = 0;
      range.len = UINT64_MAX;
      range.minlen = 1048576;
      
      without the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.505s
      user	0m0.000s
      sys	0m1.224s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.359s
      user	0m0.000s
      sys	0m1.178s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.228s
      user	0m0.000s
      sys	0m1.151s
      
      with the patch:
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m5.625s
      user	0m0.000s
      sys	0m1.269s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
      real	0m0.002s
      user	0m0.000s
      sys	0m0.001s
      
      A big improvement for the 2nd and 3rd run.
      
      Even after I delete some big image files, it is still much
      faster than iterating the whole disk.
      
      [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
      real	0m1.217s
      user	0m0.000s
      sys	0m0.196s
      
      Cc: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
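The skip logic described in the FITRIM speed-up commit above can be modelled in a few lines of userspace C. This is only a simplified sketch, not the kernel code: `struct group_info`, `struct sb_info`, `GROUP_WAS_TRIMMED`, `last_trim_minlen`, `trim_fs()` and `on_blocks_freed()` are hypothetical names standing in for ext4_group_info->bb_state, the trim_minlen field in ext4_sb_info, and the block-freeing path in release_blocks_on_commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit; the real value in the kernel may differ. */
#define GROUP_WAS_TRIMMED 0x1u

/* Hypothetical stand-in for the per-group state (bb_state). */
struct group_info {
    unsigned bb_state;
};

/* Hypothetical stand-in for the per-filesystem record of the last minlen. */
struct sb_info {
    uint64_t last_trim_minlen;
};

/* A group can be skipped only if it is still marked as trimmed and the
 * caller is not asking for a smaller minimum extent length than last time. */
static bool can_skip_group(const struct group_info *g,
                           const struct sb_info *sb, uint64_t minlen)
{
    return (g->bb_state & GROUP_WAS_TRIMMED) && minlen >= sb->last_trim_minlen;
}

/* Freeing blocks invalidates the "already trimmed" state of the group. */
static void on_blocks_freed(struct group_info *g)
{
    g->bb_state &= ~GROUP_WAS_TRIMMED;
}

/* Walk all groups and return how many actually had to be scanned. */
static unsigned trim_fs(struct group_info *groups, size_t ngroups,
                        struct sb_info *sb, uint64_t minlen)
{
    unsigned scanned = 0;

    assert(groups != NULL || ngroups == 0);
    for (size_t i = 0; i < ngroups; i++) {
        if (can_skip_group(&groups[i], sb, minlen))
            continue;
        /* The real code would discard free extents >= minlen here. */
        groups[i].bb_state |= GROUP_WAS_TRIMMED;
        scanned++;
    }
    sb->last_trim_minlen = minlen;   /* remembered for the next call */
    return scanned;
}
```

In this model, a repeated call with the same minlen scans no groups, a call after `on_blocks_freed()` rescans only the affected group, and a call with a smaller minlen rescans everything, which matches the behaviour the commit message describes.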
    • T
      ext4: Add new ext4 trim tracepoints · b3d4c2b1
      Tao Ma authored
      Add the ext4_trim_extent and ext4_trim_all_free tracepoints.
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: speed up group trim with the right free block count · 169ddc3e
      Tao Ma authored
      When we trim free blocks in an ext4 group, we need to count
      the free blocks properly and check whether enough free blocks
      remain for us to trim. The current solution only subtracts a
      free extent from the count if it is large enough to trim, which
      isn't appropriate.

      Let us see a small example:
      a group has 1.5M free, made up of chunks of 300K, 300K, 300K, 300K, 300K.
      And minblocks is 1M. With the current solution, we have to iterate
      over the whole group, since these 300K chunks are never subtracted
      from 1.5M. But actually we should exit after we find the first 2
      free chunks, since the remaining 3 chunks only sum to 900K once we
      subtract the first 600K, even though those chunks can't be trimmed.
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
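The 300K example from the group-trim commit above can be worked through with a small model. This is a sketch under simplifying assumptions, not the kernel loop: `extents_examined()` and its `count_all` switch are hypothetical, sizes are plain numbers in units of K, and 1M is taken as 1000K to match the commit message's arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how many free extents the loop looks at before it can stop.
 * count_all == false models the old behaviour: only extents that are
 * large enough to trim are subtracted from the remaining free count.
 * count_all == true models the fixed behaviour: every extent seen is
 * subtracted, whether or not it could be trimmed. */
static size_t extents_examined(const unsigned *ext, size_t n,
                               unsigned minblocks, bool count_all)
{
    unsigned free_left = 0;
    size_t examined = 0;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        free_left += ext[i];

    for (size_t i = 0; i < n; i++) {
        /* Nothing trimmable can remain once less than minblocks is left. */
        if (free_left < minblocks)
            break;
        examined++;
        bool trimmed = ext[i] >= minblocks;
        if (trimmed || count_all)
            free_left -= ext[i];
    }
    return examined;
}
```

With five 300K extents and minblocks of 1000K, the old accounting examines all five extents, while the fixed accounting stops after two, because only 900K is left at that point.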
    • T
      ext4: fix trim length underflow with small trim length · 22f10457
      Tao Ma authored
      In 0f0a25bf, we adjust 'len' with s_first_data_block - start, but
      it could underflow in case blocksize=1K, fstrim_range.len=512 and
      fstrim_range.start = 0. In this case, when we run the code:
      len -= first_data_blk - start; len will underflow to -1ULL.
      In the end we are safe, because the later last_group check limits
      the trim to the whole volume, but that isn't what the user really wants.

      So this patch fixes it. It also adds the check for 'start', like ext3, so that
      we can break immediately if the start is invalid.

      Cc: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
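The underflow described in the trim-length commit above is ordinary unsigned wraparound and is easy to reproduce outside the kernel. The sketch below is illustrative only: `adjust_len_unchecked()` mirrors the single statement quoted in the commit message, while `adjust_len_checked()` shows one possible guard (clamping to zero), which is an assumption and not necessarily what the patch itself does.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the quoted statement: len -= first_data_blk - start.
 * With unsigned 64-bit arithmetic, subtracting more than len wraps around. */
static uint64_t adjust_len_unchecked(uint64_t start, uint64_t len,
                                     uint64_t first_data_blk)
{
    if (start < first_data_blk)
        len -= first_data_blk - start;
    return len;
}

/* One possible guard (hypothetical): never subtract more than len. */
static uint64_t adjust_len_checked(uint64_t start, uint64_t len,
                                   uint64_t first_data_blk)
{
    uint64_t orig_len = len;

    if (start < first_data_blk) {
        uint64_t skip = first_data_blk - start;
        len = (len > skip) ? len - skip : 0;
    }
    assert(len <= orig_len);   /* the adjustment may only shrink the range */
    return len;
}
```

For the case in the commit message, a 512-byte length on a 1K-block filesystem is `512 >> 10`, that is 0 blocks, and the first data block is 1. The unchecked version then returns `UINT64_MAX` (the -1ULL mentioned above), whereas the guarded version returns 0.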
    • T
      ext4: add tracepoint for ext4_journal_start · 12706394
      Theodore Ts'o authored
      This will help debug who is responsible for starting a jbd2 transaction.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints · 4862fd60
      Theodore Ts'o authored
      Using function calls in TP_printk causes perf heartburn, so print the
      MAJOR/MINOR device numbers instead.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
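To illustrate what printing MAJOR/MINOR numbers involves, as in the jbd2 tracepoint commit above, here is a userspace sketch that mirrors the kernel's device-number split (20 minor bits, as in include/linux/kdev_t.h). The `K_`-prefixed macro names, the `format_dev()` helper and the "major,minor" output format are assumptions made for this example, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace copies of the kernel's dev_t helpers, renamed to avoid
 * clashing with any system macros. */
#define K_MINORBITS 20
#define K_MINORMASK ((1U << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((unsigned int)((dev) >> K_MINORBITS))
#define K_MINOR(dev) ((unsigned int)((dev) & K_MINORMASK))
#define K_MKDEV(ma, mi) \
    (((unsigned int)(ma) << K_MINORBITS) | (unsigned int)(mi))

/* Format a device number as "major,minor" using only integer arithmetic,
 * so no name lookup is needed at print time. Returns snprintf's result. */
static int format_dev(char *buf, size_t len, unsigned int dev)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%u,%u", K_MAJOR(dev), K_MINOR(dev));
}
```

For a device built with `K_MKDEV(8, 17)`, `format_dev()` writes the string "8,17" into the buffer.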
    • J
      ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b
      Jiaying Zhang authored
      On a corrupted inode or a disk failure, we may fail after we have
      already allocated some blocks from the inode or taken some blocks from
      the inode's preallocation list, but before we have successfully inserted
      the corresponding extent into the extent tree. In this case, we should
      free any allocated blocks and discard the inode's preallocated blocks
      because the entries in the inode's preallocation list may be in an
      inconsistent state.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • M
      ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov authored
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block(). This looks quite sensible in most cases: blocks
      to be freed are associated with the inode and were accounted in quota
      and i_blocks some time ago.

      However, there is a case where the blocks to free have not yet been
      accounted by the time ext4_free_blocks() is called:

      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().

      In this scenario, ext4_free_blocks() calls dquot_free_block() which, in
      turn, decrements i_blocks for blocks which were not accounted yet (due
      to delalloc). After a clean umount, e2fsck reports something like:

      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.

      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
      Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  7. 30 Jun, 2011 1 commit
  8. 28 Jun, 2011 9 commits
    • Y
      ext4: quiet 'unused variables' compile warnings · 9331b626
      Yongqiang Yang authored
      Unused variables were deleted.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • E
      ext4: refactor duplicated block placement code · f86186b4
      Eric Sandeen authored
      I found that ext4_ext_find_goal() and ext4_find_near()
      share the same code for returning a coloured start block
      based on i_block_group.

      We can refactor this into a common function so that they
      don't diverge in the future.

      Thanks to adilger for suggesting the new function name.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: move ext4_ind_* functions from inode.c to indirect.c · dae1e52c
      Amir Goldstein authored
      This patch moves functions from inode.c to indirect.c.
      The moved functions are ext4_ind_* functions and their helpers.
      Functions called from inode.c are declared extern.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: move common truncate functions to header file · 9f125d64
      Theodore Ts'o authored
      Move two functions that will be needed by the indirect functions to be
      moved to indirect.c as well as inode.c to truncate.h as inline
      functions, so that we can avoid having duplicate copies of the
      functions (which can be a maintenance problem) without having to expose
      them as global functions.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: move __ext4_check_blockref to block_validity.c · 1f7d1e77
      Theodore Ts'o authored
      In preparation for moving the indirect functions to a separate file,
      move __ext4_check_blockref() to block_validity.c and rename it to
      ext4_check_blockref(), which is exported as a globally visible function.

      Also, rename the cpp macro ext4_check_inode_blockref() to
      ext4_ind_check_inode(), to make it clear that it is only valid for use
      with non-extent mapped inodes.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: rename ext4_indirect_* funcs to ext4_ind_* · 8bb2b247
      Amir Goldstein authored
      We are going to move all ext4_ind_* functions to indirect.c.
      Before we do that, let's rename 2 functions called ext4_indirect_*
      to ext4_ind_*, to keep to the naming convention.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: split ext4_ind_truncate from ext4_truncate · ff9893dc
      Amir Goldstein authored
      We are about to move all indirect inode functions to a new file.
      Before we do that, let's split ext4_ind_truncate() out of ext4_truncate()
      leaving only generic code in the latter, so we will be able to move
      ext4_ind_truncate() to the new file.
      Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • R
      ext4: fix incorrect error msg in ext4_ext_insert_index · ed7a7e16
      Robin Dong authored
      In ext4_ext_insert_index(), when eh_entries of curp is bigger
      than eh_max, an error message is printed, but its content is
      about logical and ei_block, which is incorrect.
      Signed-off-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Tao Ma authored
      In journal checkpoint, we write the buffer and wait for it to finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the write request is delayed for quite a long time, and
      all the tasks waiting for journal space end up with errors like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch uses WRITE_SYNC in __flush_batch so that the request is
      moved into the sync queue and handled by cfq in a timely manner. We also use
      the new plug, so that all the WRITE_SYNC requests can be submitted as a whole
      when we unplug it.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: Robin Dong <sanbai@taobao.com>
  9. 22 Jun, 2011 1 commit
  10. 21 Jun, 2011 1 commit