Commits · 59be8e7280c10fd8f078ba6dc2bcdc2b1453b6ab · bug2833 / cloud-kernel

31 Jul, 2011 2 commits

ext4: change umode_t in tracepoint headers to be an explicit __u16 · 59be8e72

Committed by Theodore Ts'o on Jul 30, 2011

As requested by Al Viro, since umode_t may be changing to a u32 for
some architectures.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@ZenIV.linux.org.uk>

59be8e72

ext4: fix races in ext4_sync_parent() · d59729f4

Committed by Theodore Ts'o on Jul 30, 2011

Fix problems if fsync() races against a rename of a parent directory
as pointed out by Al Viro in his own inimitable way:

>While we are at it, could somebody please explain what the hell is ext4
>doing in
>static int ext4_sync_parent(struct inode *inode)
>{
>        struct writeback_control wbc;
>        struct dentry *dentry = NULL;
>        int ret = 0;
>
>        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                dentry = list_entry(inode->i_dentry.next,
>                                    struct dentry, d_alias);
>                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                        break;
>                inode = dentry->d_parent->d_inode;
>                ret = sync_mapping_buffers(inode->i_mapping);
>                ...
>Note that dentry obviously can't be NULL there.  dentry->d_parent is never
>NULL.  And dentry->d_parent would better not be negative, for crying out
>loud!  What's worse, there's no guarantees that dentry->d_parent will
>remain our parent over that sync_mapping_buffers() *and* that inode won't
>just be freed under us (after rename() and memory pressure leading to
>eviction of what used to be our dentry->d_parent)......
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d59729f4

28 Jul, 2011 5 commits

ext4: Fix overflow caused by missing cast in ext4_fallocate() · 29ae07b7

Committed by Utako Kusaka on Jul 27, 2011

The logical block number in map.l_blk is a __u32, and so before we
shift it left by the block size, we need to cast it to a 64-bit size.

Otherwise i_size can be corrupted on an ENOSPC.

# df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda6     ext4     9843276    153056   9190200   2% /mnt/mp1
# fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
# stat /mnt/mp1/testfile
  File: `/mnt/mp1/testfile'
  Size: 4293656576	Blocks: 19380440   IO Block: 4096   regular file
Device: 806h/2054d	Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-07-25 13:01:31.414490496 +0900
Modify: 2011-07-25 13:01:31.414490496 +0900
Change: 2011-07-25 13:01:31.454490495 +0900
Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
 fs/ext4/extents.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
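
The overflow described in this commit can be reproduced in user space. The following is a minimal sketch, not the kernel code: the helper names are hypothetical, and it assumes a 32-bit logical block number and 4 KiB blocks (a shift of 12).

```c
#include <stdint.h>

/* Hypothetical helper: the shift is evaluated in 32 bits, so any bits
 * shifted past bit 31 are lost before the result is widened. */
static uint64_t offset_without_cast(uint32_t lblk, unsigned int blkbits)
{
    return lblk << blkbits;
}

/* Hypothetical helper: widening to 64 bits first keeps the high bits. */
static uint64_t offset_with_cast(uint32_t lblk, unsigned int blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```

With lblk = 1 << 20 and a shift of 12, the first helper wraps to 0 while the second returns 4294967296 (4 GiB), which is why a size derived from the uncast value can end up far from the requested one.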

29ae07b7

ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole · 0e1147b0

Committed by Robin Dong on Jul 27, 2011

The old ext4_ext_rm_idx() was used only for the truncate case, because
it just removes the last index in the extent index block. When punching
a hole, we usually need to remove a "middle" index, so we must move the
indexes that follow it forward.

(I created a file with a depth-1 extent tree and punched a hole in the
middle of it; the last index in the index block strangely disappeared,
which is how I found this bug.)
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0e1147b0

ext4: simplify parameters of reserve_backup_gdb() · 668f4dc5

Committed by Yongqiang Yang on Jul 27, 2011

The reserve_backup_gdb() function only needs the block group number;
there's no need to pass a pointer to struct ext4_new_group_data to it.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

668f4dc5

ext4: simplify parameters of add_new_gdb() · 2f919710

Committed by Yongqiang Yang on Jul 27, 2011

add_new_gdb() only needs the block group number; there is no need to
pass a pointer to struct ext4_new_group_data to add_new_gdb().
Instead of filling in a pointer the struct buffer_head in
add_new_gdb(), it's simpler to have the caller fetch it from the
s_group_desc[] array.

[Fixed error path to handle the case where struct buffer_head *primary
 hasn't been set yet. -- Ted]
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2f919710

ext4: remove lock_buffer in bclean() and setup_new_group_blocks() · e6075e98

Committed by Yongqiang Yang on Jul 27, 2011

There is no need to lock the buffers since no one else should be
touching these buffers besides the file system.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e6075e98

27 Jul, 2011 8 commits

ext4: simplify journal handling in setup_new_group_blocks() · 6d40bc5a

Committed by Yongqiang Yang on Jul 26, 2011

This patch simplifies journal handling in setup_new_group_blocks().

In the previous code, the block bitmap was modified throughout
setup_new_group_blocks(), and ext4_get_write_access() in
extend_or_restart_transaction() was used to guarantee that the block
bitmap stayed in the new handle, which made things complicated.

The previous commit changed things so that the modifications on the
block bitmap are batched and done by ext4_set_bits() at the end of the
for loop.  This allows us to simplify things.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d40bc5a

ext4: let setup_new_group_blocks() set multiple bits at a time · c3e94d1d

Committed by Yongqiang Yang on Jul 26, 2011

Rename mb_set_bits() to ext4_set_bits() and make it a global function
so that setup_new_group_blocks() can use it.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c3e94d1d

ext4: fix a typo in ext4_group_extend() · 2b79b09d

Committed by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2b79b09d

ext4: let ext4_group_add_blocks() handle 0 blocks quickly · 4740b830

Committed by Yongqiang Yang on Jul 26, 2011

If ext4_group_add_blocks() is called with 0 blocks, make it return 0
without doing any extra work.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4740b830

ext4: let ext4_group_add_blocks() return an error code · cc7365df

Committed by Yongqiang Yang on Jul 26, 2011

This patch lets ext4_group_add_blocks() return an error code if it
fails, so that callers can handle the error correctly.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cc7365df

ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() · 0529155e

Committed by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0529155e

ext4: prevent a fs with errors from being resized · ce723c31

Committed by Yongqiang Yang on Jul 26, 2011

A filesystem with errors must not be resized; otherwise it is easy to
destroy the filesystem.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ce723c31

ext4: prevent parallel resizers by atomic bit ops · 8f82f840

Committed by Yongqiang Yang on Jul 26, 2011

Before this patch, parallel resizers were allowed and protected by a
mutex lock. There is actually no need to support parallel resizers, so
this patch prevents them with atomic bit ops, as lock_page() and
unlock_page() do.

To do this, the patch removes the mutex lock s_resize_lock from struct
ext4_sb_info and adds an unsigned long field named s_resize_flags,
which indicates whether a resizer is running.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
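
The "one resizer at a time" idea can be sketched in user space with C11 atomics standing in for the kernel's atomic bit operations. Everything here is an assumption made for illustration: the struct, the bit number, the function names, and the choice of -EBUSY as the return value are not taken from the patch.

```c
#include <errno.h>
#include <stdatomic.h>

#define RESIZING_BIT 0UL  /* hypothetical bit number inside the flags word */

/* Hypothetical stand-in for the s_resize_flags field. */
struct sb_info_sketch {
    atomic_ulong resize_flags;
};

/* Atomically set the bit; whoever finds it already set is turned away. */
static int resize_begin(struct sb_info_sketch *sbi)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    unsigned long old = atomic_fetch_or(&sbi->resize_flags, mask);

    return (old & mask) ? -EBUSY : 0;
}

/* Clear the bit so that a later resizer can start. */
static void resize_end(struct sb_info_sketch *sbi)
{
    atomic_fetch_and(&sbi->resize_flags, ~(1UL << RESIZING_BIT));
}
```

Unlike a mutex, a second caller does not sleep waiting for the first one; it fails immediately and can report the condition to the user.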

8f82f840

26 Jul, 2011 1 commit

ext4: fix data corruption in inodes with journalled data · 2d859db3

Committed by Jan Kara on Jul 26, 2011

When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext4_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction and thus mm does not know there
is still unwritten data.

Fix the problem by carefully tracking transaction containing inode's data,
committing this transaction, and writing uncheckpointed buffers when inode
should be reaped.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2d859db3

24 Jul, 2011 6 commits

ext4: correct comment for ext4_ext_check_cache · b7ca1e8e

Committed by Robin Dong on Jul 23, 2011

The comment for ext4_ext_check_cache has a little mistake.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b7ca1e8e

ext4: correct the debug message in ext4_ext_insert_extent · 0737964b

Committed by Robin Dong on Jul 23, 2011

The debug message in ext4_ext_insert_extent before moving extent
is incorrect (the "from xx to xx").
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0737964b

ext4: remove unused argument in ext4_ext_next_leaf_block · 5718789d

Committed by Robin Dong on Jul 23, 2011

The argument "inode" in function ext4_ext_next_allocated_block is unused,
so remove it.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5718789d

ext4: remove ac_repeats from ext4_allocation_context · 6a0fe493

Committed by Tao Ma on Jul 23, 2011

ac_repeats isn't referenced in the mballoc code. So remove it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6a0fe493

ext4: don't increment s_mb_buddies_generated in ext4_mb_release · ced156e4

Committed by Tao Ma on Jul 23, 2011

In ext4_mb_release, we use s_mb_buddies_generated++.  Although
the output is OK, I don't think we need this extra ++.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ced156e4

ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy · 529da704

Committed by Tao Ma on Jul 23, 2011

ext4_mb_load_buddy() calls ext4_get_group_info() for setting both
"grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp".
Reported-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

529da704

18 Jul, 2011 6 commits

ext4: avoid eh_entries overflow before insert extent_idx · d4620315

Committed by Robin Dong on Jul 17, 2011

If eh_entries is equal to (or greater than) eh_max, inserting a new
extent_idx will make the number of entries overflow.
So check eh_entries before inserting the new extent_idx.

Although the current code cannot trigger this bug (ext4_ext_insert_index
is called by ext4_ext_split, and ext4_ext_split is called only if the
index block has free space), the right logic is to check the capacity
before insertion.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
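
The "check the capacity before insertion" rule looks roughly like the sketch below. The struct, the function name, and the -EIO error code are hypothetical; only the comparison of the entry count against the maximum comes from the message.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for an extent header. */
struct hdr_sketch {
    uint16_t entries;  /* number of valid index entries */
    uint16_t max;      /* capacity of the index block */
};

/* Refuse to insert when the block is already full, so the entry
 * count can never run past the capacity. */
static int insert_index_checked(struct hdr_sketch *eh)
{
    if (eh->entries >= eh->max)
        return -EIO;   /* assumed error code, for illustration only */

    eh->entries++;     /* the real code would also store the new index */
    return 0;
}
```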

d4620315

ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT · 015861ba

Committed by Robin Dong on Jul 17, 2011

This patch avoids an extraneous lookup of the extent cache
in ext4_ext_map_blocks() when the flag
EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent.

The existing logic was performing the lookup but not making
use of the result. The patch simply reverses the order of evaluation
in the condition.

Since ext4_ext_in_cache() does not initialize newex on misses, bypassing
its invocation does not introduce any new issue in this regard.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Gouriou <egouriou@google.com>

015861ba

ext4: remove unneeded parameter to ext4_ext_remove_space() · c6a0371c

Committed by Allison Henderson on Jul 17, 2011

This patch removes the extra parameter in ext4_ext_remove_space()
which is no longer needed.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c6a0371c

ext4: punch hole optimizations: skip un-needed extent lookup · f7d0d379

Committed by Allison Henderson on Jul 17, 2011

This patch optimizes the punch hole operation by skipping the
tree walking code that is used by truncate.  Since punch hole
is done through map blocks, the path to the extent is already
known in this function, so we do not need to look it up again.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f7d0d379

ext4: ignore a stripe width of 1 · 3eb08658

Committed by Dan Ehrenberg on Jul 17, 2011

If the stripe width was set to 1, then this patch will ignore
that stripe width and ext4 will act as if the stripe width
were 0 with respect to optimizing allocations.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3eb08658

ext4: make the preallocation size be a multiple of stripe size · d7a1fee1

Committed by Dan Ehrenberg on Jul 17, 2011

Previously, if a stripe width was provided, then it would be used
as the preallocation granularity, with no sanity checking and no
way to override this. Now, mb_prealloc_size defaults to the smallest
multiple of stripe size that is greater than or equal to the old
default mb_prealloc_size, and this can be overridden with the sysfs
interface.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
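
"The smallest multiple of stripe size that is greater than or equal to the old default" is a round-up to a multiple. A minimal sketch of that arithmetic follows; the function name is hypothetical, and treating a stripe of 0 as "no stripe configured" is an assumption of this example.

```c
/* Round old_default up to the next multiple of stripe.
 * A stripe of 0 is taken to mean "no stripe", so the old default is kept. */
static unsigned long prealloc_for_stripe(unsigned long old_default,
                                         unsigned long stripe)
{
    if (stripe == 0)
        return old_default;

    return ((old_default + stripe - 1) / stripe) * stripe;
}
```

For an illustrative old default of 512 blocks, a stripe of 96 gives 576, while a stripe of 128 leaves the value at 512 because it already divides evenly.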

d7a1fee1

17 Jul, 2011 1 commit

ext4: fix compilation with -DDX_DEBUG · 265c6a0f

Committed by Bernd Schubert on Jul 16, 2011

Compilation of ext4/namei.c produced an error and warning messages
when compiled with -DDX_DEBUG.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

265c6a0f

12 Jul, 2011 4 commits

ext4: remove unnecessary comments in ext4_orphan_add() · afb86178

Committed by Lukas Czerner on Jul 11, 2011

The comment from Al Viro about a possible race in ext4_orphan_add() is
not justified. No race is possible, because we always either hold
i_mutex or the inode cannot be referenced from outside; hence the
J_ASSERTs should not be hit for the reason described in the comment.

This commit replaces it with a note that we are holding i_mutex, so it
should not be possible for i_nlink to change while waiting for
s_orphan_lock.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

afb86178

ext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backend · caaf7a29

Committed by Tao Ma on Jul 11, 2011

If we hit an error in ext4_mb_add_groupinfo, we kfree
sbi->s_group_info[group >> EXT4_DESC_PER_BLOCK_BITS(sb)] but fail to
reset it to NULL. The caller, ext4_mb_init_backend, will then try to kfree
it again, causing a double free. Fix it by resetting it to NULL.

Some typos in the comments of mballoc.c are also fixed.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
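
The pattern behind this fix, clearing a pointer right after freeing it so that a later cleanup pass cannot free it twice, can be shown in user space. This is only a sketch: the struct and function names are hypothetical, and malloc()/free() stand in for the kernel allocator.

```c
#include <errno.h>
#include <stdlib.h>

#define NR_SLOTS 4

/* Hypothetical stand-in for the s_group_info array. */
struct backend_sketch {
    int *group_info[NR_SLOTS];
};

/* Simulates the error path: allocate a slot, hit a failure, undo. */
static int add_groupinfo_failing(struct backend_sketch *b, int idx)
{
    b->group_info[idx] = malloc(sizeof(int));
    if (!b->group_info[idx])
        return -ENOMEM;

    /* ... pretend a later step failed here ... */
    free(b->group_info[idx]);
    b->group_info[idx] = NULL;   /* the fix: leave no dangling pointer */
    return -ENOMEM;
}

/* The caller's cleanup frees every slot; free(NULL) is a no-op. */
static void backend_cleanup(struct backend_sketch *b)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        free(b->group_info[i]);
        b->group_info[i] = NULL;
    }
}
```

Without the assignment to NULL in the error path, backend_cleanup() would pass the already freed pointer to free() a second time.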

caaf7a29

ext4: fix a race which could leak memory in ext4_groupinfo_create_slab() · 823ba01f

Committed by Tao Ma on Jul 11, 2011

In ext4_groupinfo_create_slab, we create ext4_groupinfo_caches while
holding ext4_grpinfo_slab_create_mutex but set it outside the lock, so
in some cases we may create it twice and leak memory.
So set it before we call mutex_unlock.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

823ba01f

ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents · 598dbdf2

Committed by Robin Dong on Jul 11, 2011

Optimize ext4_ext_insert_extent() by avoiding
ext4_ext_next_leaf_block() when the result is not used/needed.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

598dbdf2

11 Jul, 2011 7 commits

ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff

Committed by Robin Dong on Jul 11, 2011

If eh->eh_entries is smaller than eh->eh_max, the routine will
go to "repeat" and then go directly to "has_space",
since the arguments "depth" and "eh" have not changed.

Therefore, go to "has_space" directly and remove the redundant "repeat" label.
Signed-off-by: Robin Dong <sanbai@taobao.com>

ffb505ff

ext4: Change the wrong param comment for ext4_trim_all_free · 22612283

Committed by Tao Ma on Jul 11, 2011

In the ext4_trim_all_free() comment, there is no longer an @e4b parameter;
it is now @group.
Reported-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

22612283

ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2

Committed by Tao Ma on Jul 11, 2011

In ext4, every time FITRIM is called we iterate over all the
groups and trim them one by one. This wastes time if a
group has already been trimmed and has not changed since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed; it is cleared
when blocks are freed (in release_blocks_on_commit). A
trim_minlen field is also added to ext4_sb_info to record the last minlen
used to trim the volume, so that if the caller provides a smaller
one, we go ahead with the trim regardless of bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
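
The skip decision described in this commit can be summarised in a few lines. This is a sketch under stated assumptions, not the patch itself: the struct and function names are hypothetical, and a plain bool stands in for the bit kept in bb_state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-group flag and the per-filesystem
 * record of the minlen used by the last trim. */
struct group_sketch { bool was_trimmed; };
struct sb_sketch    { uint64_t last_minlen; };

/* A group must be trimmed again if it changed since the last trim, or
 * if the caller now asks for a smaller minlen than was used last time. */
static bool group_needs_trim(const struct group_sketch *grp,
                             const struct sb_sketch *sb, uint64_t minlen)
{
    return !grp->was_trimmed || minlen < sb->last_minlen;
}

/* Called after a successful trim of the group. */
static void group_mark_trimmed(struct group_sketch *grp)
{
    grp->was_trimmed = true;
}

/* Called when blocks in the group are freed. */
static void group_blocks_freed(struct group_sketch *grp)
{
    grp->was_trimmed = false;
}
```

This is consistent with the timings above: the second and third runs find every group already marked, so they return almost immediately.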

3d56b8d2

ext4: Add new ext4 trim tracepoints · b3d4c2b1

Committed by Tao Ma on Jul 11, 2011

Add ext4_trim_extent and ext4_trim_all_free.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b3d4c2b1

ext4: speed up group trim with the right free block count · 169ddc3e

Committed by Tao Ma on Jul 11, 2011

When we trim free blocks in an ext4 group, we need to
calculate the free blocks properly and check whether there are
enough free blocks left for us to trim. The current solution
only subtracts free extents that are large enough to trim, which
isn't appropriate.

Let us see a small example:
a group has 1.5M free, in chunks of 300k, 300k, 300k, 300k, 300k,
and minblocks is 1M.  With the current solution, we have to iterate over
the whole group, since these 300k chunks are never subtracted from
1.5M.  But we should actually exit after we find the first 2
free chunks: once we subtract the first 600K (even though they can't be
trimmed), the remaining 3 chunks sum to only 900K.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
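
The 1.5M example can be worked through with a small loop. This is a sketch of the accounting only, with a hypothetical function name and sizes expressed in units of 1k; it assumes the chunk sizes are consistent with the free total.

```c
#include <stddef.h>

/* Returns how many free chunks are visited before the scan can stop.
 * Every visited chunk is subtracted from the remaining free total,
 * whether or not it was big enough to trim, and the scan ends as soon
 * as the remainder can no longer hold a chunk of at least minlen. */
static size_t chunks_scanned(const unsigned long *chunks, size_t n,
                             unsigned long free_total, unsigned long minlen)
{
    size_t visited = 0;

    for (size_t i = 0; i < n && free_total >= minlen; i++) {
        visited++;
        /* Clamp the subtraction so inconsistent input cannot wrap. */
        free_total -= (chunks[i] < free_total) ? chunks[i] : free_total;
    }
    return visited;
}
```

With five chunks of 300, a free total of 1500 and a minlen of 1000, the remainder drops to 1200 and then to 900, so the scan stops after 2 chunks instead of walking all 5.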

169ddc3e

ext4: fix trim length underflow with small trim length · 22f10457

Committed by Tao Ma on Jul 10, 2011

In 0f0a25bf, we adjust 'len' by s_first_data_block - start, but
it can underflow when blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code
len -= first_data_blk - start; len underflows to -1ULL.
In the end we are safe, because the later last_group check limits
the trim to the whole volume, but that isn't what the user really wants.

So this patch fixes it. It also adds a check for 'start', as in ext3, so that
we can bail out immediately if the start is invalid.

Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
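
The underflow is ordinary unsigned wrap-around. The sketch below reproduces it and shows one possible guard; the function names are hypothetical, the guard is not claimed to be what the patch does, and the example assumes a 512-byte request on a 1K-block filesystem has already been converted to a length of 0 blocks.

```c
#include <stdint.h>

/* Mirrors the problematic adjustment: if len is smaller than the
 * number of skipped blocks, the unsigned subtraction wraps around. */
static uint64_t adjust_len_unchecked(uint64_t len, uint64_t start,
                                     uint64_t first_data_blk)
{
    if (start < first_data_blk)
        len -= first_data_blk - start;
    return len;
}

/* One possible guard: never subtract more than len holds. */
static uint64_t adjust_len_checked(uint64_t len, uint64_t start,
                                   uint64_t first_data_blk)
{
    if (start < first_data_blk) {
        uint64_t skip = first_data_blk - start;

        len = (len > skip) ? len - skip : 0;
    }
    return len;
}
```

With len = 0, start = 0 and first_data_blk = 1, the unchecked version returns UINT64_MAX, which matches the -1ULL described above, while the checked version returns 0.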

22f10457

ext4: add tracepoint for ext4_journal_start · 12706394

Committed by Theodore Ts'o on Jul 10, 2011

This will help debug who is responsible for starting a jbd2 transaction.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12706394
