Commits · 6d40bc5a7e8fc71795d131e835f38f161ed7e1b1 · openanolis / cloud-kernel

27 Jul, 2011 8 commits

ext4: simplify journal handling in setup_new_group_blocks() · 6d40bc5a

Authored by Yongqiang Yang on Jul 26, 2011

This patch simplifies journal handling in setup_new_group_blocks().

In the previous code, the block bitmap was modified throughout
setup_new_group_blocks(), and ext4_get_write_access() was called in
extend_or_restart_transaction() to guarantee that the block bitmap
stayed in the new handle. This made things complicated.

The previous commit changed things so that the modifications on the
block bitmap are batched and done by ext4_set_bits() at the end of the
for loop.  This allows us to simplify things.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d40bc5a

ext4: let setup_new_group_blocks() set multiple bits at a time · c3e94d1d

Authored by Yongqiang Yang on Jul 26, 2011

Rename mb_set_bits() to ext4_set_bits() and make it a global function
so that setup_new_group_blocks() can use it.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c3e94d1d
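
As a rough illustration of what "setting multiple bits at a time" means, the sketch below sets a run of consecutive bits in a byte-array bitmap. The name `bitmap_set_range` is hypothetical; this is not the kernel's ext4_set_bits(), and it ignores any word-at-a-time optimization a real implementation might use.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's ext4_set_bits()): set `len`
 * consecutive bits, starting at bit number `first`, in a bitmap stored
 * as an array of bytes. The caller must make sure the bitmap is large
 * enough to hold bit `first + len - 1`.
 */
void bitmap_set_range(unsigned char *bitmap, size_t first, size_t len)
{
    for (size_t bit = first; bit < first + len; bit++)
        bitmap[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
}
```

A caller that has collected a contiguous range of blocks can then mark the whole range with one call instead of one call per block.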

ext4: fix a typo in ext4_group_extend() · 2b79b09d

Authored by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2b79b09d

ext4: let ext4_group_add_blocks() handle 0 blocks quickly · 4740b830

Authored by Yongqiang Yang on Jul 26, 2011

If ext4_group_add_blocks() is called with 0 blocks, make it return 0
without doing any extra work.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4740b830

ext4: let ext4_group_add_blocks() return an error code · cc7365df

Authored by Yongqiang Yang on Jul 26, 2011

This patch lets ext4_group_add_blocks() return an error code if it
fails, so that the calling functions can handle the error correctly.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cc7365df

ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() · 0529155e

Authored by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0529155e

ext4: prevent a fs with errors from being resized · ce723c31

Authored by Yongqiang Yang on Jul 26, 2011

A filesystem with errors must not be resized; otherwise,
it is easy to destroy the filesystem.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ce723c31

ext4: prevent parallel resizers by atomic bit ops · 8f82f840

Authored by Yongqiang Yang on Jul 26, 2011

Before this patch, parallel resizers were allowed and protected by a
mutex lock. There is actually no need to support parallel resizers, so
this patch prevents them with atomic bit ops, like lock_page() and
unlock_page() do.

To do this, the patch removes the mutex lock s_resize_lock from struct
ext4_sb_info and adds an unsigned long field named s_resize_flags,
which indicates whether there is a resizer.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8f82f840
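
The idea of excluding a second resizer with an atomic bit, rather than serializing resizers with a mutex, can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions: `struct sb_info`, `resize_flags`, `RESIZING_BIT`, `resize_begin()` and `resize_end()` are hypothetical stand-ins, and the kernel would use its own bit operations rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESIZING_BIT 0UL            /* hypothetical bit number in the flags word */

struct sb_info {                    /* stand-in for struct ext4_sb_info */
    atomic_ulong resize_flags;      /* stand-in for s_resize_flags */
};

/* Try to become the only resizer. Returns false if one is already running. */
bool resize_begin(struct sb_info *sbi)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    /* fetch_or returns the previous value, so the old state of the bit
     * tells us whether somebody else got there first. */
    unsigned long old = atomic_fetch_or_explicit(&sbi->resize_flags, mask,
                                                 memory_order_acquire);
    return (old & mask) == 0;
}

/* Clear the bit so that a later resize may start. */
void resize_end(struct sb_info *sbi)
{
    atomic_fetch_and_explicit(&sbi->resize_flags, ~(1UL << RESIZING_BIT),
                              memory_order_release);
}
```

Unlike a mutex, a second caller does not wait here: it sees that the bit is already set and can return an error immediately.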

26 Jul, 2011 1 commit

ext4: fix data corruption in inodes with journalled data · 2d859db3

Authored by Jan Kara on Jul 26, 2011

When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext4_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction, and thus mm does not know there
is still unwritten data.

Fix the problem by carefully tracking the transaction containing the inode's
data, committing this transaction, and writing uncheckpointed buffers when
the inode should be reaped.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2d859db3

24 Jul, 2011 6 commits

ext4: correct comment for ext4_ext_check_cache · b7ca1e8e

Authored by Robin Dong on Jul 23, 2011

The comment for ext4_ext_check_cache has a little mistake.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b7ca1e8e

ext4: correct the debug message in ext4_ext_insert_extent · 0737964b

Authored by Robin Dong on Jul 23, 2011

The debug message in ext4_ext_insert_extent before moving extent
is incorrect (the "from xx to xx").
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0737964b

ext4: remove unused argument in ext4_ext_next_leaf_block · 5718789d

Authored by Robin Dong on Jul 23, 2011

The argument "inode" in function ext4_ext_next_allocated_block is unused,
so remove it.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5718789d

ext4: remove ac_repeats from ext4_allocation_context · 6a0fe493

Authored by Tao Ma on Jul 23, 2011

ac_repeats isn't referenced in the mballoc code. So remove it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6a0fe493

ext4: don't increment s_mb_buddies_generated in ext4_mb_release · ced156e4

Authored by Tao Ma on Jul 23, 2011

In ext4_mb_release, we use s_mb_buddies_generated++.  Although
the output is OK, I don't think we need this extra ++.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ced156e4

ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy · 529da704

Authored by Tao Ma on Jul 23, 2011

ext4_mb_load_buddy() calls ext4_get_group_info() for setting both
"grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp".
Reported-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

529da704

18 Jul, 2011 6 commits

ext4: avoid eh_entries overflow before insert extent_idx · d4620315

Authored by Robin Dong on Jul 17, 2011

If eh_entries is equal to (or greater than) eh_max, inserting a new
extent_idx will make the number of entries overflow.
So check eh_entries before inserting the new extent_idx.

Although no code path currently triggers this bug (ext4_ext_insert_index
is called by ext4_ext_split, and ext4_ext_split is called only if the
index block has free space), the right logic is to check the capacity
before insertion.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d4620315
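
The "check the capacity before insertion" rule can be sketched with a simplified header that only tracks the number of used slots and the capacity. Everything here is a stand-in: `struct index_header`, `index_insert()` and the choice of -ENOSPC as the error code are assumptions made for illustration, not the kernel's extent code.

```c
#include <errno.h>
#include <stdint.h>

struct index_header {       /* simplified stand-in for an extent header */
    uint16_t entries;       /* plays the role of eh_entries: slots in use */
    uint16_t max;           /* plays the role of eh_max: slot capacity */
};

/*
 * Append `value` to `slots` only if the header says there is room.
 * Returns 0 on success, or a negative errno value if the block is full.
 */
int index_insert(struct index_header *hdr, uint32_t *slots, uint32_t value)
{
    if (hdr->entries >= hdr->max)
        return -ENOSPC;     /* assumption: error code picked for this sketch */

    slots[hdr->entries++] = value;
    return 0;
}
```

Even when every current caller guarantees free space, the check keeps a future caller from silently writing past the end of the block.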

ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT · 015861ba

Authored by Robin Dong on Jul 17, 2011

This patch avoids an extraneous lookup of the extent cache
in ext4_ext_map_blocks() when the flag
EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent.

The existing logic was performing the lookup but not making
use of the result. The patch simply reverses the order of evaluation
in the condition.

Since ext4_ext_in_cache() does not initialize newex on misses, bypassing
its invocation does not introduce any new issue in this regard.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Gouriou <egouriou@google.com>

015861ba

ext4: remove unneeded parameter to ext4_ext_remove_space() · c6a0371c

Authored by Allison Henderson on Jul 17, 2011

This patch removes the extra parameter in ext4_ext_remove_space()
which is no longer needed.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c6a0371c

ext4: punch hole optimizations: skip un-needed extent lookup · f7d0d379

Authored by Allison Henderson on Jul 17, 2011

This patch optimizes the punch hole operation by skipping the
tree walking code that is used by truncate.  Since punch hole
is done through map blocks, the path to the extent is already
known in this function, so we do not need to look it up again.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f7d0d379

ext4: ignore a stripe width of 1 · 3eb08658

Authored by Dan Ehrenberg on Jul 17, 2011

If the stripe width was set to 1, then this patch will ignore
that stripe width and ext4 will act as if the stripe width
were 0 with respect to optimizing allocations.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3eb08658

ext4: make the preallocation size be a multiple of stripe size · d7a1fee1

Authored by Dan Ehrenberg on Jul 17, 2011

Previously, if a stripe width was provided, then it would be used
as the preallocation granularity, with no sanity checking and no
way to override this. Now, mb_prealloc_size defaults to the smallest
multiple of stripe size that is greater than or equal to the old
default mb_prealloc_size, and this can be overridden with the sysfs
interface.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d7a1fee1
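
The rounding rule described in the two stripe commits above is plain integer arithmetic. The sketch below is one way to write it; `prealloc_size()` is a hypothetical helper, sizes are in blocks, and a stripe of 0 is taken to mean that no stripe was configured.

```c
/*
 * Hypothetical helper: choose the default preallocation size in blocks.
 * A stripe of 0 (not set) or 1 (ignored) leaves the old default
 * untouched; otherwise the result is the smallest multiple of `stripe`
 * that is greater than or equal to `default_size`.
 */
unsigned int prealloc_size(unsigned int default_size, unsigned int stripe)
{
    if (stripe <= 1)
        return default_size;

    /* Round up to the next multiple of stripe using integer division. */
    return ((default_size + stripe - 1) / stripe) * stripe;
}
```

For example, with an old default of 512 blocks and a stripe of 96 blocks, 5 stripes give 480 blocks, which is too small, so the result is 6 stripes, or 576 blocks.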

17 Jul, 2011 1 commit

ext4: fix compilation with -DDX_DEBUG · 265c6a0f

Authored by Bernd Schubert on Jul 16, 2011

Compiling ext4/namei.c with -DDX_DEBUG produced an error and warning
messages.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

265c6a0f

12 Jul, 2011 4 commits

ext4: remove unnecessary comments in ext4_orphan_add() · afb86178

Authored by Lukas Czerner on Jul 11, 2011

The comment from Al Viro about a possible race in ext4_orphan_add() is
not justified. No race is possible, as we always either have i_mutex
locked or the inode cannot be referenced from outside, hence the
J_ASSERTs should not be hit for the reason described in the comment.

This commit replaces it with a note that we are holding i_mutex, so it
should not be possible for i_nlink to change while waiting for
s_orphan_lock.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

afb86178

ext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backend · caaf7a29

Authored by Tao Ma on Jul 11, 2011

If we hit an error in ext4_mb_add_groupinfo, we kfree
sbi->s_group_info[group >> EXT4_DESC_PER_BLOCK_BITS(sb)], but fail to
reset it to NULL. So the caller ext4_mb_init_backend will try to kfree
it again, causing a double free. Fix it by resetting it to NULL.

Some typos in the comments of mballoc.c are also fixed.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

caaf7a29
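
The double-free pattern and its fix can be reproduced in user space with malloc() and free() in place of the kernel allocator. This is a minimal sketch; `release_slot()` and `release_all()` are hypothetical helpers, not functions from mballoc.c.

```c
#include <stdlib.h>

/*
 * Free one slot of a pointer table and clear it. Clearing the slot is
 * the step the fix adds: without it, a later cleanup pass that frees
 * every slot would free the same memory a second time.
 */
void release_slot(void **table, size_t idx)
{
    free(table[idx]);
    table[idx] = NULL;
}

/* Cleanup pass in the caller. free(NULL) is a no-op, so slots that
 * were already released and cleared are harmless here. */
void release_all(void **table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        release_slot(table, i);
}
```

The error path can call release_slot() on the slot it owns, and the caller can still run release_all() over the whole table afterwards.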

ext4: fix a race which could leak memory in ext4_groupinfo_create_slab() · 823ba01f

Authored by Tao Ma on Jul 11, 2011

In ext4_groupinfo_create_slab, we create ext4_groupinfo_caches within
ext4_grpinfo_slab_create_mutex, but set it outside the lock, so there
are cases in which we may create it twice and cause a memory
leak.  So set it before we call mutex_unlock.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

823ba01f

ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents · 598dbdf2

Authored by Robin Dong on Jul 11, 2011

Optimize ext4_ext_insert_extent() by avoiding
ext4_ext_next_leaf_block() when the result is not used/needed.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

598dbdf2

11 Jul, 2011 10 commits

ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff

Authored by Robin Dong on Jul 11, 2011

If eh->eh_entries is smaller than eh->eh_max, the routine will
go to "repeat" and then go directly to "has_space",
since the arguments "depth" and "eh" are not changed.

Therefore, go to "has_space" directly and remove the redundant "repeat" label.
Signed-off-by: Robin Dong <sanbai@taobao.com>

ffb505ff

ext4: Change the wrong param comment for ext4_trim_all_free · 22612283

Authored by Tao Ma on Jul 11, 2011

In the ext4_trim_all_free() comment, there is no longer an @e4b parameter;
it is @group instead.
Reported-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

22612283

ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2

Authored by Tao Ma on Jul 11, 2011

In ext4, every time FITRIM is called, we iterate over all the
groups and trim them one by one. This wastes time if a
group has already been trimmed and has not changed since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed; it is cleared
when some blocks are freed (in release_blocks_on_commit). In addition,
trim_minlen is added to ext4_sb_info to record the last minlen
used to trim the volume, so that if the caller provides a smaller
one, we go on with the trim regardless of bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3d56b8d2
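
The decision this commit describes, whether a group can be skipped on a repeated FITRIM, can be sketched as follows. All names are hypothetical stand-ins for the fields mentioned in the message: `state` for bb_state, `last_trim_minlen` for the recorded minlen, and `GROUP_WAS_TRIMMED` for the new flag. The sketch models only the bookkeeping, not the discard itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define GROUP_WAS_TRIMMED 0x1u      /* hypothetical bit in the group state */

struct group_info { unsigned int state; };          /* per-group state */
struct fs_info    { uint64_t last_trim_minlen; };   /* per-filesystem state */

/* Does this group have to be walked for a FITRIM call with `minlen`? */
bool group_needs_trim(const struct fs_info *fs, const struct group_info *grp,
                      uint64_t minlen)
{
    if (!(grp->state & GROUP_WAS_TRIMMED))
        return true;    /* never trimmed, or blocks were freed since then */

    /* Already trimmed: only a smaller minlen can find new extents. */
    return minlen < fs->last_trim_minlen;
}

/* Called after a group has been trimmed. */
void group_mark_trimmed(struct group_info *grp)
{
    grp->state |= GROUP_WAS_TRIMMED;
}

/* Called when blocks in the group are freed, which invalidates the flag. */
void group_blocks_freed(struct group_info *grp)
{
    grp->state &= ~GROUP_WAS_TRIMMED;
}
```

This matches the timings above in spirit: the second and third runs find every group flagged and return almost immediately, while deleting files clears the flag only for the groups that were affected.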

ext4: Add new ext4 trim tracepoints · b3d4c2b1

Authored by Tao Ma on Jul 11, 2011

Add ext4_trim_extent and ext4_trim_all_free.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b3d4c2b1

ext4: speed up group trim with the right free block count · 169ddc3e

Authored by Tao Ma on Jul 11, 2011

When we trim free blocks in an ext4 group, we need to
calculate the free blocks properly and check whether there are
enough free blocks left for us to trim. The current solution
only subtracts free extents that are large enough to trim, which
isn't appropriate.

Let us look at a small example:
a group has 1.5M free in five chunks of 300K each,
and minblocks is 1M.  With the current solution, we have to iterate over
the whole group since these 300K chunks are never subtracted from
1.5M.  But we should actually exit after we find the first 2
free chunks, since the remaining 3 chunks only sum up to 900K once we
subtract the first 600K, even though those chunks can't be trimmed.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

169ddc3e
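
The worked example in the message above can be replayed with a small loop. This is a sketch of the counting logic only, with sizes in KiB; `chunks_examined()` is a hypothetical function, and the actual trim of a large-enough chunk is left out.

```c
#include <stddef.h>

/*
 * Walk the free chunks of one group and return how many are examined
 * before the free space still unaccounted for drops below `min_kib`,
 * at which point no remaining chunk can be large enough to trim.
 */
size_t chunks_examined(const unsigned int *chunks, size_t n,
                       unsigned int total_free, unsigned int min_kib)
{
    size_t examined = 0;
    unsigned int remaining = total_free;

    for (size_t i = 0; i < n && remaining >= min_kib; i++) {
        /* A real implementation would trim chunks[i] here when it is
         * at least min_kib long. */
        remaining -= chunks[i];   /* subtract it whether or not it was trimmed */
        examined++;
    }
    return examined;
}
```

With five 300K chunks, 1500K free in total and a 1M (1024K) minimum, the loop stops after two chunks because only 900K is left. If chunks below the minimum were never subtracted, `remaining` would stay at 1500K and all five chunks would be visited.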

ext4: fix trim length underflow with small trim length · 22f10457

Authored by Tao Ma on Jul 10, 2011

In 0f0a25bf, we adjust 'len' by s_first_data_block - start, but
it can underflow when blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len underflows to -1ULL.
In the end we are safe, because the later last_group check limits
the trim to the whole volume, but that isn't what the user really wants.

So this patch fixes it. It also adds a check for 'start', like ext3, so that
we can bail out immediately if the start is invalid.

Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

22f10457
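
The underflow is ordinary unsigned wraparound: a 512-byte length on a 1K-block filesystem is 0 whole blocks, and subtracting first_data_blk - start = 1 from 0 wraps to the largest 64-bit value. One way to guard the adjustment is sketched below. `clamp_trim_range()` is a hypothetical helper and an assumption about how such a check could look, not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp a trim range [start, start + len), given in filesystem blocks,
 * so that it begins at first_data_blk. Returns false when the range
 * ends at or before the first data block, leaving nothing to trim.
 */
bool clamp_trim_range(uint64_t first_data_blk, uint64_t *start, uint64_t *len)
{
    if (*start < first_data_blk) {
        uint64_t skip = first_data_blk - *start;

        if (*len <= skip)
            return false;       /* subtracting would hit zero or wrap */

        *len -= skip;           /* safe: len > skip here */
        *start = first_data_blk;
    }
    return true;
}
```

With first_data_blk = 1, start = 0 and len = 0, the unguarded `len -= first_data_blk - start` would yield UINT64_MAX, while the guarded version reports that there is nothing to trim.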

ext4: add tracepoint for ext4_journal_start · 12706394

Authored by Theodore Ts'o on Jul 10, 2011

This will help debug who is responsible for starting a jbd2 transaction.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12706394

jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints · 4862fd60

Authored by Theodore Ts'o on Jul 10, 2011

Using function calls in TP_printk causes perf heartburn, so print the
MAJOR/MINOR device numbers instead.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4862fd60

ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b

Authored by Jiaying Zhang on Jul 10, 2011

Upon a corrupted inode or disk failures, we may fail after we have already
allocated some blocks from the inode or taken some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent into the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

575a1d4b

ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74

Authored by Maxim Patlasov on Jul 10, 2011

The current implementation of ext4_free_blocks() always calls
dquot_free_block(). This looks quite sensible in most cases: blocks
to be freed are associated with the inode and were accounted in quota and
i_blocks some time ago.

However, there is a case where the blocks to free have not yet been
accounted by the time ext4_free_blocks() is called:

1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
   ext4_ext_insert_extent() and calls ext4_free_blocks().

In this scenario, ext4_free_blocks() calls dquot_free_block(), which, in
turn, decrements i_blocks for blocks which were not accounted yet (due
to delalloc). After a clean umount, e2fsck reports something like:

> Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?

because i_blocks was erroneously decremented as explained above.

The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.
Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

7132de74
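
The effect of the flag can be sketched with a toy accounting structure. `struct inode_acct`, `free_blocks()` and `FREE_NO_QUOTA_UPDATE` are hypothetical stand-ins (the last one for EXT4_FREE_BLOCKS_NO_QUOT_UPDATE), and the numbers below are borrowed from the e2fsck message purely for illustration.

```c
#include <stdint.h>

#define FREE_NO_QUOTA_UPDATE 0x1u   /* hypothetical stand-in for the new flag */

struct inode_acct {
    uint64_t i_blocks;              /* blocks charged to the inode, 512-byte units */
};

/*
 * Release `count` units. The allocator side is omitted; only the
 * accounting is modelled. Callers that free blocks which were never
 * charged to the inode pass FREE_NO_QUOTA_UPDATE so that i_blocks is
 * left alone.
 */
void free_blocks(struct inode_acct *ino, uint64_t count, unsigned int flags)
{
    if (!(flags & FREE_NO_QUOTA_UPDATE))
        ino->i_blocks -= count;
}
```

Starting from i_blocks = 5128, freeing 48 uncharged units without the flag leaves 5080, the kind of mismatch e2fsck complains about; with the flag the counter stays at 5128.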

30 Jun, 2011 1 commit

ext4: remove loop around bio_alloc() · 275d3ba6

Authored by Theodore Ts'o on Jun 29, 2011

These days, bio_alloc() is guaranteed to never fail (as long as nvecs
is less than BIO_MAX_PAGES), so we don't need the loop around the
struct bio allocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

275d3ba6

28 Jun, 2011 3 commits

ext4: quiet 'unused variables' compile warnings · 9331b626

Authored by Yongqiang Yang on Jun 28, 2011

Unused variables were deleted.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9331b626

ext4: refactor duplicated block placement code · f86186b4

Authored by Eric Sandeen on Jun 28, 2011

I found that ext4_ext_find_goal() and ext4_find_near()
share the same code for returning a coloured start block
based on i_block_group.

We can refactor this into a common function so that they
don't diverge in the future.

Thanks to adilger for suggesting the new function name.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f86186b4

ext4: move ext4_ind_* functions from inode.c to indirect.c · dae1e52c

Authored by Amir Goldstein on Jun 27, 2011

This patch moves functions from inode.c to indirect.c.
The moved functions are ext4_ind_* functions and their helpers.
Functions called from inode.c are declared extern.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dae1e52c

openanolis / cloud-kernel synced successfully more than 1 year ago