Commits · 3eb08658431abd65c0fe6855d1860859c2d416f7 · openeuler / raspberrypi-kernel

18 Jul, 2011 2 commits

ext4: ignore a stripe width of 1 · 3eb08658

Authored by Dan Ehrenberg on Jul 17, 2011

If the stripe width was set to 1, then this patch will ignore
that stripe width and ext4 will act as if the stripe width
were 0 with respect to optimizing allocations.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3eb08658
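The rule described in this commit can be sketched in a few lines of C. This is only an illustration of the stated behaviour, not the kernel code; the helper name `effective_stripe` is hypothetical.

```c
/* Hypothetical helper: the stripe width the allocator should act on.
 * A stripe of 1 block gives no alignment benefit, so it is treated
 * the same as "no stripe configured" (0). */
static unsigned long effective_stripe(unsigned long stripe_width)
{
    return stripe_width <= 1 ? 0 : stripe_width;
}
```

With this sketch, `effective_stripe(1)` and `effective_stripe(0)` both yield 0, while any larger width is passed through unchanged.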

ext4: make the preallocation size be a multiple of stripe size · d7a1fee1

Authored by Dan Ehrenberg on Jul 17, 2011

Previously, if a stripe width was provided, then it would be used
as the preallocation granularity, with no sanity checking and no
way to override this. Now, mb_prealloc_size defaults to the smallest
multiple of stripe size that is greater than or equal to the old
default mb_prealloc_size, and this can be overridden with the sysfs
interface.
Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d7a1fee1
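The "smallest multiple of stripe size that is greater than or equal to the old default" is a plain round-up. The sketch below shows the arithmetic only; the function name is hypothetical, and the value 512 used in the note that follows is an illustrative default, not one taken from this commit.

```c
/* Hypothetical helper: smallest multiple of `stripe` that is
 * >= `default_size`. A stripe of 0 means no stripe is configured,
 * so the default is returned unchanged. */
static unsigned long prealloc_size_for_stripe(unsigned long default_size,
                                              unsigned long stripe)
{
    if (stripe == 0)
        return default_size;
    return ((default_size + stripe - 1) / stripe) * stripe;
}
```

For an assumed default of 512 blocks and a stripe of 96 blocks, the result is 576 (6 x 96), since 5 x 96 = 480 falls short of 512.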

17 Jul, 2011 1 commit

ext4: fix compilation with -DDX_DEBUG · 265c6a0f

Authored by Bernd Schubert on Jul 16, 2011

Compiling ext4/namei.c with -DDX_DEBUG produced an error and
warning messages.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

265c6a0f

12 Jul, 2011 4 commits

ext4: remove unnecessary comments in ext4_orphan_add() · afb86178

Authored by Lukas Czerner on Jul 11, 2011

The comment from Al Viro about a possible race in ext4_orphan_add() is
not justified. No race is possible, as we always either have i_mutex
locked or the inode cannot be referenced from outside; hence the
J_ASSERTs should not be hit for the reason described in the comment.

This commit replaces it with a note that we are holding i_mutex, so it
should not be possible for i_nlink to be changed while waiting for
s_orphan_lock.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

afb86178

ext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backend · caaf7a29

Authored by Tao Ma on Jul 11, 2011

If we hit an error in ext4_mb_add_groupinfo, we kfree
sbi->s_group_info[group >> EXT4_DESC_PER_BLOCK_BITS(sb)], but fail to
reset it to NULL. So the caller ext4_mb_init_backend will try to kfree
it again, causing a double free. Fix it by resetting it to NULL.

Some typos in the comments of mballoc.c are also fixed.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

caaf7a29
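The pattern behind this fix can be shown in user-space C. This is a minimal sketch under stated assumptions: `struct table`, `add_slot` and `cleanup` are hypothetical stand-ins for the group-info array and its callers, and `malloc`/`free` stand in for the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a table of per-chunk allocations. */
struct table {
    int **slots;
};

/* Fill one slot; `fail_later` simulates an error after the allocation. */
static int add_slot(struct table *t, size_t idx, int fail_later)
{
    t->slots[idx] = malloc(sizeof(int));
    if (!t->slots[idx])
        return -1;
    if (fail_later) {
        free(t->slots[idx]);
        t->slots[idx] = NULL;   /* without this, cleanup() frees it again */
        return -1;
    }
    return 0;
}

/* Caller-side cleanup: frees every slot; free(NULL) is a no-op. */
static void cleanup(struct table *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(t->slots[i]);
        t->slots[i] = NULL;
    }
}
```

Resetting the pointer in the callee keeps the caller's cleanup loop simple: it can free every slot unconditionally without knowing which ones were already released on an error path.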

ext4: fix a race which could leak memory in ext4_groupinfo_create_slab() · 823ba01f

Authored by Tao Ma on Jul 11, 2011

In ext4_groupinfo_create_slab, we create ext4_groupinfo_caches while
holding ext4_grpinfo_slab_create_mutex, but set it outside the lock, so
in some cases we may create it twice and leak memory.
So set it before we call mutex_unlock.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

823ba01f

ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents · 598dbdf2

Authored by Robin Dong on Jul 11, 2011

Optimize ext4_ext_insert_extent() by avoiding
ext4_ext_next_leaf_block() when the result is not used/needed.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

598dbdf2

11 Jul, 2011 10 commits

ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff

Authored by Robin Dong on Jul 11, 2011

If eh->eh_entries is smaller than eh->eh_max, the routine will
go to "repeat" and then go to "has_space" directly,
since the arguments "depth" and "eh" have not changed.

Therefore, go to "has_space" directly and remove the redundant "repeat" label.
Signed-off-by: Robin Dong <sanbai@taobao.com>

ffb505ff

ext4: Change the wrong param comment for ext4_trim_all_free · 22612283

Authored by Tao Ma on Jul 11, 2011

In the ext4_trim_all_free() comment, there is no longer an @e4b parameter;
it is now @group.
Reported-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

22612283

ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2

Authored by Tao Ma on Jul 11, 2011

In ext4, every time FITRIM is called, we iterate over all the
groups and trim them one by one. This wastes time if a
group has already been trimmed and has not changed since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed; it is cleared
when some blocks are freed (in release_blocks_on_commit). In addition,
trim_minlen is added to ext4_sb_info to record the last minlen
we used to trim the volume, so that if the caller provides a smaller
one, we go ahead with the trim regardless of bb_state.

A simple test with my Intel X25-M SSD:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

Run FITRIM with the following parameters:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3d56b8d2
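The skip decision described in this commit message can be modelled as a small predicate. This is a sketch of the stated rule only; `struct group_state`, `was_trimmed` and `needs_trim` are hypothetical names, not the kernel's.

```c
/* Hypothetical per-group state: set after a trim, cleared when
 * blocks in the group are freed. */
struct group_state {
    int was_trimmed;
};

/* Returns 1 if the group must be walked again, 0 if it can be skipped.
 * A group is skipped only when it was already trimmed and the new
 * request does not ask for a smaller minimum extent length than the
 * one recorded from the last trim. */
static int needs_trim(const struct group_state *g,
                      unsigned long minlen, unsigned long last_minlen)
{
    return !(g->was_trimmed && minlen >= last_minlen);
}
```

This matches the timing pattern in the test above: the first run walks every group, and repeated runs with the same minlen skip the groups that have not changed.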

ext4: Add new ext4 trim tracepoints · b3d4c2b1

Authored by Tao Ma on Jul 11, 2011

Add ext4_trim_extent and ext4_trim_all_free.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b3d4c2b1

ext4: speed up group trim with the right free block count · 169ddc3e

Authored by Tao Ma on Jul 11, 2011

When we trim some free blocks in an ext4 group, we need to
calculate the free blocks properly and check whether there are
enough free blocks left for us to trim. The current solution only
subtracts a free extent from the count if it is large enough to trim,
which isn't appropriate.

Let us see a small example:
a group has 1.5M free, split into five extents of 300K each,
and minblocks is 1M. With the current solution, we have to iterate
over the whole group since these 300K extents will never be subtracted
from 1.5M. But actually we should exit after we find the first 2
free extents: once the first 600K is subtracted, the remaining 3 chunks
only sum up to 900K, even though the first two could not be trimmed.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

169ddc3e
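The example in this commit message can be replayed with a small simulation. This is a sketch of the accounting idea, not the kernel loop; the function name and the use of kilobyte units are assumptions made for illustration.

```c
#include <stddef.h>

/* Walk the free extents of one group and count how many must be
 * examined. Every extent is subtracted from the remaining free count,
 * whether or not it was large enough to trim, so the walk stops as
 * soon as the remainder can no longer hold an extent of `minlen`. */
static size_t extents_scanned(const unsigned *extents, size_t n,
                              unsigned free_total, unsigned minlen,
                              unsigned *trimmed)
{
    size_t scanned = 0;

    *trimmed = 0;
    for (size_t i = 0; i < n && free_total >= minlen; i++) {
        if (extents[i] >= minlen)
            *trimmed += extents[i];   /* large enough: would be trimmed */
        /* Guard against inconsistent input so the count cannot wrap. */
        free_total -= extents[i] < free_total ? extents[i] : free_total;
        scanned++;
    }
    return scanned;
}
```

With five 300K extents, 1500K free in total and a 1000K minimum, the walk stops after two extents, because only 900K remains at that point.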

ext4: fix trim length underflow with small trim length · 22f10457

Authored by Tao Ma on Jul 10, 2011

In 0f0a25bf, we adjust 'len' by s_first_data_block - start, but
it could underflow in the case of blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len will underflow to -1ULL.
In the end, although we are safe because the later last_group check limits
the trim to the whole volume, that isn't what the user really wants.

So this patch fixes it. It also adds a check for 'start', like ext3, so that
we can break out immediately if the start is invalid.

Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

22f10457
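The underflow is ordinary unsigned wrap-around: with a 1K block size, a 512-byte length rounds down to 0 blocks, and subtracting 1 from an unsigned 0 wraps to the maximum value. The sketch below shows one way to guard the adjustment; it is not the actual patch, and `clamp_trim_range` and its parameters are hypothetical.

```c
#include <stdint.h>

/* Adjust a trim request (in blocks) so that it starts no earlier than
 * the first data block. Returns 0 on success, -1 if `start` is past
 * the end of the volume or nothing is left to trim after clamping. */
static int clamp_trim_range(uint64_t *start, uint64_t *len,
                            uint64_t first_data_blk, uint64_t max_blks)
{
    if (*start >= max_blks)
        return -1;                  /* invalid start: bail out early */
    if (*start < first_data_blk) {
        uint64_t skip = first_data_blk - *start;

        if (*len <= skip)
            return -1;              /* `*len -= skip` would wrap or hit 0 */
        *len -= skip;
        *start = first_data_blk;
    }
    return 0;
}
```

Checking `*len <= skip` before subtracting is the key step: the comparison is done while both values are still meaningful, instead of inspecting the result after it has already wrapped.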

ext4: add tracepoint for ext4_journal_start · 12706394

Authored by Theodore Ts'o on Jul 10, 2011

This will help debug who is responsible for starting a jbd2 transaction.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12706394

jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints · 4862fd60

Authored by Theodore Ts'o on Jul 10, 2011

Using function calls in TP_printk causes perf heartburn, so print the
MAJOR/MINOR device numbers instead.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4862fd60

ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b

Authored by Jiaying Zhang on Jul 10, 2011

On a corrupted inode or disk failure, we may fail after we have already
allocated some blocks for the inode or taken some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent into the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

575a1d4b

ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74

Authored by Maxim Patlasov on Jul 10, 2011

The current implementation of ext4_free_blocks() always calls
dquot_free_block(). This looks quite sensible in most cases: blocks
to be freed are associated with the inode and were accounted in quota and
i_blocks some time ago.

However, there is a case where the blocks to free have not yet been
accounted by the time ext4_free_blocks() is called:

1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
   ext4_ext_insert_extent() and calls ext4_free_blocks().

In this scenario, ext4_free_blocks() calls dquot_free_block(), which, in
turn, decrements i_blocks for blocks which were not accounted yet (due
to delalloc). After a clean umount, e2fsck reports something like:

> Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
because i_blocks was erroneously decremented as explained above.

The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.
Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

7132de74
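The effect of such a flag can be modelled with a toy accounting structure. This is a sketch only: `struct inode_acct`, `free_blocks_sketch` and the flag value `0x1` are hypothetical, and the real function also returns the blocks to the allocator, which is omitted here.

```c
/* Hypothetical flag value; the real EXT4_FREE_BLOCKS_NO_QUOT_UPDATE
 * constant may differ. */
#define SKETCH_FREE_BLOCKS_NO_QUOT_UPDATE 0x1

/* Toy model of the per-inode block accounting. */
struct inode_acct {
    long i_blocks;
};

/* Release `count` blocks. The accounting is reversed only when the
 * caller has not asked to skip it, i.e. only for blocks that were
 * actually charged to the inode earlier. */
static void free_blocks_sketch(struct inode_acct *ino, long count, int flags)
{
    if (!(flags & SKETCH_FREE_BLOCKS_NO_QUOT_UPDATE))
        ino->i_blocks -= count;
}
```

Using the numbers from the e2fsck report above, an inode at 5128 drops to 5080 if 48 unaccounted units are released without the flag, and stays at 5128 when the flag is passed.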

30 Jun, 2011 1 commit

ext4: remove loop around bio_alloc() · 275d3ba6

Authored by Theodore Ts'o on Jun 29, 2011

These days, bio_alloc() is guaranteed to never fail (as long as nvecs
is less than BIO_MAX_PAGES), so we don't need the loop around the
struct bio allocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

275d3ba6

28 Jun, 2011 9 commits

ext4: quiet 'unused variables' compile warnings · 9331b626

Authored by Yongqiang Yang on Jun 28, 2011

Unused variables were deleted.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9331b626

ext4: refactor duplicated block placement code · f86186b4

Authored by Eric Sandeen on Jun 28, 2011

I found that ext4_ext_find_goal() and ext4_find_near()
share the same code for returning a coloured start block
based on i_block_group.

We can refactor this into a common function so that they
don't diverge in the future.

Thanks to adilger for suggesting the new function name.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f86186b4

ext4: move ext4_ind_* functions from inode.c to indirect.c · dae1e52c

Authored by Amir Goldstein on Jun 27, 2011

This patch moves functions from inode.c to indirect.c.
The moved functions are ext4_ind_* functions and their helpers.
Functions called from inode.c are declared extern.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dae1e52c

ext4: move common truncate functions to header file · 9f125d64

Authored by Theodore Ts'o on Jun 27, 2011

Move two functions that will be needed by the indirect functions to be
moved to indirect.c as well as inode.c to truncate.h as inline
functions, so that we can avoid having duplicate copies of the
functions (which can be a maintenance problem) without having to expose
them as global functions.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9f125d64

ext4: move __ext4_check_blockref to block_validity.c · 1f7d1e77

Authored by Theodore Ts'o on Jun 27, 2011

In preparation for moving the indirect functions to a separate file,
move __ext4_check_blockref() to block_validity.c and rename it to
ext4_check_blockref(), which is exported as a globally visible function.

Also, rename the cpp macro ext4_check_inode_blockref() to
ext4_ind_check_inode(), to make it clear that it is only valid for use
with non-extent mapped inodes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1f7d1e77

ext4: rename ext4_indirect_* funcs to ext4_ind_* · 8bb2b247

Authored by Amir Goldstein on Jun 27, 2011

We are going to move all ext4_ind_* functions to indirect.c.
Before we do that, let's rename 2 functions called ext4_indirect_*
to ext4_ind_*, to keep to the naming convention.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8bb2b247

ext4: split ext4_ind_truncate from ext4_truncate · ff9893dc

Authored by Amir Goldstein on Jun 27, 2011

We are about to move all indirect inode functions to a new file.
Before we do that, let's split ext4_ind_truncate() out of ext4_truncate()
leaving only generic code in the latter, so we will be able to move
ext4_ind_truncate() to the new file.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ff9893dc

ext4: fix incorrect error msg in ext4_ext_insert_index · ed7a7e16

Authored by Robin Dong on Jun 27, 2011

In ext4_ext_insert_index(), when eh_entries of curp is
bigger than eh_max, an error message is printed, but its content
is about logical and ei_block, which is incorrect.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ed7a7e16

jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434

Authored by Tao Ma on Jun 27, 2011

In journal checkpoint, we write the buffer and wait for its finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request will be delayed for quite a long time and
all the tasks which are waiting for journal space will end with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
 [<ffffffff8103caad>] ? need_resched+0x23/0x2d
 [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
 [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
 [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
 [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
 [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
 [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
 [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
 [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
 [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
 [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
 [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
 [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
 [<ffffffff81146c88>] setxattr+0xb5/0xe8
 [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
 [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
 [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b

So this patch uses WRITE_SYNC in __flush_batch so that the requests will
be moved into the sync queue and handled by cfq in a timely manner. We also use
the new plug, so that all the WRITE_SYNC requests can be submitted as a whole
when we unplug it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Reported-by: Robin Dong <sanbai@taobao.com>

d3ad8434

20 Jun, 2011 12 commits

fix comment in generic_permission() · 8e833fd2

Authored by Al Viro on Jun 19, 2011

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e833fd2

kill obsolete comment for follow_down() · 6291176b

Authored by Al Viro on Jun 17, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6291176b

proc_sys_permission() is OK in RCU mode · 1aec7036

Authored by Al Viro on Jun 18, 2011

nothing blocking there, since all instances of sysctl
->permissions() method are non-blocking - both of them,
that is.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1aec7036

reiserfs_permission() doesn't need to bail out in RCU mode · 1d29b5a2

Authored by Al Viro on Jun 18, 2011

nothing blocking other than generic_permission() (and
check_acl callback does bail out in RCU mode).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1d29b5a2

proc_fd_permission() doesn't need to bail out in RCU mode · cf127911

Authored by Al Viro on Jun 18, 2011

nothing blocking except generic_permission()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cf127911

nilfs2_permission() doesn't need to bail out in RCU mode · 730e908f

Authored by Al Viro on Jun 18, 2011

Nothing blocking except for generic_permission().  Which will DTRT.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

730e908f

logfs doesn't need ->permission() at all · a63ab94d

Authored by Al Viro on Jun 18, 2011

... and never did, what with its ->permission() being what we do by default
when ->permission is NULL...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a63ab94d

coda_ioctl_permission() is safe in RCU mode · 6b419951

Authored by Al Viro on Jun 18, 2011

return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b419951

cifs_permission() doesn't need to bail out in RCU mode · ec12781f

Authored by Al Viro on Jun 18, 2011

nothing potentially blocking except generic_permission(), which
will DTRT
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ec12781f

bad_inode_permission() is safe from RCU mode · 1712c20d

Authored by Al Viro on Jun 18, 2011

return -EIO; is *not* a blocking operation, thank you very much.
Nick, what the hell have you been smoking?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1712c20d

ubifs: dereferencing an ERR_PTR in ubifs_mount() · 185bf873

Authored by Dan Carpenter on Jun 20, 2011

d251ed27 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

185bf873
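For readers unfamiliar with the ERR_PTR convention, the bug class can be reproduced in user space. The helpers below are a simplified re-implementation written for this illustration (the kernel's own macros live in its headers), and `struct sb_sketch` and `mount_sketch` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *err_ptr(long err)       { return (void *)(intptr_t)err; }
static long ptr_err(const void *p)   { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    /* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sb_sketch {
    int flags;
};

/* The error path must leave before `sb` is dereferenced; forgetting
 * the early exit is the mistake described in the commit message. */
static int mount_sketch(struct sb_sketch *sb, int *flags_out)
{
    if (is_err(sb))
        return (int)ptr_err(sb);
    *flags_out = sb->flags;
    return 0;
}
```

An error pointer is not NULL, so a plain `if (!sb)` check would not catch it; the dedicated `is_err()` test followed by an immediate exit is what keeps the dereference safe.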

nfsd4: fix break_lease flags on nfsd open · 105f4622

Authored by J. Bruce Fields on Jun 07, 2011

Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!
Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

105f4622

18 Jun, 2011 1 commit

isofs: fix bh leak in isofs_fill_super() error case · c11760c6

Authored by Linus Torvalds on Jun 08, 2011

In isofs_fill_super(), when an iso_primary_descriptor is found, it is
kept in pri_bh. The error cases don't properly release it. Fix it.
Reported-and-tested-by: 김원석 <stanley.will.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c11760c6