Commits · 9dd75f1f1a02d656a11a7b9b9e6c2759b9c1e946 · OpenHarmony / kernel_linux

14 Aug, 2011 3 commits

ext4: fix nomblk_io_submit option so it correctly converts uninit blocks · 9dd75f1f

Committed by Theodore Ts'o on Aug 13, 2011

Bug discovered by Jan Kara:

Finally, commit 1449032b returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionally call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
        char buf[1024];
        memset(buf, 'a', sizeof(buf));
        fallocate(fd, 0, 0, 16384);
        write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

9dd75f1f

ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. · 32c80b32

Committed by Tao Ma on Aug 13, 2011

Setting the EXT4_IO_END_UNWRITTEN flag and incrementing
i_aiodio_unwritten should be done together, since ext4_end_io_nolock
always clears the flag and decrements the counter at the same time.

We don't increment i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN, so it will go negative and cause some processes
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't actually
fix the problem met by Michael.

http://marc.info/?l=linux-ext4&m=131316851417460&w=2

Reported-and-Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

32c80b32
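
The pairing rule described in 32c80b32 can be illustrated with a small user-space sketch. The struct and function names below are hypothetical stand-ins, not the kernel's definitions, and the kernel uses an atomic counter where this single-threaded model uses a plain int. The point is only that the flag and the counter change in the same step, so the counter cannot drop below zero.

```c
/* Hypothetical stand-in for the EXT4_IO_END_UNWRITTEN flag bit. */
#define IO_END_UNWRITTEN 0x0001u

struct io_end_sketch {
    unsigned int flag;
};

struct inode_sketch {
    int aiodio_unwritten; /* models i_aiodio_unwritten */
};

/* Set the flag and bump the counter together, only on the 0 -> 1 transition. */
static void mark_unwritten(struct inode_sketch *inode, struct io_end_sketch *io)
{
    if (!(io->flag & IO_END_UNWRITTEN)) {
        io->flag |= IO_END_UNWRITTEN;
        inode->aiodio_unwritten++;
    }
}

/* Mirror image: clear the flag and drop the counter together. */
static void end_unwritten(struct inode_sketch *inode, struct io_end_sketch *io)
{
    if (io->flag & IO_END_UNWRITTEN) {
        io->flag &= ~IO_END_UNWRITTEN;
        inode->aiodio_unwritten--;
    }
}
```

Calling mark_unwritten() twice on the same io_end counts it once, and a later end_unwritten() brings the counter back to zero rather than below it.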

ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8

Committed by Jiaying Zhang on Aug 13, 2011

Flush inode's i_completed_io_list before calling ext4_io_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A has been
held since before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

2581fdc8

13 Aug, 2011 1 commit

ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508

Committed by Curt Wohlgemuth on Aug 13, 2011

ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files.  However, calling journalled aop
callbacks when there is no valid handle, can cause problems.

This would cause a kernel crash with Jan Kara's commit
2d859db3 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

   - no-journal
   - data=ordered
   - data=writeback
   - data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

441c8508
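
The ordering problem described in 441c8508 can be sketched as follows. This is a simplified model, not the kernel function: the struct and its fields are hypothetical, and the checks after the first two are an assumption about what such a helper would typically consider. What the commit message does state is that the no-journal check must win over the regular-file check.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the inputs such a helper looks at. */
struct inode_view {
    bool has_journal;       /* false models no-journal mode */
    bool is_regular_file;
    bool journal_data_flag; /* assumed per-inode "journal my data" flag */
    bool writeback_mount;   /* assumed to model data=writeback */
};

/* Returns 1 when writeback-style address space operations should be used. */
static int should_writeback_data_sketch(const struct inode_view *v)
{
    if (!v->has_journal)    /* checked first: without a journal there is no handle */
        return 1;
    if (!v->is_regular_file)
        return 0;
    if (v->journal_data_flag)
        return 0;
    return v->writeback_mount ? 1 : 0;
}
```

With the no-journal test first, a directory on a journal-less filesystem gets 1, so it is never routed to the journalled callbacks.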

04 Aug, 2011 1 commit

ext4: use kzalloc in ext4_kzalloc() · db9481c0

Committed by Mathias Krause on Aug 03, 2011

Commit 9933fc0a (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and
ext4_kvfree()) introduced wrappers around k*alloc/vmalloc but made
a typo in ext4_kzalloc(): it calls kmalloc() instead of kzalloc().
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

db9481c0
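
As a user-space analogy for the mix-up fixed in db9481c0, the sketch below uses calloc() where the kernel would use kzalloc(). The function names are hypothetical. A wrapper that promises zeroed memory has to call the zeroing allocator, since a plain malloc() (like kmalloc()) makes no promise about the contents.

```c
#include <stdlib.h>

/* User-space stand-in for a "zeroing" allocation wrapper. */
static void *zalloc_sketch(size_t size)
{
    /* calloc() returns zero-filled memory; malloc() would not guarantee that. */
    return calloc(1, size);
}

/* Returns 1 if every byte in buf[0..size) is zero, 0 otherwise. */
static int all_zero(const unsigned char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Callers of such a wrapper usually skip their own memset(), which is why a non-zeroing implementation is a real bug rather than a cosmetic one.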

02 Aug, 2011 3 commits

ext4: prevent memory leaks from ext4_mb_init_backend() on error path · 79a77c5a

Committed by Yu Jian on Aug 01, 2011

In ext4_mb_init(), if the s_locality_group allocation fails it will
currently cause the allocations made in ext4_mb_init_backend() to
be leaked.  Moving the ext4_mb_init_backend() allocation after the
s_locality_group allocation avoids that problem.
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

79a77c5a
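
The reordering described in 79a77c5a can be modeled in user space. The sketch below is an illustration under assumptions: the struct, the sizes, and the failure-injection parameters are hypothetical, and the cleanup on the second failure is this sketch's own choice rather than something the commit message specifies. It shows why doing the backend allocation last leaves nothing to leak when the earlier allocation fails.

```c
#include <stdlib.h>

struct mb_state_sketch {
    void *locality_groups; /* models the s_locality_group allocation */
    void *backend;         /* models what ext4_mb_init_backend() allocates */
};

/* Returns 0 on success, -1 on failure; on failure nothing stays allocated. */
static int mb_init_sketch(struct mb_state_sketch *s,
                          int fail_locality, int fail_backend)
{
    s->locality_groups = NULL;
    s->backend = NULL;

    /* Step 1: the allocation whose failure path has no cleanup goes first. */
    s->locality_groups = fail_locality ? NULL : malloc(64);
    if (!s->locality_groups)
        return -1; /* nothing else is allocated yet, so nothing leaks */

    /* Step 2: the backend allocation now runs last. */
    s->backend = fail_backend ? NULL : malloc(256);
    if (!s->backend) {
        free(s->locality_groups);
        s->locality_groups = NULL;
        return -1;
    }
    return 0;
}
```

The two int parameters exist only to inject failures for demonstration.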

ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode # · 48e6061b

Committed by Yu Jian on Aug 01, 2011

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

48e6061b

ext4: use ext4_msg() instead of printk in mballoc · 9d8b9ec4

Committed by Theodore Ts'o on Aug 01, 2011

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9d8b9ec4

01 Aug, 2011 5 commits

ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info · f18a5f21

Committed by Theodore Ts'o on Aug 01, 2011

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f18a5f21

ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() · 9933fc0a

Committed by Theodore Ts'o on Aug 01, 2011

Introduce new helper functions which try kmalloc, and then fall back
to vmalloc if necessary, and use them for allocating and deallocating
s_flex_groups.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9933fc0a
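
The "try one allocator, then fall back to another" idea in 9933fc0a can be sketched in user space. Everything here is a hypothetical model: SMALL_LIMIT stands in for the size at which a contiguous allocation might fail, and both paths use malloc() underneath. The header records which path served the request, which a real implementation needs in some form so that the matching free routine can be chosen.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_LIMIT 4096 /* hypothetical cutoff for the "fast" allocator */

enum backing { BACKING_SMALL, BACKING_LARGE };

union kv_header {
    enum backing from; /* which allocator produced the block */
    max_align_t align; /* keeps the payload suitably aligned */
};

/* Stand-in for the fast allocator: refuses requests that are too large. */
static void *small_alloc(size_t total)
{
    return total <= SMALL_LIMIT ? malloc(total) : NULL;
}

/* Stand-in for the fallback allocator. */
static void *large_alloc(size_t total)
{
    return malloc(total);
}

static void *kvmalloc_sketch(size_t size)
{
    size_t total = sizeof(union kv_header) + size;
    enum backing from = BACKING_SMALL;
    union kv_header *h = small_alloc(total);

    if (!h) { /* fast path failed, try the fallback */
        h = large_alloc(total);
        from = BACKING_LARGE;
    }
    if (!h)
        return NULL;
    h->from = from;
    return h + 1; /* payload starts right after the header */
}

static enum backing kv_backing(const void *p)
{
    return ((const union kv_header *)p - 1)->from;
}

static void kvfree_sketch(void *p)
{
    if (p)
        free((union kv_header *)p - 1);
}
```

In this model both paths are released with free(); the tag only demonstrates how the free side could tell the two kinds of block apart.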

ext4: use the correct error exit path in ext4_init_inode_table() · 33853a0d

Committed by Yongqiang Yang on Aug 01, 2011

This patch lets ext4_init_inode_table() handle errors correctly.
ext4_init_inode_table() should down_write() alloc_sem which
has been up_write()ed and stop the started journal handle.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

33853a0d

switch posix_acl_equiv_mode() to umode_t * · d6952123

Committed by Al Viro on Jul 23, 2011

... so that &inode->i_mode could be passed to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d6952123

switch posix_acl_create() to umode_t * · d3fb6120

Committed by Al Viro on Jul 23, 2011

so we can pass &inode->i_mode to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d3fb6120

31 Jul, 2011 2 commits

ext4: add missing kfree() on error return path in add_new_gdb() · c49bafa3

Committed by Dan Carpenter on Jul 30, 2011

We added some more error handling in b4097142 "ext4: add error
checking to calls to ext4_handle_dirty_metadata()".  But we need to
call kfree() as well to avoid a memory leak.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c49bafa3

ext4: fix races in ext4_sync_parent() · d59729f4

Committed by Theodore Ts'o on Jul 30, 2011

Fix problems if fsync() races against a rename of a parent directory
as pointed out by Al Viro in his own inimitable way:

>While we are at it, could somebody please explain what the hell is ext4
>doing in
>static int ext4_sync_parent(struct inode *inode)
>{
>        struct writeback_control wbc;
>        struct dentry *dentry = NULL;
>        int ret = 0;
>
>        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                dentry = list_entry(inode->i_dentry.next,
>                                    struct dentry, d_alias);
>                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                        break;
>                inode = dentry->d_parent->d_inode;
>                ret = sync_mapping_buffers(inode->i_mapping);
>                ...
>Note that dentry obviously can't be NULL there.  dentry->d_parent is never
>NULL.  And dentry->d_parent would better not be negative, for crying out
>loud!  What's worse, there's no guarantees that dentry->d_parent will
>remain our parent over that sync_mapping_buffers() *and* that inode won't
>just be freed under us (after rename() and memory pressure leading to
>eviction of what used to be our dentry->d_parent)......
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d59729f4

28 Jul, 2011 5 commits

ext4: Fix overflow caused by missing cast in ext4_fallocate() · 29ae07b7

Committed by Utako Kusaka on Jul 27, 2011

The logical block number in map.l_blk is a __u32, and so before we
shift it left by the block size, we need to cast it to a 64-bit size.

Otherwise i_size can be corrupted on an ENOSPC.

# df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda6     ext4     9843276    153056   9190200   2% /mnt/mp1
# fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
# stat /mnt/mp1/testfile
  File: `/mnt/mp1/testfile'
  Size: 4293656576	Blocks: 19380440   IO Block: 4096   regular file
Device: 806h/2054d	Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-07-25 13:01:31.414490496 +0900
Modify: 2011-07-25 13:01:31.414490496 +0900
Change: 2011-07-25 13:01:31.454490495 +0900
Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
 fs/ext4/extents.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

29ae07b7
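
The overflow fixed in 29ae07b7 comes down to where the widening happens relative to the shift. The two helper functions below are hypothetical and only contrast the two orderings: shifting a 32-bit block number first discards the high bits, while casting to a 64-bit type first keeps them.

```c
#include <stdint.h>

/* Shift performed in 32 bits: high bits are lost before the result is widened. */
static int64_t offset_without_cast(uint32_t lblk, unsigned int blkbits)
{
    uint32_t truncated = lblk << blkbits;
    return (int64_t)truncated;
}

/* Widen to 64 bits first, then shift. */
static int64_t offset_with_cast(uint32_t lblk, unsigned int blkbits)
{
    return (int64_t)lblk << blkbits;
}
```

For block number 0x100000 with 4 KiB blocks (a shift of 12), the first helper wraps to 0 while the second yields 4294967296, which is the byte offset actually meant.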

ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole · 0e1147b0

Committed by Robin Dong on Jul 27, 2011

The old function ext4_ext_rm_idx is used only for the truncate case,
because it just removes the last index in an extent-index block. When
punching a hole, we usually need to remove a "middle" index, so we must
move the indexes that come after it forward.

(I created a file with a depth-1 extent tree and punched a hole in the
middle of it; the last index in the index block was strangely gone, which
is how I found this bug.)
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0e1147b0
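
Removing a "middle" entry and moving the later ones forward, as 0e1147b0 describes, is the usual memmove() pattern. The sketch below works on a plain array of unsigned int rather than on extent index structures, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Remove entries[pos] from an array of *count elements, closing the gap.
 * Returns 0 on success, -1 if pos is out of range. */
static int remove_index_sketch(unsigned int *entries, size_t *count, size_t pos)
{
    if (pos >= *count)
        return -1;
    /* Shift everything after pos one slot toward the front. */
    memmove(&entries[pos], &entries[pos + 1],
            (*count - pos - 1) * sizeof(entries[0]));
    (*count)--;
    return 0;
}
```

When pos is the last element the memmove() length is zero, so the truncate-style "remove the last index" case falls out of the same code.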

ext4: simplify parameters of reserve_backup_gdb() · 668f4dc5

Committed by Yongqiang Yang on Jul 27, 2011

The reserve_backup_gdb() function only needs the block group number;
there's no need to pass a pointer to struct ext4_new_group_data to it.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

668f4dc5

ext4: simplify parameters of add_new_gdb() · 2f919710

Committed by Yongqiang Yang on Jul 27, 2011

add_new_gdb() only needs the block group number; there is no need to
pass a pointer to struct ext4_new_group_data to add_new_gdb().
Instead of filling in a pointer to the struct buffer_head in
add_new_gdb(), it's simpler to have the caller fetch it from the
s_group_desc[] array.

[Fixed error path to handle the case where struct buffer_head *primary
 hasn't been set yet. -- Ted]
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2f919710

ext4: remove lock_buffer in bclean() and setup_new_group_blocks() · e6075e98

Committed by Yongqiang Yang on Jul 27, 2011

There is no need to lock the buffers since no one else should be
touching these buffers besides the file system.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e6075e98

27 Jul, 2011 8 commits

ext4: simplify journal handling in setup_new_group_blocks() · 6d40bc5a

Committed by Yongqiang Yang on Jul 26, 2011

This patch simplifies journal handling in setup_new_group_blocks().

In the previous code, the block bitmap is modified throughout
setup_new_group_blocks(), and ext4_get_write_access() in
extend_or_restart_transaction() is used to guarantee that the block
bitmap stays in the new handle. This makes things complicated.

The previous commit changed things so that the modifications on the
block bitmap are batched and done by ext4_set_bits() at the end of the
for loop.  This allows us to simplify things.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d40bc5a

ext4: let setup_new_group_blocks() set multiple bits at a time · c3e94d1d

Committed by Yongqiang Yang on Jul 26, 2011

Rename mb_set_bits() to ext4_set_bits() and make it a global function
so that setup_new_group_blocks() can use it.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c3e94d1d

ext4: fix a typo in ext4_group_extend() · 2b79b09d

Committed by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2b79b09d

ext4: let ext4_group_add_blocks() handle 0 blocks quickly · 4740b830

Committed by Yongqiang Yang on Jul 26, 2011

If ext4_group_add_blocks() is called with 0 blocks, make it return 0
without doing any extra work.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4740b830

ext4: let ext4_group_add_blocks() return an error code · cc7365df

Committed by Yongqiang Yang on Jul 26, 2011

This patch lets ext4_group_add_blocks() return an error code if it
fails, so that upper functions can handle the error correctly.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cc7365df

ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() · 0529155e

Committed by Yongqiang Yang on Jul 26, 2011

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0529155e

ext4: prevent a fs with errors from being resized · ce723c31

Committed by Yongqiang Yang on Jul 26, 2011

A filesystem with errors must not be resized; otherwise,
it is easy to destroy the filesystem.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ce723c31

ext4: prevent parallel resizers by atomic bit ops · 8f82f840

Committed by Yongqiang Yang on Jul 26, 2011

Before this patch, parallel resizers were allowed and protected by a
mutex lock. There is actually no need to support parallel resizers, so
this patch prevents them by atomic bit ops, like
lock_page() and unlock_page() do.

To do this, the patch removes the mutex lock s_resize_lock from struct
ext4_sb_info and adds an unsigned long field named s_resize_flags,
which indicates whether there is a resizer.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8f82f840
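
The "single resizer guarded by an atomic bit" idea in 8f82f840 can be sketched in user space with C11 atomics. The names and the bit value are hypothetical, and atomic_fetch_or() stands in for the kernel's test-and-set style bit operations; this is not the kernel's implementation.

```c
#include <errno.h>
#include <stdatomic.h>

#define RESIZING_BIT 0x1UL /* hypothetical bit within the flags word */

struct sb_info_sketch {
    atomic_ulong resize_flags; /* models s_resize_flags */
};

/* Try to become the only resizer; returns 0 on success, -EBUSY otherwise. */
static int resize_begin_sketch(struct sb_info_sketch *sbi)
{
    /* Atomically set the bit and learn whether it was already set. */
    unsigned long old = atomic_fetch_or(&sbi->resize_flags, RESIZING_BIT);

    return (old & RESIZING_BIT) ? -EBUSY : 0;
}

/* Clear the bit so a later resizer can start. */
static void resize_end_sketch(struct sb_info_sketch *sbi)
{
    atomic_fetch_and(&sbi->resize_flags, ~RESIZING_BIT);
}
```

Unlike a mutex, a second caller does not sleep waiting for the first; it is simply told that a resize is already in progress.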

26 Jul, 2011 5 commits

ext4: fix data corruption in inodes with journalled data · 2d859db3

Committed by Jan Kara on Jul 26, 2011

When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext4_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction and thus mm does not know there
is still unwritten data.

Fix the problem by carefully tracking transaction containing inode's data,
committing this transaction, and writing uncheckpointed buffers when inode
should be reaped.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2d859db3

fs: take the ACL checks to common code · 4e34e719

Committed by Christoph Hellwig on Jul 23, 2011

Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss. This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4e34e719

kill boilerplates around posix_acl_create_masq() · 826cae2f

Committed by Al Viro on Jul 23, 2011

new helper: posix_acl_create(&acl, gfp, mode_p).  Replaces acl with
modified clone, on failure releases acl and replaces with NULL.
Returns 0 or -ve on error.  All callers of posix_acl_create_masq()
switched.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

826cae2f

kill boilerplate around posix_acl_chmod_masq() · bc26ab5f

Committed by Al Viro on Jul 23, 2011

new helper: posix_acl_chmod(&acl, gfp, mode).  Replaces acl with modified
clone or with NULL if that has failed; returns 0 or -ve on error.  All
callers of posix_acl_chmod_masq() switched to that - they'd been doing
exactly the same thing.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bc26ab5f

vfs: move ACL cache lookup into generic code · e77819e5

Committed by Linus Torvalds on Jul 22, 2011

This moves logic for checking the cached ACL values from low-level
filesystems into generic code.  The end result is a streamlined ACL
check that doesn't need to load the inode->i_op->check_acl pointer at
all for the common cached case.

The filesystems also don't need to check for a non-blocking RCU walk
case in their acl_check() functions, because that is all handled at a
VFS layer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e77819e5

24 Jul, 2011 6 commits

ext4: correct comment for ext4_ext_check_cache · b7ca1e8e

Committed by Robin Dong on Jul 23, 2011

The comment for ext4_ext_check_cache has a little mistake.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b7ca1e8e

ext4: correct the debug message in ext4_ext_insert_extent · 0737964b

Committed by Robin Dong on Jul 23, 2011

The debug message in ext4_ext_insert_extent before moving extent
is incorrect (the "from xx to xx").
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0737964b

ext4: remove unused argument in ext4_ext_next_leaf_block · 5718789d

Committed by Robin Dong on Jul 23, 2011

The argument "inode" in function ext4_ext_next_allocated_block looks unused,
so remove it.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5718789d

ext4: remove ac_repeats from ext4_allocation_context · 6a0fe493

Committed by Tao Ma on Jul 23, 2011

ac_repeats isn't referenced in the mballoc code. So remove it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6a0fe493

ext4: don't increment s_mb_buddies_generated in ext4_mb_release · ced156e4

Committed by Tao Ma on Jul 23, 2011

In ext4_mb_release, we use s_mb_buddies_generated++.  Although
the output is OK, I don't think we need this extra ++.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ced156e4

ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy · 529da704

Committed by Tao Ma on Jul 23, 2011

ext4_mb_load_buddy() calls ext4_get_group_info() for setting both
"grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp".
Reported-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

529da704

21 Jul, 2011 1 commit

fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82

Committed by Josef Bacik on Jul 16, 2011

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness' sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

02c24a82

OpenHarmony / kernel_linux · last synced about 4 years ago
