Commits · 353eb83c1422c6326eaab30ce044a179c6018169 · openanolis / cloud-kernel

11 Jan, 2011 15 commits

ext4: drop i_state_flags on architectures with 64-bit longs · 353eb83c

Committed by Theodore Ts'o on Jan 10, 2011

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags.  This saves 8
bytes from the size of the ext4_inode_info structure, which, when
multiplied by the number of inodes in the inode cache, can save
a lot of memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

353eb83c
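As a rough illustration of the idea in 353eb83c, the sketch below packs dynamic state bits into the upper half of a 64-bit flags word, so the lower 32 bits stay available for the persistent flags. The names `STATE_SHIFT`, `state_set()`, `state_clear()`, `state_test()` and `persistent_flags()` are hypothetical and are not the kernel's macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: low 32 bits hold persistent flags, high 32 bits hold
 * dynamic state flags. Callers must pass bit < 32. */
#define STATE_SHIFT 32

static inline void state_set(uint64_t *flags, unsigned int bit)
{
	*flags |= (uint64_t)1 << (bit + STATE_SHIFT);
}

static inline void state_clear(uint64_t *flags, unsigned int bit)
{
	*flags &= ~((uint64_t)1 << (bit + STATE_SHIFT));
}

static inline bool state_test(uint64_t flags, unsigned int bit)
{
	return (flags >> (bit + STATE_SHIFT)) & 1;
}

/* The persistent flags are whatever remains in the low half. */
static inline uint32_t persistent_flags(uint64_t flags)
{
	return (uint32_t)flags;
}
```

Because the two groups of bits never overlap, setting or clearing a state bit leaves the persistent flags untouched, which is what allows a separate state field to be dropped.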

ext4: reorder ext4_inode_info structure elements to remove unneeded padding · 8a2005d3

Committed by Theodore Ts'o on Jan 10, 2011

By reordering the elements in the ext4_inode_info structure, we can
reduce the padding needed on an x86_64 system by 16 bytes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a2005d3
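The effect described in 8a2005d3 can be seen with a toy structure. This is only a sketch with made-up members (`struct padded`, `struct reordered`), not the layout of ext4_inode_info; the exact sizes depend on the ABI, though on a typical x86_64 ABI they are 24 and 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members separated by an 8-byte member, so the
 * compiler typically inserts padding after each small member. */
struct padded {
	uint8_t  a;
	uint64_t b;
	uint8_t  c;
};

/* Same members, largest first; the small members share one padded tail. */
struct reordered {
	uint64_t b;
	uint8_t  a;
	uint8_t  c;
};

/* Bytes saved purely by changing the member order. */
static inline size_t bytes_saved(void)
{
	return sizeof(struct padded) - sizeof(struct reordered);
}
```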

ext4: drop ec_type from the ext4_ext_cache structure · b05e6ae5

Committed by Theodore Ts'o on Jan 10, 2011

We can encode the ec_type information by using ee_len == 0 to denote
EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if
neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT.
This allows us to reduce the size of ext4_inode_info by another 8
bytes (ec_type is 4 bytes, plus another 4 bytes of padding).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b05e6ae5
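The encoding described in b05e6ae5 can be sketched as follows. The structure and names here (`struct ext_cache`, `cache_type_of()`, the `CACHE_*` values) are simplified stand-ins for illustration, not the kernel definitions.

```c
#include <stdint.h>

enum cache_type { CACHE_NO, CACHE_GAP, CACHE_EXTENT };

/* Hypothetical stand-in for the cached extent; no separate type field. */
struct ext_cache {
	uint64_t start;  /* physical start block; 0 means "gap" */
	uint32_t block;  /* first logical block covered */
	uint32_t len;    /* number of blocks; 0 means "nothing cached" */
};

/* Derive the cache type from the other fields instead of storing it. */
static inline enum cache_type cache_type_of(const struct ext_cache *c)
{
	if (c->len == 0)
		return CACHE_NO;
	if (c->start == 0)
		return CACHE_GAP;
	return CACHE_EXTENT;
}
```

The order of the checks matters: a zero length takes precedence, so an all-zero entry reads as "nothing cached" rather than as a gap.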

ext4: use ext4_lblk_t instead of sector_t for logical blocks · 01f49d0b

Committed by Theodore Ts'o on Jan 10, 2011

This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types.  No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01f49d0b

ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097

Committed by Theodore Ts'o on Jan 10, 2011

Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it with a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here are a very good thing from
a memory utilization perspective.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f2321097

ext4: fix 32bit overflow in ext4_ext_find_goal() · ad4fb9ca

Committed by Kazuya Mio on Jan 10, 2011

ext4_ext_find_goal() returns an ideal physical block number that the block
allocator tries to allocate first. However, if the required file offset is
smaller than the existing extent's offset, ext4_ext_find_goal() returns
a wrong block number because "block - le32_to_cpu(ex->ee_block)" may
overflow. This patch fixes the problem.

ext4_ext_find_goal() also returns a wrong block number when the file
offset of the existing extent is too big. In that case the ideal
physical block number is corrected in ext4_mb_initialize_context(),
so it is not a problem.

reproduce:
# dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0     128    67456             128 eof
/mnt/mp1/file: 2 extents found
# rm -rf /mnt/mp1/tmp
# echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc

result (linux-2.6.37-rc2 + ext4 patch queue):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0    33280             128 
   1     128    67456    33407    128 eof
/mnt/mp1/file: 2 extents found

result (with this patch applied):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0    66560             128 
   1     128    67456    66687    128 eof
/mnt/mp1/file: 2 extents found
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ad4fb9ca
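The wraparound described in ad4fb9ca can be reproduced in isolation. The sketch below is a userspace model with hypothetical helper names (`goal_wrapping()`, `goal_checked()`); it uses the numbers from the reproducer above (an extent at logical block 128 and physical block 67456, with a request for logical block 0) and assumes a platform where `uint32_t` arithmetic is not promoted to a wider type.

```c
#include <stdint.h>

/* Problematic form: the 32-bit difference wraps when block < ext_block,
 * so a huge value is added to the physical block number. */
static inline uint64_t goal_wrapping(uint32_t block, uint32_t ext_block,
				     uint64_t ext_pblk)
{
	return ext_pblk + (block - ext_block);
}

/* Checked form: pick the direction first, so each difference is
 * non-negative before it is applied. */
static inline uint64_t goal_checked(uint32_t block, uint32_t ext_block,
				    uint64_t ext_pblk)
{
	if (block > ext_block)
		return ext_pblk + (block - ext_block);
	return ext_pblk - (ext_block - block);
}
```

With the reproducer's values, the checked form yields a goal just below the existing extent, while the wrapping form yields a block number far beyond it.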

ext4: add more error checks to ext4_mkdir() · dabd991f

Committed by Namhyung Kim on Jan 10, 2011

Check return value of ext4_journal_get_write_access,
ext4_journal_dirty_metadata and ext4_mark_inode_dirty. Move brelse()
under 'out_stop' to release bh properly in case of journal error.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dabd991f

ext4: ext4_ext_migrate should use NULL not 0 · f1dffc4c

Committed by Eric Paris on Jan 10, 2011

ext4_ext_migrate() calls ext4_new_inode() and passes 0 instead of a pointer
to a struct qstr.  This patch uses NULL, to make it obvious to the caller
that this was a pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f1dffc4c

ext4: Use ext4_error_file() to print the pathname to the corrupted inode · f7c21177

Committed by Theodore Ts'o on Jan 10, 2011

Where the file pointer is available, use ext4_error_file() instead of
ext4_error_inode().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f7c21177

ext4: use IS_ERR() to check for errors in ext4_error_file · f9a62d09

Committed by Dan Carpenter on Jan 10, 2011

d_path() returns an ERR_PTR and it doesn't return NULL.  This is in
ext4_error_file() and no one actually calls ext4_error_file().
Signed-off-by: Dan Carpenter <error27@gmail.com>

f9a62d09

ext4: test the correct variable in ext4_init_pageio() · 13195184

Committed by Dan Carpenter on Jan 10, 2011

This is a copy and paste error.  The intent was to check
"io_end_cachep".  We tested "io_page_cachep" earlier.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

13195184

ext2,ext3,ext4: clarify comment for extN_xattr_set_handle · 6e9510b0

Committed by Wang Sheng-Hui on Jan 10, 2011

Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6e9510b0

ext4: clean up ext4_xattr_list()'s error code checking and return strategy · eaeef867

Committed by Theodore Ts'o on Jan 10, 2011

Any time you see code that tries to add error codes together, you
should want to claw your eyes out...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

eaeef867

ext4: remove warning message from ext4_issue_discard helper · 93259636

Committed by Lukas Czerner on Jan 10, 2011

ext4_issue_discard is supposed to be a helper for issuing discards. However,
when the underlying device does not support discard, it prints a
warning message and clears the DISCARD flag in s_mount_opt. Since it
can be (and is) used by other callers, it should do neither and should
let the caller handle the error case.

This commit removes the warning message and the flag clearing from
ext4_issue_discard and performs them only in the place where they are
really needed (release_blocks_on_commit). The FITRIM ioctl should neither
set any flags nor print warning messages, so this gets rid of the
warning there as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

93259636

ext4: fix possible overflow in ext4_trim_fs() · 4f531501

Committed by Lukas Czerner on Jan 10, 2011

When determining the last group through ext4_get_group_no_and_offset(), the
result may be wrong when range->start and range->len are too
big, because summing those two numbers may overflow.

Fix that by checking range->len and limiting its value to
ext4_blocks_count(). I tested this commit and got the expected
result.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

4f531501
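One way to avoid the overflow described in 4f531501 is to compare the length against the remaining space instead of adding it to the start. The helper below, `clamp_trim_range()`, is a hypothetical userspace sketch and not the check used in ext4_trim_fs(); it clamps against the space left after `start`, which is slightly stricter than limiting the length to the total block count.

```c
#include <stdint.h>

/* Clamp *len so that start + *len never exceeds total.
 * Returns 0 on success, or -1 if start lies at or beyond the end.
 * No addition is performed, so nothing can wrap around. */
static inline int clamp_trim_range(uint64_t start, uint64_t *len,
				   uint64_t total)
{
	if (start >= total)
		return -1;
	if (*len > total - start)
		*len = total - start;
	return 0;
}
```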

20 Dec, 2010 7 commits

ext4: Add error checking to kmem_cache_alloc() call in ext4_free_blocks() · b72143ab

Committed by Theodore Ts'o on Dec 20, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b72143ab

ext4: Use printf extension %pV · 0ff2ea7d

Committed by Joe Perches on Dec 19, 2010

Using %pV reduces the number of printk calls and eliminates any
possible message interleaving from other printk calls.

In function __ext4_grp_locked_error also added KERN_CONT to some
printks.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0ff2ea7d

ext4: Use vzalloc in ext4_fill_flex_info() · 94de56ab

Committed by Joe Perches on Dec 19, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

94de56ab

ext4: zero out nanosecond timestamps for small inodes · af0b44a1

Committed by Eric Sandeen on Dec 19, 2010

When nanosecond timestamp resolution isn't supported on an ext4
partition (inode size = 128), stat() appears to be returning
uninitialized garbage in the nanosecond component of timestamps.

EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE
evaluates to false.
Reported-by: Jordan Russell <jr-list-2010@quo.to>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

af0b44a1
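The behaviour that af0b44a1 asks for can be modeled in a few lines. This is a sketch only: `struct xtime` and `decode_time()` are hypothetical, and the bit layout of the extra word (nanoseconds in the upper 30 bits) is an assumption made for the example, not something stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct xtime {
	int64_t sec;
	int32_t nsec;
};

/* fits_in_inode says whether the on-disk inode is large enough to hold the
 * "extra" timestamp word; extra is only meaningful when it is true. */
static inline struct xtime decode_time(uint32_t seconds, bool fits_in_inode,
				       uint32_t extra)
{
	struct xtime t;

	t.sec = seconds;
	if (fits_in_inode)
		t.nsec = (int32_t)(extra >> 2);  /* assumed layout */
	else
		t.nsec = 0;  /* without this branch nsec stays uninitialized */
	return t;
}
```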

ext4: optimize ext4_check_dir_entry() with unlikely() annotations · cad3f007

Committed by Theodore Ts'o on Dec 19, 2010

This function gets called a lot for large directories, and the answer
is almost always "no, no, there's no problem".  This means using
unlikely() is a good thing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cad3f007

ext4: use kmem_cache_zalloc() in ext4_init_io_end() · b17b35ec

Committed by Jesper Juhl on Dec 19, 2010

Take advantage of kmem_cache_zalloc() to remove a memset() call in
ext4_init_io_end() and save a few bytes.

Before:
 [jj@dragon linux-2.6]$ size fs/ext4/page-io.o
    text    data     bss     dec     hex filename
    3016       0     624    3640     e38 fs/ext4/page-io.o
After:
 [jj@dragon linux-2.6]$ size fs/ext4/page-io.o
    text    data     bss     dec     hex filename
    3000       0     624    3624     e28 fs/ext4/page-io.o
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b17b35ec
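A userspace analogue of the change in b17b35ec is replacing `malloc()` plus `memset()` with `calloc()`. The sketch below uses a made-up `struct io_end` and hypothetical helper names; it only shows that the two forms produce the same zeroed object, not how the kernel slab allocator works.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the I/O completion structure. */
struct io_end {
	int flag;
	void *page;
	size_t size;
};

/* Before: allocate, then clear in a separate step. */
static inline struct io_end *io_end_alloc_memset(void)
{
	struct io_end *io = malloc(sizeof(*io));

	if (io)
		memset(io, 0, sizeof(*io));
	return io;
}

/* After: a zeroing allocator does both in one call. */
static inline struct io_end *io_end_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct io_end));
}
```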

ext4: Remove redundant unlikely() · 6ca7b13d

Committed by Tobias Klauser on Dec 19, 2010

IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6ca7b13d

17 Dec, 2010 2 commits

ext4: Use pr_warning_ratelimited() instead of printk_ratelimit() · a8901d34

Committed by Theodore Ts'o on Dec 17, 2010

printk_ratelimit() is deprecated since it is a global instead of a
per-printk ratelimit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a8901d34

ext4: Fix up comments in inode.c · 225db7d3

Committed by Theodore Ts'o on Dec 16, 2010

This fixes up some broken argument descriptions that Namhyung Kim had
originally submitted for ext3.  This fixes the comments that were
still applicable in ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

225db7d3

16 Dec, 2010 3 commits

ext4: Add second mount options field since the s_mount_opt is full up · a2595b8a

Committed by Theodore Ts'o on Dec 15, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a2595b8a

ext4: Move struct ext4_mount_options from ext4.h to super.c · 673c6100

Committed by Theodore Ts'o on Dec 15, 2010

Move the ext4_mount_options structure definition from ext4.h, since it
is only used in super.c.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

673c6100

ext4: Simplify the usage of clear_opt() and set_opt() macros · fd8c37ec

Committed by Theodore Ts'o on Dec 15, 2010

Change clear_opt() and set_opt() to take a superblock pointer instead
of a pointer to EXT4_SB(sb)->s_mount_opt.  This makes it easier for us
to support a second mount option field.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fd8c37ec
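The point of fd8c37ec is easier to see with a small model: once the macros take the superblock, each one can choose which option word to touch. Everything below is a hypothetical sketch (the structures, `SB_INFO()`, the `MOUNT_*` and `MOUNT2_*` bit values, and the `*_opt2` variants are illustrative and not copied from ext4.h).

```c
/* Hypothetical per-filesystem info with two option words. */
struct sb_info {
	unsigned int s_mount_opt;
	unsigned int s_mount_opt2;
};

struct super_block {
	struct sb_info *s_fs_info;
};

#define SB_INFO(sb)	((sb)->s_fs_info)

/* Illustrative bit values only. */
#define MOUNT_DISCARD	0x1u
#define MOUNT2_EXAMPLE	0x1u

/* Macros take the superblock, so callers never name the field. */
#define set_opt(sb, opt)	(SB_INFO(sb)->s_mount_opt |= MOUNT_##opt)
#define clear_opt(sb, opt)	(SB_INFO(sb)->s_mount_opt &= ~MOUNT_##opt)
#define test_opt(sb, opt)	(SB_INFO(sb)->s_mount_opt & MOUNT_##opt)

/* A second field then only needs a parallel set of macros. */
#define set_opt2(sb, opt)	(SB_INFO(sb)->s_mount_opt2 |= MOUNT2_##opt)
#define test_opt2(sb, opt)	(SB_INFO(sb)->s_mount_opt2 & MOUNT2_##opt)
```

Had the macros kept taking a pointer to the option word itself, every call site would have needed to know which of the two words holds a given option.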

15 Dec, 2010 2 commits

ext4: fix typo which broke '..' detection in ext4_find_entry() · 6d5c3aa8

Committed by Aaro Koskinen on Dec 14, 2010

There should be a check for the NUL character instead of '0'.

Fortunately the only thing that cares about this is NFS serving, which
is why we didn't notice this in the merge window testing.
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d5c3aa8
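The difference between the NUL character and the digit '0' in 6d5c3aa8 is easy to demonstrate. The two helpers below are hypothetical and only loosely modeled on a "is this `.` or `..`" name check; they are not the code from ext4_find_entry().

```c
#include <stdbool.h>
#include <stddef.h>

/* True for "." and ".."; name must be NUL-terminated. */
static inline bool is_dot_or_dotdot(const char *name, size_t namelen)
{
	return namelen <= 2 && name[0] == '.' &&
	       (name[1] == '.' || name[1] == '\0');
}

/* The typo: '0' is the digit zero (0x30), not the NUL terminator, so "."
 * is no longer recognized and ".0" is matched by mistake. */
static inline bool is_dot_or_dotdot_typo(const char *name, size_t namelen)
{
	return namelen <= 2 && name[0] == '.' &&
	       (name[1] == '.' || name[1] == '0');
}
```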

ext4: Turn off multiple page-io submission by default · 1449032b

Committed by Theodore Ts'o on Dec 14, 2010

Jon Nelson has found a test case which causes postgresql to fail with
the error:

psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581

Under memory pressure, it looks like part of a file can end up getting
replaced by zeros.  Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page().  The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.

To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory such that the system is just on the verge of triggering writeback
before running the following sql script:

begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;

If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.

This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1449032b

20 Nov, 2010 2 commits

ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard · e681c047

Committed by Lukas Czerner on Nov 19, 2010

A filesystem-independent ioctl was rejected as not common enough to be in
the core vfs ioctl code. Since we still need access to this functionality,
this commit adds the ext4-specific ioctl EXT4_IOC_TRIM to dispatch
ext4_trim_fs().

It takes an fstrim_range structure as an argument. fstrim_range is defined in
include/linux/fs.h and its definition is as follows.

struct fstrim_range {
	__u64 start;
	__u64 len;
	__u64 minlen;
};

start	- first byte to trim
len	- number of bytes to trim from start
minlen	- minimum extent length to trim; free extents shorter than this
	  number of bytes will be ignored. This will be rounded up to the fs
	  block size.

After the FITRIM is done, the number of actually discarded bytes is stored
in fstrim_range.len to give the user better insight into how much storage
space has really been released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e681c047
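From userspace, a trim request built on the fstrim_range structure described in e681c047 might look like the sketch below. It assumes a Linux system whose `<linux/fs.h>` provides `struct fstrim_range` and the generic `FITRIM` request, and it uses `FITRIM` rather than the EXT4_IOC_TRIM name from the commit title. The helper `trim_all()` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* struct fstrim_range, FITRIM */

/* Ask the filesystem mounted at mountpoint to discard all free extents of
 * at least minlen bytes. On success returns 0 and stores the number of
 * bytes reported as discarded; on failure returns -1 with errno set. */
static inline int trim_all(const char *mountpoint, uint64_t minlen,
			   uint64_t *discarded)
{
	struct fstrim_range range = {
		.start  = 0,
		.len    = UINT64_MAX,	/* cover the whole filesystem */
		.minlen = minlen,
	};
	int fd = open(mountpoint, O_RDONLY);
	int ret, saved_errno;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, FITRIM, &range);
	saved_errno = errno;
	if (ret == 0)
		*discarded = range.len;	/* updated by the kernel */
	close(fd);
	errno = saved_errno;
	return ret == 0 ? 0 : -1;
}
```

Issuing the request usually requires elevated privileges and a device that supports discard, so callers should expect it to fail on many systems.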

fs: Do not dispatch FITRIM through separate super_operation · 93bb41f4

Committed by Lukas Czerner on Nov 19, 2010

There was concern that the FITRIM ioctl is not common enough to be included
in the core vfs ioctl code. As Christoph Hellwig pointed out, there's no real
point in dispatching this out to a separate vector instead of just through
->ioctl.

So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
from super_operation structure.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

93bb41f4

19 Nov, 2010 1 commit

ext4: ext4_fill_super shouldn't return 0 on corruption · 5a9ae68a

Committed by Darrick J. Wong on Nov 19, 2010

At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path
out of that function returns ret.  However, the generic_check_addressable
clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g.
a group checksum error) returns 0 even though the mount should fail.  This
causes vfs_kern_mount in turn to think that the mount succeeded, leading to an
oops.

A simple fix is to avoid using ret for the generic_check_addressable check,
which was last changed in commit 30ca22c7.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a9ae68a
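The failure mode described in 5a9ae68a comes from reusing one return variable for an intermediate check. The two functions below are a hypothetical userspace model (the boolean parameters stand in for the addressability check and a later checksum check, and `-EFBIG` is only an example error); they are not the code of ext4_fill_super().

```c
#include <errno.h>
#include <stdbool.h>

/* Problematic shape: the intermediate check overwrites the default error,
 * so a later failure path returns 0. */
static inline int fill_super_buggy(bool addressable, bool checksum_ok)
{
	int ret = -EINVAL;

	ret = addressable ? 0 : -EFBIG;
	if (ret)
		return ret;
	if (!checksum_ok)
		return ret;	/* returns 0: caller thinks the mount worked */
	return 0;
}

/* Fixed shape: a separate variable keeps the default error intact. */
static inline int fill_super_fixed(bool addressable, bool checksum_ok)
{
	int ret = -EINVAL;
	int err = addressable ? 0 : -EFBIG;

	if (err)
		return err;
	if (!checksum_ok)
		return ret;	/* still -EINVAL */
	return 0;
}
```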

18 Nov, 2010 2 commits

ext4: missing unlock in ext4_clear_request_list() · f4c8cc65

Committed by Dan Carpenter on Nov 17, 2010

If the li_request_list was empty then it returned with the lock
held.  Instead of adding a "goto unlock" I just removed that special
case and let it go past the empty list_for_each_safe().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f4c8cc65

ext4: fix setting random pages PageUptodate · 08da1193

Committed by Markus Trippelsdorf on Nov 17, 2010

ext4_end_bio calls put_page and kmem_cache_free before calling
SetPageUptodate(). This can result in setting the PageUptodate bit on
random pages and causes the following BUG:

 BUG: Bad page state in process rm  pfn:52e54
 page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
 arch kernel: page flags: 0x4000000000000008(uptodate)

Fix the problem by moving put_io_page() after the SetPageUptodate() call.

Thanks to Hugh Dickins for analyzing this problem.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

08da1193

09 Nov, 2010 5 commits

ext4: Add new ext4 inode tracepoints · 7ff9c073

Committed by Theodore Ts'o on Nov 08, 2010

Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and
ext4_begin_ordered_truncate()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

7ff9c073

ext4: Don't call sb_issue_discard() in ext4_free_blocks() · b56ff9d3

Committed by Theodore Ts'o on Nov 08, 2010

Commit 5c521830 (ext4: Support discard requests when running in
no-journal mode) attempts to add sb_issue_discard() for data blocks
(in data=writeback mode) and in no-journal mode.  Unfortunately, this
no longer works, because in commit dd3932ed (block: remove
BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous
interface, and there are times when we call ext4_free_blocks() when we
are holding a spinlock, or are otherwise in an atomic context.

For now, I've removed the call to sb_issue_discard() to prevent a
deadlock or (if spinlock debugging is enabled) failures like this:

BUG: scheduling while atomic: rc.sysinit/1376/0x00000002
Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1
Call Trace:
[<ffffffff810397ce>] __schedule_bug+0x5e/0x70
[<ffffffff81403110>] schedule+0x950/0xa70
[<ffffffff81060bad>] ? insert_work+0x7d/0x90
[<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30
[<ffffffff81061127>] ? queue_work+0x37/0x60
[<ffffffff8140377d>] schedule_timeout+0x21d/0x360
[<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540
[<ffffffff81402680>] wait_for_common+0xc0/0x150
[<ffffffff81041490>] ? default_wake_function+0x0/0x10
[<ffffffff812034bc>] ? submit_bio+0x7c/0x100
[<ffffffff810680a0>] ? wake_bit_function+0x0/0x40
[<ffffffff814027b8>] wait_for_completion+0x18/0x20
[<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210
[<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60
[<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120
[<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70
[<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30
[<ffffffff81191618>] ext4_truncate+0x178/0x5d0
[<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280
[<ffffffff810d8976>] vmtruncate+0x56/0x70
[<ffffffff811925cb>] ext4_setattr+0x14b/0x460
[<ffffffff811319e4>] notify_change+0x194/0x380
[<ffffffff81117f80>] do_truncate+0x60/0x90
[<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20
[<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20
[<ffffffff81127539>] do_last+0x5d9/0x770
[<ffffffff811278bd>] do_filp_open+0x1ed/0x680
[<ffffffff8140644f>] ? page_fault+0x1f/0x30
[<ffffffff81132bfc>] ? alloc_fd+0xec/0x140
[<ffffffff81118db1>] do_sys_open+0x61/0x120
[<ffffffff81118e8b>] sys_open+0x1b/0x20
[<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b

https://bugzilla.kernel.org/show_bug.cgi?id=22302
Reported-by: Mathias Burén <mathias.buren@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: jiayingz@google.com

b56ff9d3

ext4: do not try to grab the s_umount semaphore in ext4_quota_off · 87009d86

Committed by Dmitry Monakhov on Nov 08, 2010

Taking the semaphore is not needed to sync the filesystem, and not taking
it fixes a lockdep complaint.
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

87009d86

ext4: fix potential race when freeing ext4_io_page structures · 83668e71

Committed by Theodore Ts'o on Nov 08, 2010

Use an atomic_t and make sure we don't free the structure while we
might still be submitting I/O for that page.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

83668e71
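The general pattern behind 83668e71, an atomic reference count that keeps a structure alive while I/O may still be in flight, can be sketched in userspace with C11 atomics. The structure and helper names below (`struct io_page`, `io_page_new()`, `io_page_get()`, `io_page_put()`) are hypothetical and do not mirror the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-page I/O tracking structure. */
struct io_page {
	atomic_int count;
};

static inline struct io_page *io_page_new(void)
{
	struct io_page *p = malloc(sizeof(*p));

	if (p)
		atomic_init(&p->count, 1);	/* the submitter's reference */
	return p;
}

/* Take an extra reference, e.g. for each I/O still in flight. */
static inline void io_page_get(struct io_page *p)
{
	atomic_fetch_add(&p->count, 1);
}

/* Drop one reference; free and return true when it was the last one. */
static inline bool io_page_put(struct io_page *p)
{
	if (atomic_fetch_sub(&p->count, 1) == 1) {
		free(p);
		return true;
	}
	return false;
}
```

As long as the submitter holds its own reference until it has finished issuing I/O, a completion that runs early cannot free the structure underneath it.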

ext4: handle writeback of inodes which are being freed · f7ad6d2e

Committed by Theodore Ts'o on Nov 08, 2010

The following BUG can occur when an inode is getting freed while
it still has dirty pages outstanding, and it gets deleted (in this
case because it was the target of a rename).  In ordered mode, we need to
make sure the data pages are written just in case we crash before the
rename (or unlink) is committed.  If the inode is being freed then
when we try to igrab the inode, we end up tripping the BUG_ON at
fs/ext4/page-io.c:146.

To solve this problem, we need to keep track of the number of io
callbacks which are pending, and avoid destroying the inode until they
have all been completed.  That way we don't have to bump the inode
count to keep the inode from being destroyed; an approach which
doesn't work because the count could have already been dropped down to
zero before the inode writeback has started (at which point we're not
allowed to bump the count back up to 1, since it's already started
getting freed).

Thanks to Dave Chinner for suggesting this approach, which is also
used by XFS.

  kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
  Call Trace:
   [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
   [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
   [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
   [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
   [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
   [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
   [<ffffffff81087910>] do_writepages+0x1c/0x25
   [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
   [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
   [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
   [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
   [<ffffffff810c14a3>] evict+0x22/0x92
   [<ffffffff810c1a3d>] iput+0x212/0x249
   [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
   [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
   [<ffffffff810be613>] dput+0x13a/0x147
   [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
   [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
   [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
   [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
   [<ffffffff810b99c6>] sys_rename+0x16/0x18
   [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Nick Bowler <nbowler@elliptictech.com>

f7ad6d2e

04 Nov, 2010 1 commit

ext4: initialize the percpu counters before replaying the journal · ce7e010a

Committed by Theodore Ts'o on Nov 03, 2010

We now initialize the percpu counters before replaying the journal,
but after the replay, we recalculate the global counters, to deal
with the possibility of the per-blockgroup counts getting updated by
the journal replay.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ce7e010a

openanolis / cloud-kernel synced successfully about 1 year ago