提交 · 3cf430b0636045dc524759a0852293ba037732a7 · openeuler / raspberrypi-kernel

03 8月, 2008 1 次提交

ext4: remove write-only variables from ext4_ordered_write_end · 7d55992d

由 Eric Sandeen 提交于 8月 02, 2008

The variables 'from' and 'to' are not used anywhere.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Acked-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7d55992d

01 8月, 2008 1 次提交

[PATCH] fix races and leaks in vfs_quota_on() users · 77e69dac

由 Al Viro 提交于 8月 01, 2008

* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
  pathname resolution.
* callers of vfs_quota_on() that do their own pathname resolution and
  checks based on it are switched to vfs_quota_on_path(); that way we
  avoid the races.
* reiserfs leaked dentry/vfsmount references on several failure exits.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

77e69dac

29 7月, 2008 1 次提交

vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a

由 Hisashi Hifumi 提交于 7月 28, 2008

When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment.  Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close a file and umount.

  4: mount and again open a test file.

  5: pwrite randomly 300000 times on a test file.  offset is aligned
     by IO size(1024bytes).

  6: measure time of preading randomly 100000 times on a test file.

The result was:
	2.6.26
        330 sec

	2.6.26-patched
        226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block.  So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment.  This test result
showed this.

The benchmark program is as follows:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	memset(buf, 0, LEN);
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}

	filesize = LEN * LOOP;
	for (i = 0; i < 300000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", t2-t1);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
}
Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ab22b9a

19 8月, 2008 1 次提交

ext4: Fix small file fragmentation · 5e745b04

由 Aneesh Kumar K.V 提交于 8月 18, 2008

For small file block allocations, mballoc uses per cpu prealloc
space.  Use goal block when searching for the right prealloc
space.  Also make sure ext4_da_writepages tries to write
all the pages for small files in single attempt
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5e745b04

20 8月, 2008 9 次提交

ext4: Initialize writeback_index to 0 when allocating a new inode · 91246c00

由 Aneesh Kumar K.V 提交于 8月 19, 2008

The write_cache_pages() function uses the mapping->writeback_index as
the starting index to write out when range_cyclic is set. Properly
initialize writeback_index so that we start the writeout at index 0.

This was found when debugging the small file fragmentation on ext4.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

91246c00

ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC · 16eb7295

由 Aneesh Kumar K.V 提交于 8月 19, 2008

Fix ext4_has_free_blocks() to return 0 when we don't have enough space.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

16eb7295

ext4: journal credit fix for the delayed allocation's writepages() function · 525f4ed8

由 Mingming Cao 提交于 8月 19, 2008

Previous delalloc writepages implementation started a new transaction
outside of a loop which called get_block() to do the block allocation.
Since we didn't know exactly how many blocks would need to be allocated,
the estimated journal credits required was very conservative and caused
many issues.

With the reworked delayed allocation, a new transaction is created for
each get_block(), thus we don't need to guess how many credits for the
multiple chunk of allocation.  We start every transaction with enough
credits for inserting a single exent.  When estimate the credits for
indirect blocks to allocate a chunk of blocks, we need to know the
number of data blocks to allocate.  We use the total number of reserved
delalloc datablocks; if that is too big, for non-extent files, we need
to limit the number of blocks to EXT4_MAX_TRANS_BLOCKS.

Code cleanup from Aneesh.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Reviewed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

525f4ed8

ext4: Rework the ext4_da_writepages() function · a1d6cc56

由 Aneesh Kumar K.V 提交于 8月 19, 2008

With the below changes we reserve credit needed to insert only one
extent resulting from a call to single get_block. This makes sure we
don't take too much journal credits during writeout. We also don't
limit the pages to write. That means we loop through the dirty pages
building largest possible contiguous block request. Then we issue a
single get_block request. We may get less block that we requested. If
so we would end up not mapping some of the buffer_heads. That means
those buffer_heads are still marked delay. Later in the writepage
callback via __mpage_writepage we redirty those pages.

We should also not limit/throttle wbc->nr_to_write in the filesystem
writepages callback. That cause wrong behaviour in
generic_sync_sb_inodes caused by wbc->nr_to_write being <= 0
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a1d6cc56

ext4: journal credits reservation fixes for DIO, fallocate · f3bd1f3f

由 Mingming Cao 提交于 8月 19, 2008

DIO and fallocate credit calculation is different than writepage, as
they do start a new journal right for each call to ext4_get_blocks_wrap().
This patch uses the helper function in DIO and fallocate case, passing
a flag indicating that the modified data are contigous thus could account
less indirect/index blocks.

This patch also fixed the journal credit reservation for direct I/O
(DIO).  Previously the estimated credits for DIO only was calculated for
non-extent files, which was not enough if the file is extent-based.

Also fixed was fallocate double-counting credits for modifying the the
superblock.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f3bd1f3f

ext4: journal credits reservation fixes for extent file writepage · ee12b630

由 Mingming Cao 提交于 8月 19, 2008

This patch modified the writepage/write_begin credit calculation for
extent files, to use the credits caculation helper function.

The current calculation of how many index/leaf blocks should be
accounted is too conservetive, it always considered the worse case,
where the tree level is 5, and in the case of multiple chunk
allocations, it always assumed no blocks were dirtied in common across
the allocations. This path uses the accurate depth of the inode with
some extras to calculate the index blocks, and also less conservative in
the case of multiple allocation accounting.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ee12b630

ext4: journal credits calulation cleanup and fix for non-extent writepage · a02908f1

由 Mingming Cao 提交于 8月 19, 2008

When considering how many journal credits are needed for modifying a
chunk of data, we need to account for the super block, inode block,
quota blocks and xattr block, indirect/index blocks, also, group bitmap
and group descriptor blocks for new allocation (including data and
indirect/index blocks). There are many places in ext4 do the calculation
on their own and often missed one or two meta blocks, and often they
assume single block allocation, and did not considering the multile
chunk of allocation case.

This patch is trying to cleanup current journal credit code, provides
some common helper funtion to calculate the journal credits, to be used
for writepage, writepages, DIO, fallocate, migration, defrag, and for
both nonextent and extent files.

This patch modified the writepage/write_begin credit caculation for
nonextent files, to use the new helper function. It also fixed the
problem that writepage on nonextent files did not consider the case
blocksize <pagesize, thus could possibelly need multiple block
allocation in a single transaction.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a02908f1

ext4: Fix bug where we return ENOSPC even though we have plenty of inodes · c001077f

由 Eric Sandeen 提交于 8月 19, 2008

The find_group_flex() function starts with best_flex as the
parent_fbg_group, which happens to have 0 inodes free.  Some of the
flex groups searched have free blocks and free inodes, but the
flex_freeb_ratio is < 10, so they're skipped.  Then when a group is
compared to the current "best" flex group, it does not have more free
blocks than "best", so it is skipped as well.

This continues until no flex group with free inodes is found which has
a proper ratio or which has more free blocks than the "best" group,
and we're left with a "best" group that has 0 inodes free, and we
return -ENOSPC.

We fix this by changing the logic so that if the current "best" flex
group has no inodes free, and the current one does have room, it is
promoted to the next "best."
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c001077f

ext4: don't try to resize if there are no reserved gdt blocks left · 37609fd5

由 Josef Bacik 提交于 8月 19, 2008

When trying to resize an ext4 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left.  This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

37609fd5

16 8月, 2008 1 次提交

ext4: Use ext4_discard_reservations instead of mballoc-specific call · 88aa3cff

由 Theodore Ts'o 提交于 8月 16, 2008

In ext4_ext_truncate(), we should use the more generic
ext4_discard_reservations() call so we do the right thing when the
filesystem is mounted with the nomballoc option.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NMingming Cao <cmm@us.ibm.com>

88aa3cff

20 8月, 2008 2 次提交

ext4: Fix ext4_dx_readdir hash collision handling · d0156417

由 Theodore Ts'o 提交于 8月 19, 2008

This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in an hash tree indexed directory.
Signed-off-by: NEugene Dashevsky <eugene@ibrix.com>
Signed-off-by: NMike Snitzer <msnitzer@ibrix.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d0156417

ext4: Fix delalloc release block reservation for truncate · cd213226

由 Mingming Cao 提交于 8月 19, 2008

Ext4 will release the reserved blocks for delayed allocations when
inode is truncated/unlinked.  If there is no reserved block at all, we
shouldn't need to do so.  But current code still tries to release the
reserved blocks regardless whether the counters's value is 0.
Continue to do that causes the later calculation to go wrong and a
kernel BUG_ON() caught that. This doesn't happen for extent-based
files, as the calculation for 0 reserved blocks was right for extent
based file.

This patch fixed the kernel BUG() due to above reason.  It adds checks
for 0 to avoid unnecessary release and fix calculation for non-extent
files.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

cd213226

14 8月, 2008 1 次提交

ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty · b4df2030

由 Theodore Ts'o 提交于 8月 13, 2008

We need to call ext4_discard_reservation() earlier in ext4_truncate(),
to avoid a BUG() in ext4_mb_return_to_preallocation(), which is called
(ultimately) by ext4_free_blocks().  So we must ditch the blocks on
i_prealloc_list before we start freeing the data blocks.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b4df2030

20 8月, 2008 1 次提交

ext4: Handle unwritten extent properly with delayed allocation · bf068ee2

由 Aneesh Kumar K.V 提交于 8月 19, 2008

When using fallocate the buffer_heads are marked unwritten and unmapped.
We need to map them in the writepages after a get_block.  Otherwise we
split the uninit extents, but never write the content to disk.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bf068ee2

27 7月, 2008 3 次提交

[PATCH] sanitize ->permission() prototype · e6305c43

由 Al Viro 提交于 7月 15, 2008

* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e6305c43

T
ext4: Cleanup whitespace and other miscellaneous style issues · 2b2d6d01
由 Theodore Ts'o 提交于 7月 26, 2008
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
2b2d6d01

SL*B: drop kmem cache argument from constructor · 51cc5068

由 Alexey Dobriyan 提交于 7月 25, 2008

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

51cc5068

25 7月, 2008 1 次提交

ext4: improve ext4_fill_flex_info() a bit · ec05e868

由 Li Zefan 提交于 7月 24, 2008

- use kzalloc() instead of kmalloc() + memset()
- improve a printk info
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ec05e868

18 7月, 2008 1 次提交

ext4: Cleanup the block reservation code path · 12219aea

由 Aneesh Kumar K.V 提交于 7月 17, 2008

The truncate patch should not use the i_allocated_meta_blocks
value. So add seperate functions to be used in the truncate
and alloc path. We also need to release the meta-data block
that we reserved for the blocks that we are truncating.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

12219aea

02 8月, 2008 1 次提交

ext4: don't assume extents can't cross block groups when truncating · 34071da7

由 Theodore Ts'o 提交于 8月 01, 2008

With the FLEX_BG layout, there is no reason why extents can't cross
block groups, so make the truncate code reserve enough credits so we
don't BUG if we come across such an extent.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

34071da7

03 8月, 2008 1 次提交

ext4: Fix lack of credits BUG() when deleting a badly fragmented inode · bc965ab3

由 Theodore Ts'o 提交于 8月 02, 2008

The extents codepath for ext4_truncate() requests journal transaction
credits in very small chunks, requesting only what is needed.  This
means there may not be enough credits left on the transaction handle
after ext4_truncate() returns and then when ext4_delete_inode() tries
finish up its work, it may not have enough transaction credits,
causing a BUG() oops in the jbd2 core.

Also, reserve an extra 2 blocks when starting an ext4_delete_inode()
since we need to update the inode bitmap, as well as update the
orphaned inode linked list.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bc965ab3

02 8月, 2008 1 次提交

ext4: Fix ext4_ext_journal_restart() · 0123c939

由 Theodore Ts'o 提交于 8月 01, 2008

The ext4_ext_journal_restart() is a convenience function which checks
to see if the requested number of credits is present, and if so it
closes the current transaction and attaches the current handle to the
new transaction.  Unfortunately, it wasn't proprely checking the
return value from ext4_journal_extend(), so it was starting a new
transaction when one was not necessary, and returning an error when
all that was necessary was to restart the handle with a new
transaction.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0123c939

03 8月, 2008 1 次提交

ext4: fix ext4_da_write_begin error path · d5a0d4f7

由 Eric Sandeen 提交于 8月 02, 2008

ext4_da_write_begin needs to call journal_stop before returning,
if the page allocation fails.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Acked-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d5a0d4f7

27 7月, 2008 1 次提交

ext4: don't read inode block if the buffer has a write error · 9c83a923

由 Hidehiro Kawai 提交于 7月 26, 2008

A transient I/O error can corrupt inode data.  Here is the scenario:

(1) update inode_A at the block_B
(2) pdflush writes out new inode_A to the filesystem, but it results
    in write I/O error, at this point, BH_Uptodate flag of the buffer
    for block_B is cleared and BH_Write_EIO is set
(3) create new inode_C which located at block_B, and
    __ext4_get_inode_loc() tries to read on-disk block_B because the
    buffer is not uptodate
(4) if it can read on-disk block_B successfully, inode_A is
    overwritten by old data

This patch makes __ext4_get_inode_loc() not read the inode block if the
buffer has BH_Write_EIO flag.  In this case, the buffer should have the
latest information, so setting the uptodate flag to the buffer (this
avoids WARN_ON_ONCE() in mark_buffer_dirty().)

According to this change, we would need to test BH_Write_EIO flag for the
error checking.  Currently nobody checks write I/O errors on metadata
buffers, but it will be done in other patches I'm working on.
Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

9c83a923

24 7月, 2008 3 次提交

ext4: Don't allow lg prealloc list to be grow large. · 6be2ded1

由 Aneesh Kumar K.V 提交于 7月 23, 2008

Currently, the locality group prealloc list is freed only when there
is a block allocation failure. This can result in large number of
entries in the preallocation list making ext4_mb_use_preallocated()
expensive.

To fix this, we convert the locality group prealloc list to a hash
list. The hash index is the order of number of blocks in the prealloc
space with a max order of 9. When adding prealloc space to the list we
make sure total entries for each order does not exceed 8. If it is
more than 8 we discard few entries and make sure the we have only <= 5
entries.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

6be2ded1

ext4: Convert the usage of NR_CPUS to nr_cpu_ids. · 1320cbcf

由 Aneesh Kumar K.V 提交于 7月 23, 2008

NR_CPUS can be really large. We should be using nr_cpu_ids instead.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1320cbcf

ext4: Improve error handling in mballoc · ce89f46c

由 Aneesh Kumar K.V 提交于 7月 23, 2008

Don't call BUG_ON on file system failures. Instead use ext4_error and
also handle the continue case properly.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ce89f46c

03 8月, 2008 2 次提交

ext4: lock block groups when initializing · b5f10eed

由 Eric Sandeen 提交于 8月 02, 2008

I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc.  I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

b5f10eed

ext4: sync up block and inode bitmap reading functions · e29d1cde

由 Eric Sandeen 提交于 8月 02, 2008

ext4_read_block_bitmap and read_inode_bitmap do essentially
the same thing, and yet they are structured quite differently.
I came across this difference while looking at doing bg locking
during bg initialization.

This patch:

* removes unnecessary casts in the error messages
* renames read_inode_bitmap to ext4_read_inode_bitmap
* and more substantially, restructures the inode bitmap
  reading function to be more like the block bitmap counterpart.

The change to the inode bitmap reader simplifies the locking
to be applied in the next patch.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

e29d1cde

27 7月, 2008 1 次提交

ext4: Allow read/only mounts with corrupted block group checksums · 8a266467

由 Theodore Ts'o 提交于 7月 26, 2008

If the block group checksums are corrupted, still allow the mount to
succeed, so e2fsck can have a chance to try to fix things up. Add
code in the remount r/w path to make sure the block group checksums
are valid before allowing the filesystem to be remounted read/write.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8a266467

03 8月, 2008 1 次提交

ext4: Fix data corruption when writing to prealloc area · d03856bd

由 Aneesh Kumar K.V 提交于 8月 02, 2008

Inserting an extent can cause a new entry in the already existing index
block. That doesn't increase the depth of the instead. Instead it adds a
new leaf block. Now with the new leaf block the path information
corresponding to the logical block should be fetched from the new block.
The old path will be pointing to the old leaf block.

We need to recalucate the path information on extent insert
even if depth doesn't change. Without this change, the extent merge
after converting an unwritten extent to initialized extent takes the wrong
extent and cause data corruption.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d03856bd

12 7月, 2008 5 次提交

ext4: do not set extents feature from the kernel · e4079a11

由 Eric Sandeen 提交于 7月 11, 2008

We've talked for a while about getting rid of any feature-
setting from the kernel; this gets rid of the code which would
set the INCOMPAT_EXTENTS flag on the first file write when mounted
as ext4[dev].

With this patch, if the extents feature is not already set on disk,
then mounting as ext4 will fall back to noextents with a warning,
and if -o extents is explicitly requested, the mount will fail,
also with warning.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e4079a11

ext4: Don't allow nonextenst mount option for large filesystem · c07651b5

由 Aneesh Kumar K.V 提交于 7月 11, 2008

The block mapped inode format can address only blocks within 2**32. This
causes a number of issues, the biggest of which is that the block
allocator needs to be taught that certain inodes can not utilize block
numbers > 2**32.  So until this is fixed, it is simplest to fail
mounting of file systems with more than 2**32 blocks if the -o noextents
option is given.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c07651b5

ext4: Enable delalloc by default. · dd919b98

由 Aneesh Kumar K.V 提交于 7月 11, 2008

Enable delalloc by default to ensure it gets sufficient testing and
because it makes the filesystem much more efficient.  Add a nodealalloc
option to disable delayed allocation, and update ext4_show_options to
show delayed allocation off if it is disabled.

If the data=journal mount option is used, disable delayed allocation
since the delalloc code doesn't support data=journal yet.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>

dd919b98

ext4: delayed allocation i_blocks fix for stat · 3e3398a0

由 Mingming Cao 提交于 7月 11, 2008

Right now i_blocks is not getting updated until the blocks are actually
allocaed on disk.  This means with delayed allocation, right after files
are copied, "ls -sF" shoes the file as taking 0 blocks on disk.  "du"
also shows the files taking zero space, which is highly confusing to the
user.

Since delayed allocation already keeps track of per-inode total
number of blocks that are subject to delayed allocation, this patch fix
this by using that to adjust the value returned by stat(2). When real
block allocation is done, the i_blocks will get updated. Since the
reserved blocks for delayed allocation will be decreased, this will be
keep value returned by stat(2) consistent.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

3e3398a0

ext4: fix delalloc i_disksize early update issue · 632eaeab

由 Mingming Cao 提交于 7月 11, 2008

Ext4_da_write_end() used walk_page_buffers() with a callback function of
ext4_bh_unmapped_or_delay() to check if it extended the file size
without allocating any blocks (since in this case i_disksize needs to be
updated). However, this is didn't work proprely because the buffer head
has not been marked dirty yet --- this is done later in
block_commit_write() --- which caused ext4_bh_unmapped_or_delay() to
always return false.

In addition, walk_page_buffers() checks all of the buffer heads covering
the page, and the only buffer_head that should be checked is the one
covering the end of the write. Otherwise, given a 1k blocksize
filesystem and a 4k page size, the buffer head covering the first 1k
stripe of the file could be unmapped (because it was a sparse file), and
the second or third buffer_head covering that page could be mapped, and
using walk_page_buffers() would fail in this case since it would stop at
the first unmapped buffer_head and return true.

The core problem is that walk_page_buffers() was intended to do work in
a callback function, and a non-zero return value indicated a failure,
which termined the walk of the buffer heads covering the page. It was
not intended to be used with a boolean function, such as
ext4_bh_unmapped_or_delay().

Add addtional fix from Aneesh to protect i_disksize update rave with truncate.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

632eaeab