1. 03 Aug, 2008 1 commit
  2. 27 Jul, 2008 1 commit
    • ext4: don't read inode block if the buffer has a write error · 9c83a923
      Committed by Hidehiro Kawai
      A transient I/O error can corrupt inode data.  Here is the scenario:
      
      (1) update inode_A at the block_B
      (2) pdflush writes out new inode_A to the filesystem, but it results
          in write I/O error, at this point, BH_Uptodate flag of the buffer
          for block_B is cleared and BH_Write_EIO is set
      (3) a new inode_C located at block_B is created, and
          __ext4_get_inode_loc() tries to read on-disk block_B because the
          buffer is not uptodate
      (4) if on-disk block_B is read successfully, inode_A is
          overwritten by the stale on-disk data
      
      This patch makes __ext4_get_inode_loc() skip reading the inode block if
      the buffer has the BH_Write_EIO flag set.  In this case the buffer still
      holds the latest information, so we simply set the uptodate flag on the
      buffer (this also avoids the WARN_ON_ONCE() in mark_buffer_dirty()).
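
      A minimal sketch of the check this adds in __ext4_get_inode_loc()
      (kernel-internal code shown only as an illustration; the surrounding
      logic in the committed patch may differ in detail):

      if (!buffer_uptodate(bh)) {
      	lock_buffer(bh);
      	/*
      	 * If a previous write to this block failed, the in-memory copy
      	 * is newer than what is on disk; mark the buffer uptodate instead
      	 * of issuing a read that would overwrite the latest inode data
      	 * with stale on-disk contents.
      	 */
      	if (buffer_write_io_error(bh))
      		set_buffer_uptodate(bh);
      	if (buffer_uptodate(bh)) {
      		unlock_buffer(bh);	/* nothing to read */
      	} else {
      		/* ... normal path: submit_bh(READ, bh) and wait ... */
      	}
      }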
      
      As a result of this change, error checking needs to test the
      BH_Write_EIO flag.  Currently nobody checks write I/O errors on metadata
      buffers, but that will be done in other patches I'm working on.
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: sugita <yumiko.sugita.yf@hitachi.com>
      Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      9c83a923
  3. 29 Jul, 2008 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Committed by Hisashi Hifumi
      When we read some part of a file through the pagecache, if there is a
      pagecache page for the corresponding index but it is not uptodate, a
      read I/O is issued and the page becomes uptodate.
      
      I think this is fine for the pagesize == blocksize case, but there is
      room for improvement when pagesize != blocksize, because in that case a
      page can have multiple buffers, and even if the page is not uptodate,
      some of its buffers can be.
      
      So I suggest that when all of the buffers which correspond to the part
      of the file we want to read are uptodate, we use the pagecache and copy
      the data from it to the user buffer even though the page itself is not
      uptodate.  This can reduce read I/O and improve system throughput.
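
      A minimal sketch of the idea (the helper name and signature are
      illustrative, not necessarily what the patch adds): given the page's
      buffers and the byte range a reader wants, report whether every buffer
      covering that range is already uptodate, so the copy can be served from
      the page cache without issuing a read.

      static int buffers_cover_range_uptodate(struct page *page,
      					unsigned from, unsigned count)
      {
      	struct buffer_head *bh, *head;
      	unsigned block_start = 0, to = from + count;
      	int ret = 1;

      	bh = head = page_buffers(page);
      	do {
      		unsigned block_end = block_start + bh->b_size;

      		/* Only buffers overlapping [from, to) matter. */
      		if (block_end > from && block_start < to &&
      		    !buffer_uptodate(bh))
      			ret = 0;

      		block_start = block_end;
      		bh = bh->b_this_page;
      	} while (ret && bh != head);

      	return ret;
      }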
      
      I wrote a benchmark program and obtained the following results with it.
      
      The benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close the file and umount.
      
        4: mount again and open the test file.
      
        5: pwrite randomly 300000 times on the test file.  The offset is
           aligned to the I/O size (1024 bytes).
      
        6: measure the time of preading randomly 100000 times on the test file.
      
      The result was:
      
      	2.6.26          330 sec
      	2.6.26-patched  226 sec
      
      Arch: i386
      Filesystem: ext3
      Blocksize: 1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffers/blocks.  So mixed random
      read/write workloads, or random reads after random writes, are optimized
      by this patch when pagesize != blocksize.  This test result shows that.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP 1024*512 /* 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", t2-t1);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	return 0;
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ab22b9a
  4. 12 Jul, 2008 5 commits
    • ext4: delayed allocation i_blocks fix for stat · 3e3398a0
      Committed by Mingming Cao
      Right now i_blocks does not get updated until the blocks are actually
      allocated on disk.  This means that with delayed allocation, right after
      files are copied, "ls -sF" shows the files as taking 0 blocks on disk,
      and "du" also shows the files taking zero space, which is highly
      confusing to the user.
      
      Since delayed allocation already keeps track of the per-inode total
      number of blocks that are subject to delayed allocation, this patch
      fixes this by using that count to adjust the value returned by stat(2).
      When the real block allocation is done, i_blocks is updated; since the
      blocks reserved for delayed allocation are decreased at the same time,
      this keeps the value returned by stat(2) consistent.
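
      A rough sketch of the stat(2) adjustment described above (the field and
      lock names are assumptions for illustration; the committed code may
      differ in detail):

      int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
      		 struct kstat *stat)
      {
      	struct inode *inode = dentry->d_inode;
      	u64 delalloc_blocks;

      	generic_fillattr(inode, stat);

      	/* Blocks reserved for delayed allocation, tracked per inode. */
      	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
      	delalloc_blocks = EXT4_I(inode)->i_reserved_data_blocks;
      	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

      	/* stat->blocks is in 512-byte units; reserved blocks are fs blocks. */
      	stat->blocks += delalloc_blocks << (inode->i_blkbits - 9);
      	return 0;
      }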
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      3e3398a0
    • ext4: fix delalloc i_disksize early update issue · 632eaeab
      Committed by Mingming Cao
      Ext4_da_write_end() used walk_page_buffers() with a callback function of
      ext4_bh_unmapped_or_delay() to check if it extended the file size
      without allocating any blocks (since in this case i_disksize needs to be
      updated).  However, this didn't work properly because the buffer head
      has not been marked dirty yet --- this is done later in
      block_commit_write() --- which caused ext4_bh_unmapped_or_delay() to
      always return false.
      
      In addition, walk_page_buffers() checks all of the buffer heads covering
      the page, and the only buffer_head that should be checked is the one
      covering the end of the write.  Otherwise, given a 1k blocksize
      filesystem and a 4k page size, the buffer head covering the first 1k
      stripe of the file could be unmapped (because it was a sparse file), and
      the second or third buffer_head covering that page could be mapped, and
      using walk_page_buffers() would fail in this case since it would stop at
      the first unmapped buffer_head and return true.
      
      The core problem is that walk_page_buffers() was intended to do work in
      a callback function, and a non-zero return value indicated a failure,
      which terminated the walk of the buffer heads covering the page.  It was
      not intended to be used with a boolean function, such as
      ext4_bh_unmapped_or_delay().
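
      A simplified sketch of checking just the buffer_head that covers the end
      of the write (the name and exact structure are illustrative, not the
      committed helper): if that buffer is unmapped or delayed, the write did
      not allocate a block there and the caller must push i_disksize forward.

      static int da_should_update_i_disksize(struct page *page,
      				       unsigned long offset_in_page)
      {
      	struct buffer_head *bh, *head = page_buffers(page);
      	unsigned int blocksize = head->b_size;
      	unsigned long block_start = 0;

      	/* Walk to the single buffer covering the last byte written. */
      	bh = head;
      	while (block_start + blocksize <= offset_in_page) {
      		block_start += blocksize;
      		bh = bh->b_this_page;
      	}

      	/* Unmapped or delayed: no block was allocated by this write. */
      	return !buffer_mapped(bh) || buffer_delay(bh);
      }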
      
      An additional fix from Aneesh protects the i_disksize update against a race with truncate.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      632eaeab
    • ext4: Handle page without buffers in ext4_*_writepage() · f0e6c985
      Committed by Aneesh Kumar K.V
      It can happen that buffers are removed from the page before it gets
      marked dirty and then is passed to writepage().  In writepage() we just
      initialize the buffers and check whether they are mapped and not delayed.
      If they are mapped and not delayed we write the page; otherwise we mark
      them dirty.  With this change we don't do block allocation at all in
      ext4_*_writepage().
      
      writepage() can get called under many conditions, and with a locking
      order of journal_start -> lock_page, we should not try to allocate
      blocks in writepage(), which gets called after taking the page lock.
      writepage() can get called via shrink_page_list even with a journal
      handle which was created for doing an inode update.  For example, when
      doing ext4_da_write_begin we create a journal handle with one credit,
      expecting an i_disksize update for the inode.  But ext4_da_write_begin
      can cause shrink_page_list via _grab_page_cache.  So having a valid
      handle via ext4_journal_current_handle is not a guarantee that we can
      use the handle for block allocation in writepage(), since we shouldn't
      be using credits that had been reserved for other updates; that could
      result in running out of credits when we later update the inode.
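
      A rough sketch of the policy described above (an illustrative helper,
      not the literal committed code): writepage() only proceeds if no buffer
      on the page still needs block allocation; otherwise the page is
      redirtied and left for a path that can start a properly sized
      transaction.

      static int page_needs_allocation(struct page *page)
      {
      	struct buffer_head *bh, *head;

      	if (!page_has_buffers(page))
      		return 1;	/* no buffers yet: treat as needing work */

      	bh = head = page_buffers(page);
      	do {
      		/* An unmapped or delayed buffer still needs a real block. */
      		if (!buffer_mapped(bh) || buffer_delay(bh))
      			return 1;
      		bh = bh->b_this_page;
      	} while (bh != head);

      	return 0;
      }

      A caller would then redirty the page (redirty_page_for_writepage()),
      unlock it, and return instead of writing it when the helper says
      allocation is still needed.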
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f0e6c985
    • ext4: Add ordered mode support for delalloc · cd1aac32
      Committed by Aneesh Kumar K.V
      This provides a new ordered mode implementation which gets rid of using
      buffer heads to enforce the ordering between a metadata change and the
      related data change.  Instead, in the new ordering mode, it keeps track
      of all of the inodes touched by each transaction on a list, and when
      that transaction is committed, it flushes all of the dirty pages for
      those inodes.  In addition, the new ordered mode reverses the lock
      ordering of the page lock and transaction lock, which provides easier
      support for delayed allocation.
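
      A minimal sketch of the bookkeeping described above (structure and
      function names are illustrative, not the jbd2 interfaces actually
      added): each transaction keeps a list of the inodes it touched, and
      commit writes out their dirty pages before committing the metadata.

      struct txn_inode {
      	struct list_head list;
      	struct inode *inode;
      };

      /* At write time: remember that this transaction dirtied this inode. */
      static void txn_track_inode(struct list_head *txn_inodes,
      			    struct txn_inode *ti, struct inode *inode)
      {
      	ti->inode = inode;
      	list_add(&ti->list, txn_inodes);
      }

      /* At commit time: write out data for every tracked inode first. */
      static int txn_flush_data(struct list_head *txn_inodes)
      {
      	struct txn_inode *ti;
      	int err = 0;

      	list_for_each_entry(ti, txn_inodes, list) {
      		int ret = filemap_fdatawrite(ti->inode->i_mapping);
      		if (ret && !err)
      			err = ret;
      	}
      	return err;
      }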
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      cd1aac32
    • ext4: Invert lock ordering of page_lock and transaction start in delalloc · 61628a3f
      Committed by Mingming Cao
      With the reverse locking, we need to start a transaction before taking
      the page lock, so in ext4_da_writepages() we need to break the write-out
      into chunks, and restart the journal for each chunk to ensure the
      write-out fits in a single transaction.
      
      Updated patch from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      which fixes the delalloc sync hang with journal lock inversion, and addresses
      the performance regression issue.
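
      A rough sketch of the chunked loop (the chunk size, credit calculation,
      and get_block callback name are assumptions; the committed
      ext4_da_writepages() is more involved):

      static int da_writepages_chunked(struct address_space *mapping,
      				 struct writeback_control *wbc)
      {
      	struct inode *inode = mapping->host;
      	long to_write = wbc->nr_to_write;
      	int ret = 0;

      	while (to_write > 0 && !ret) {
      		/* Hypothetical cap on pages handled per transaction. */
      		long chunk = to_write < 1024 ? to_write : 1024;
      		long written;
      		handle_t *handle;

      		/* Start the handle *before* any page lock is taken. */
      		handle = ext4_journal_start(inode,
      				ext4_writepage_trans_blocks(inode));
      		if (IS_ERR(handle))
      			return PTR_ERR(handle);

      		wbc->nr_to_write = chunk;
      		ret = mpage_writepages(mapping, wbc, ext4_da_get_block_write);
      		ext4_journal_stop(handle);

      		written = chunk - wbc->nr_to_write;
      		to_write -= written;
      		if (!written)
      			break;		/* no progress; avoid spinning */
      	}
      	wbc->nr_to_write = to_write;
      	return ret;
      }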
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      61628a3f
  5. 15 Jul, 2008 1 commit
    • ext4: delayed allocation ENOSPC handling · d2a17637
      Committed by Mingming Cao
      This patch does block reservation for delayed allocation, to avoid
      ENOSPC later at page flush time.
      
      Blocks (data and metadata) are reserved at da_write_begin() time, the
      free-blocks counter is updated then, and the number of reserved blocks
      is stored in a per-inode counter.
      
      At writepage time, unused reserved metadata blocks are returned.  At
      unlink/truncate time, reserved blocks are properly released.
      
      An updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      fixes the old allocator's block reservation accounting with delalloc,
      adds a lock to guard the counters, and also fixes the reservation for
      metadata blocks.
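
      A rough sketch of the reservation accounting (the field names, the lock,
      and the helper are assumptions for illustration, not the exact patch):

      static int da_reserve_space(struct inode *inode, int nr_data, int nr_meta)
      {
      	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
      	struct ext4_inode_info *ei = EXT4_I(inode);
      	s64 total = nr_data + nr_meta;

      	/* Fail at write_begin() time rather than at page flush time. */
      	if (percpu_counter_read_positive(&sbi->s_freeblocks_counter) < total)
      		return -ENOSPC;

      	spin_lock(&ei->i_block_reservation_lock);
      	ei->i_reserved_data_blocks += nr_data;
      	ei->i_reserved_meta_blocks += nr_meta;
      	spin_unlock(&ei->i_block_reservation_lock);

      	/* Hide the reserved blocks from the free-blocks counter now. */
      	percpu_counter_sub(&sbi->s_freeblocks_counter, total);
      	return 0;
      }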
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d2a17637
  6. 12 Jul, 2008 8 commits
  7. 30 Apr, 2008 2 commits
  8. 22 Apr, 2008 1 commit
  9. 17 Apr, 2008 2 commits
  10. 29 Apr, 2008 1 commit
    • ext4: Fix race between migration and mmap write · 267e4db9
      Committed by Aneesh Kumar K.V
      Fail migrate if we allocated new blocks via mmap write.
      
      If we write to holes in the file via mmap, we end up allocating
      new blocks. This block allocation happens without taking inode->i_mutex.
      Since migrate is protected by i_mutex and migrate expects that no
      new blocks get allocated during migrate, fail migrate if new blocks
      get allocated.
      
      We can't take inode->i_mutex in the mmap write path because that
      would result in a locking order violation between i_mutex and mmap_sem.
      Also, adding a separate rw_semaphore for protection is really high overhead
      for a rare operation such as migrate.
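
      A very rough sketch of the kind of check involved (the flag names are
      purely illustrative; the committed patch uses ext4's own inode state
      flags): the mmap allocation path records that new blocks appeared while
      a migration was in flight, and migrate bails out when it sees that.

      #define MIGRATION_IN_PROGRESS		0x1	/* illustrative flags */
      #define BLOCKS_ALLOCATED_UNLOCKED	0x2

      /* Called from the block allocation path reached via mmap writes. */
      static void note_unlocked_allocation(unsigned long *state)
      {
      	if (*state & MIGRATION_IN_PROGRESS)
      		*state |= BLOCKS_ALLOCATED_UNLOCKED;
      }

      /* Called by migrate (under i_mutex) before committing the result. */
      static int migration_must_fail(unsigned long state)
      {
      	return state & BLOCKS_ALLOCATED_UNLOCKED;	/* then bail out */
      }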
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      267e4db9
  11. 26 Feb, 2008 1 commit
    • ext4: Fix BUG when writing to an uninitialized extent · f5ab0d1f
      Committed by Mingming Cao
      This patch fixes a bug when writing to preallocated but uninitialized
      blocks, which resulted in a BUG in fs/buffer.c saying that the buffer
      is not mapped.
      
      When writing to a file, ext4_get_blocks_wrap() is called with create=1 in
      order to request that blocks be allocated if necessary.  It currently
      calls ext4_get_blocks() with create=0 in order to do a lookup first.  If
      the inode contains an uninitialized data block, the buffer head is left
      unmapped, which ext4_get_blocks_wrap() returns, causing the BUG.
      
      We fix this by checking to see if the buffer head is unmapped, and if
      so, we make sure the buffer head is mapped by calling
      ext4_ext_get_blocks() with create=1.
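
      A simplified sketch of the fix (the two helpers below are hypothetical
      stand-ins for the real lookup and allocation calls):

      static int get_block_for_write(handle_t *handle, struct inode *inode,
      			       sector_t block, struct buffer_head *bh)
      {
      	int ret;

      	/* First do a read-only lookup (create=0). */
      	ret = lookup_block_mapping(inode, block, bh);	/* hypothetical */
      	if (ret > 0 && buffer_mapped(bh))
      		return ret;			/* already mapped, done */

      	/*
      	 * The bh came back unmapped: either a hole or a preallocated,
      	 * uninitialized extent.  Retry with create=1 so the extent is
      	 * initialized and the buffer head mapped before fs/buffer.c
      	 * writes through it.
      	 */
      	return allocate_block_mapping(handle, inode, block, bh);  /* hypothetical */
      }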
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f5ab0d1f
  12. 16 Feb, 2008 1 commit
    • ext4: modify block allocation algorithm for the last group · 74d3487f
      Committed by Valerie Clement
      When a directory inode is allocated in the last group and the last group
      contains fewer than s_blocks_per_group blocks, the initial block
      allocated for the directory is not always allocated in the same group as
      the directory inode, but in one of the first groups of the filesystem
      (group 1, for example).
      Depending on the current process's pid, ext4_find_near() and
      ext4_ext_find_goal() can return a block number greater than the maximum
      block count of the filesystem, and in that case the block will not be
      allocated in the same group as the inode.
      
      The following patch fixes the problem.
      
      Should the modification also be done in ext2/3 code?
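
      As a rough illustration of the bound involved (not the committed patch):
      whatever goal block the heuristics compute, it must stay below the
      filesystem's real block count, otherwise the allocator falls back to an
      unrelated group.

      static ext4_fsblk_t clamp_goal_block(struct super_block *sb,
      				     ext4_fsblk_t goal)
      {
      	ext4_fsblk_t max = ext4_blocks_count(EXT4_SB(sb)->s_es) - 1;

      	return goal > max ? max : goal;
      }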
      Signed-off-by: Valerie Clement <valerie.clement@bull.net>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      74d3487f
  13. 26 Feb, 2008 1 commit
  14. 10 Feb, 2008 1 commit
    • ext4: Fix Direct I/O locking · 7fb5409d
      Committed by Jan Kara
      We cannot start a transaction in ext4_direct_IO() and just let it last
      during the whole write because dio_get_page() acquires mmap_sem which
      ranks above transaction start (e.g. because we have dependency chain
      mmap_sem->PageLock->journal_start, or because we update atime while
      holding mmap_sem) and thus deadlocks could happen. We solve the problem
      by starting a transaction separately for each ext4_get_block() call.
      
      We *could* have a problem where we allocate a block and the machine
      crashes before its data is written out, thus exposing stale data.  But
      that does not happen, because for hole filling the generic code falls
      back to buffered writes, and for file extension we add the inode to the
      orphan list, so in case of a crash, journal replay will truncate the
      inode back to its original size.
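
      A rough sketch of the per-call transaction (the credit count and wrapper
      name are illustrative; the committed code threads the handle through the
      existing get_block path differently):

      static int dio_get_block_journalled(struct inode *inode, sector_t iblock,
      				    struct buffer_head *bh_result, int create)
      {
      	handle_t *handle;
      	int ret;

      	if (!create)
      		return ext4_get_block(inode, iblock, bh_result, 0);

      	/* One short-lived transaction per block allocation. */
      	handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
      	if (IS_ERR(handle))
      		return PTR_ERR(handle);

      	ret = ext4_get_block(inode, iblock, bh_result, 1);
      	ext4_journal_stop(handle);
      	return ret;
      }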
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      7fb5409d
  15. 08 Feb, 2008 1 commit
  16. 07 Feb, 2008 1 commit
  17. 06 Feb, 2008 2 commits
    • allow in-inode EAs on ext4 root inode · 0040d987
      Committed by Eric Sandeen
      The ext3 root inode was treated specially with respect
      to in-inode extended attributes, for reasons detailed
      in the removed comment below.  The first mkfs-created
      inodes would not get extra_i_size or the EXT3_STATE_XATTR
      flag set in ext3_read_inode, which disallowed reading or
      setting in-inode EAs on the root.
      
      However, in ext4, ext4_mark_inode_dirty calls
      ext4_expand_extra_isize for all inodes; once this is done
      EAs may be placed in the root ext4 inode body.
      
      But for the reasons above, it won't be found after a reboot.
      
      testcase:
      
      setfattr -n user.name -v value mntpt/
      setfattr -n user.name2 -v value2 mntpt/
      umount mntpt/; remount mntpt/
      getfattr -d mntpt/
      
      name2/value2 has gone missing; debugfs shows it in the
      inode body, but it is not found there by getattr.
      
      The following fixes it up; newer mkfs appears to properly
      zero the inodes, so this workaround isn't needed for ext4.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0040d987
    • Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Committed by Christoph Lameter
      Simplify page cache zeroing of segments of pages through three functions:
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page.  It takes the positions where the
              zeroing starts and ends, which avoids length calculations and
              makes the code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means a lot of code no longer has to deal with the
      special casing for HIGHMEM.  Dealing with kmap is only necessary for
      HIGHMEM configurations.  In those configurations we use KM_USER0 as we
      do for a series of other functions defined in highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page function
      could not be a macro.  The zero_user_* functions introduced here can be
      inline because that constant is not used when these functions are
      called.
      
      Also extract the flushing of the caches to be outside of the kmap.
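
      A sketch of the three helpers as they might look with the
      kmap_atomic()/KM_USER0 interface of that era (the in-tree versions, with
      their sanity checks, live in include/linux/highmem.h):

      static inline void zero_user_segments(struct page *page,
      				      unsigned start1, unsigned end1,
      				      unsigned start2, unsigned end2)
      {
      	void *kaddr = kmap_atomic(page, KM_USER0);

      	if (end1 > start1)
      		memset(kaddr + start1, 0, end1 - start1);
      	if (end2 > start2)
      		memset(kaddr + start2, 0, end2 - start2);
      	kunmap_atomic(kaddr, KM_USER0);
      	/* Flushing is done outside the kmap, as the description notes. */
      	flush_dcache_page(page);
      }

      static inline void zero_user_segment(struct page *page,
      				     unsigned start, unsigned end)
      {
      	zero_user_segments(page, start, end, 0, 0);
      }

      static inline void zero_user(struct page *page, unsigned start,
      			     unsigned size)
      {
      	zero_user_segment(page, start, start + size);
      }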
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
  18. 29 Jan, 2008 9 commits