Commits · 8aefcd557d26d0023a36f9ec5afbf55e59f8f26b · openeuler / raspberrypi-kernel

11 Jan, 2011 4 commits

ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary · 8aefcd55

Authored by Theodore Ts'o on Jan 10, 2011

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing.  This allows us to further slim down the ext4_inode_info
structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8aefcd55
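
As a rough illustration of the idea in commit 8aefcd55, the userspace C sketch below keeps only a pointer in the per-inode structure and allocates the journal inode the first time it is needed. The names `journal_inode`, `inode_info`, `inode_attach_jinode` and `inode_detach_jinode` are hypothetical stand-ins, not the kernel's definitions, and the locking a real kernel implementation would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct jbd2_inode; not the kernel definition. */
struct journal_inode {
    unsigned long flags;
    void *transaction;
};

/* Hypothetical stand-in for struct ext4_inode_info. */
struct inode_info {
    struct journal_inode *jinode;   /* NULL until it is first needed */
};

/*
 * Allocate the journal inode on first use.
 * Returns 0 on success or when nothing needs to be done, -1 if the
 * allocation fails.
 */
int inode_attach_jinode(struct inode_info *ei, bool has_journal,
                        bool open_for_write)
{
    assert(ei != NULL);
    if (!has_journal || !open_for_write || ei->jinode != NULL)
        return 0;
    ei->jinode = calloc(1, sizeof(*ei->jinode));
    return ei->jinode != NULL ? 0 : -1;
}

/* Release the journal inode if one was ever allocated. */
void inode_detach_jinode(struct inode_info *ei)
{
    assert(ei != NULL);
    free(ei->jinode);
    ei->jinode = NULL;
}
```

Inodes that are only read, or that live on a file system without a journal, pay for a single pointer rather than the full embedded structure.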

ext4: drop i_state_flags on architectures with 64-bit longs · 353eb83c

Authored by Theodore Ts'o on Jan 10, 2011

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags.  This saves 8
bytes from the size of the ext4_inode_info structure, which when
multiplied by the number of inodes in the inode cache, can save
a lot of memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

353eb83c
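
The bit-packing described in commit 353eb83c can be sketched in plain C as follows. This is only an illustration under stated assumptions: the bit numbers, the `STATE_SHIFT` constant and the helper names are hypothetical, and a fixed 64-bit `uint64_t` is used in place of a 64-bit `long` so the example behaves the same on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bit numbers; not the values ext4 uses. */
enum { STATE_NEW = 0, STATE_DELALLOC_RESERVED = 1 };

/* Dynamic state bits are stored above the low 32 bits of the flags word. */
#define STATE_SHIFT 32

struct inode_info {
    uint64_t i_flags;   /* low 32 bits: inode flags, high 32 bits: state */
};

/* Map a state bit number to its mask in the high half of i_flags. */
static uint64_t state_mask(int bit)
{
    assert(bit >= 0 && bit < 32);
    return (uint64_t)1 << (bit + STATE_SHIFT);
}

void set_state(struct inode_info *ei, int bit)
{
    ei->i_flags |= state_mask(bit);
}

void clear_state(struct inode_info *ei, int bit)
{
    ei->i_flags &= ~state_mask(bit);
}

bool test_state(const struct inode_info *ei, int bit)
{
    return (ei->i_flags & state_mask(bit)) != 0;
}
```

Because the state bits never touch the low 32 bits, the ordinary inode flags are unaffected by setting or clearing state.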

ext4: use ext4_lblk_t instead of sector_t for logical blocks · 01f49d0b

Authored by Theodore Ts'o on Jan 10, 2011

This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types.  No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01f49d0b

ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097

Authored by Theodore Ts'o on Jan 10, 2011

Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it with a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here are a very good thing from
a memory utilization perspective.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f2321097

17 Dec, 2010 2 commits

ext4: Use pr_warning_ratelimited() instead of printk_ratelimit() · a8901d34

Authored by Theodore Ts'o on Dec 17, 2010

printk_ratelimit() is deprecated since it is a global instead of a
per-printk ratelimit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a8901d34

ext4: Fix up comments in inode.c · 225db7d3

Authored by Theodore Ts'o on Dec 16, 2010

This fixes up some broken argument descriptions that Namhyung Kim had
originally submitted for ext3.  This fixes the comments that were
still applicable in ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

225db7d3

15 Dec, 2010 1 commit

ext4: Turn off multiple page-io submission by default · 1449032b

Authored by Theodore Ts'o on Dec 14, 2010

Jon Nelson has found a test case which causes postgresql to fail with
the error:

psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581

Under memory pressure, it looks like part of a file can end up getting
replaced by zeros.  Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page().  The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.

To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory such that the system is just on the verge of triggering writeback
before running the following sql script:

begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;

If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.

This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1449032b

09 Nov, 2010 1 commit

ext4: Add new ext4 inode tracepoints · 7ff9c073

Authored by Theodore Ts'o on Nov 08, 2010

Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and
ext4_begin_ordered_truncate()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

7ff9c073

02 Nov, 2010 1 commit

ext4: Remove useless spinlock in ext4_getattr() · eb8abb92

Authored by Theodore Ts'o on Nov 02, 2010

Linus noted, and complained to me, that while doing lots of "git diff"s
of kernel sources, these spinlocks were responsible for 27% of the
spinlock cost on his two-processor system as reported by perf.

Git was doing lots of parallel stats, and this was putting a lot of
pressure on ext4_getattr().  A spinlock to protect a single
memory-to-memory copy is pointless, so remove it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eb8abb92

29 Oct, 2010 1 commit

ext4: BUG_ON fix: check if page has buffers before calling page_buffers() · b1142e8f

Authored by Theodore Ts'o on Oct 28, 2010

We need to check whether a page has buffers by calling
page_has_buffers(page) before calling page_buffers(page) in
ext4_writepage().  Otherwise page_buffers() could trigger a BUG_ON.

Thanks also to Markus Trippelsdorf and Avinash Kurup who also reported
the problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>

b1142e8f

28 Oct, 2010 17 commits

ext4: optimize orphan_list handling for ext4_setattr · 3d287de3

Authored by Dmitry Monakhov on Oct 27, 2010

Surprisingly, chown() on ext4 is not an SMP-scalable operation.
The unconditional orphan_del(NULL, inode) in ext4_setattr()
results in significant performance overhead because of the global orphan
mutex, especially in no-journal mode (where orphan_add() is a no-op).
The explicit orphan_del can be skipped when it is not needed.
Results of an fchown() micro-benchmark in no-journal mode:
while (1) {
   iteration++;
   fchown(fd, uid, gid);
   fchown(fd, uid + 1, gid + 1);
}
measured: iterations per millisecond
| nr_tasks | w/o patch | with patch |
|        1 |       142 |        185 |
|        4 |       109 |        642 |
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3d287de3

ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static · 4a873a47

Authored by Theodore Ts'o on Oct 27, 2010

Fix a namespace leak by moving the function to the file where it is
used and making it static.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4a873a47

ext4: make various ext4 functions be static · 1f109d5a

Authored by Theodore Ts'o on Oct 27, 2010

These functions have no need to be exported beyond file context.

No functions needed to be moved for this commit; just some function
declarations changed to be static and removed from header files.

(A similar patch was submitted by Eric Sandeen, but I wanted to handle
code movement in separate patches to make sure code changes didn't
accidentally get dropped.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1f109d5a

ext4: update writeback_index based on last page scanned · 72f84e65

Authored by Eric Sandeen on Oct 27, 2010

As pointed out in a prior patch, updating the mapping's
writeback_index based on pages written isn't quite right;
what the writeback index is really supposed to reflect is
the next page which should be scanned for writeback during
periodic flush.

As in write_cache_pages(), write_cache_pages_da() does
this scanning for us as we assemble the mpd for later
writeout.  If we keep track of the next page after the
current scan, we can easily update writeback_index without
worrying about pages written vs. pages skipped, etc.

Without this, an fsync will reset writeback_index to
0 (its starting index) + however many pages it wrote, which
can mess up the progress of periodic flush.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

72f84e65

ext4: implement writeback livelock avoidance using page tagging · 5b41d924

Authored by Eric Sandeen on Oct 27, 2010

This is analogous to Jan Kara's commit,
f446daae
mm: implement writeback livelock avoidance using page tagging

but since we forked write_cache_pages, we need to reimplement
it there (and in ext4_da_writepages, since range_cyclic handling
was moved to there)

If you start a large buffered IO to a file, and then issue an
fsync after it, you'll find that fsync does not complete
until the other IO stops.

If you continue re-dirtying the file (say, putting dd
with conv=notrunc in a loop), when fsync finally completes
(after all IO is done), it reports via tracing that
it has written many more pages than the file contains;
in other words it has synced and re-synced pages in
the file multiple times.

This then leads to problems with our writeback_index
update, since it advances it by pages written, and
essentially sets writeback_index off the end of the
file...

With the following patch, we only sync as much as was
dirty at the time of the sync.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5b41d924
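
The tagging approach described in commit 5b41d924 can be modelled with a toy two-pass loop. This is a simplified sketch, not the kernel code: pages are array elements, the two boolean fields stand in for page-cache tags, and `redirty_all()` is a hypothetical stand-in for another task that keeps dirtying the file while the sync runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: two flags stand in for the page-cache tags. */
struct page {
    bool dirty;
    bool towrite;
};

/* Pass 1: snapshot the currently dirty pages by tagging them. */
size_t tag_pages_for_writeback(struct page *pages, size_t n)
{
    size_t tagged = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty) {
            pages[i].towrite = true;
            tagged++;
        }
    }
    return tagged;
}

/* Simulated concurrent writer that re-dirties every page. */
void redirty_all(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].dirty = true;
}

/*
 * Pass 2: write only the tagged pages. `concurrent` (may be NULL) runs
 * after each write to mimic another task dirtying pages meanwhile.
 */
size_t write_tagged_pages(struct page *pages, size_t n,
                          void (*concurrent)(struct page *, size_t))
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].towrite)
            continue;            /* not part of the snapshot: skip */
        pages[i].towrite = false;
        pages[i].dirty = false;  /* "write" the page */
        written++;
        if (concurrent != NULL)
            concurrent(pages, n);
    }
    return written;
}
```

However aggressively the simulated writer re-dirties pages, the second pass writes at most the number of pages tagged in the first pass, which is the bound the commit message asks for.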

ext4: tidy up a void argument in inode.c · bbd08344

Authored by Eric Sandeen on Oct 27, 2010

This doesn't fix anything at all, it just removes a vestige
of prior use from __mpage_da_writepage()

__mpage_da_writepage() had a void * argument left over from
its previous life as a callback; make it reflect the actual type.

Fixing this up makes the code slightly easier to read, and
enables proper typechecking.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

bbd08344

ext4: Check return value of sb_getblk() and friends · 87783690

Authored by Namhyung Kim on Oct 27, 2010

Fail block allocation if sb_getblk() returns NULL. In that case,
sb_find_get_block() is also likely to fail, so calling
ext4_forget() should be skipped.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

87783690

ext4: use bio layer instead of buffer layer in mpage_da_submit_io · bd2d0210

Authored by Theodore Ts'o on Oct 27, 2010

Call the block I/O layer directly instead of going through the buffer
layer. This should give us much better performance and scalability,
as well as lowering our CPU utilization when doing buffered writeback.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

bd2d0210

ext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io() · 1de3e3df

Authored by Theodore Ts'o on Oct 27, 2010

This massively simplifies the ext4_da_writepages() code path by
completely removing mpage_put_bnr_to_bhs(), which is almost 100 lines of
code iterating over a set of pages using pagevec_lookup(), and folds
that functionality into mpage_da_submit_io()'s existing
pagevec_lookup() loop.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1de3e3df

ext4: inline walk_page_buffers() into mpage_da_submit_io · 3ecdb3a1

Authored by Theodore Ts'o on Oct 27, 2010

Expand the call:

  if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
                        ext4_bh_delay_or_unwritten))
	goto redirty_page;

into mpage_da_submit_io().

This will allow us to merge in mpage_put_bnr_to_bhs() in the next
patch.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3ecdb3a1

ext4: inline ext4_writepage() into mpage_da_submit_io() · cb20d518

Authored by Theodore Ts'o on Oct 27, 2010

As a preparatory step to switching to bio_submit, inline
ext4_writepage() into mpage_da_submit_io() and then simplify things a
bit.  This makes it clearer what mpage_da_submit_io() needs to do.

Also, move the ClearPageChecked(page) call into
__ext4_journalled_writepage(), as a minor bit of cleanup refactoring.

This also allows us to pull i_size_read() and
ext4_should_journal_data() out of the loop, which should be a very
minor CPU savings.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cb20d518

ext4: simplify ext4_writepage() · a42afc5f

Authored by Theodore Ts'o on Oct 27, 2010

The actual code in ext4_writepage() is unnecessarily convoluted.
Simplify it so it is easier to understand, but otherwise logically
equivalent.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a42afc5f

ext4: call mpage_da_submit_io() from mpage_da_map_blocks() · 5a87b7a5

Authored by Theodore Ts'o on Oct 27, 2010

Eventually we need to completely reorganize the ext4 writepage
callpath, but for now, we simplify things a little by calling
mpage_da_submit_io() from mpage_da_map_blocks(), since all of the
places where we call mpage_da_map_blocks() it is followed up by a call
to mpage_da_submit_io().

We're also a wee bit better with respect to error handling, but there
are still a number of issues where it's not clear what the right thing
to do is when ext4 functions deep in the writeback codepath fail.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a87b7a5

ext4: queue conversion after adding to inode's completed IO list · c999af2b

Authored by Eric Sandeen on Oct 27, 2010

By queuing the io end on the unwritten workqueue before adding it
to our inode's list of completed IOs, I think we run the risk
of the work getting completed, and the IO freed, before we try
to add it to the inode's i_completed_io_list.

It should be safe to add it to the inode's list of completed
IOs, and -then- queue it for completion, I think.

Thanks to Dave Chinner for pointing out the race.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c999af2b

ext4: fix potential infinite loop in ext4_da_writepages() · 0c9169cc

Authored by Toshiyuki Okajima on Oct 27, 2010

On linux-2.6.36-rc2, if we execute the following script, we can hang
the system when the /bin/sync command is executed:

========================================================================
#!/bin/sh

echo -n "HANG UP TEST: "
/bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
/sbin/mkfs.ext4 -Fq /tmp/img
/bin/mount -o loop -t ext4 /tmp/img /mnt
/bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
/bin/sync
/bin/umount /mnt
echo "DONE"
exit 0
========================================================================

We can see the following backtrace if we get the kdump when this
hangup occurs:

======================================================================
kthread()
=> bdi_writeback_thread()
   => wb_do_writeback()
      => wb_writeback()
         => writeback_inodes_wb()
            => writeback_sb_inodes()
               => writeback_single_inode()
                  => ext4_da_writepages()  ---+ 
                                ^ infinite    |
                                |   loop      |
                                +-------------+
======================================================================

The reason why this hangup happens is described as follows:
1) We write the last extent block of the file whose size is the filesystem 
   maximum size.
2) "BH_Delay" flag is set on the buffer_head of its block.
3) - the member, "m_lblk" of struct mpage_da_data is 4294967295 (UINT_MAX)
   - the member, "m_len" of struct mpage_da_data is 1
  mpage_put_bnr_to_bhs() which is called via ext4_da_writepages()
  cannot clear "BH_Delay" flag of the buffer_head because the type of
  m_lblk is ext4_lblk_t and so m_lblk + m_len overflows.

  Therefore an infinite loop occurs because ext4_da_writepages()
  cannot write the page (which corresponds to the block) since
  "BH_Delay" flag isn't cleared.
----------------------------------------------------------------------
static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
				struct ext4_map_blocks *map)
{
...
	int blocks = map->m_len;
...
		do {
			// cur_logical = 4294967295
			// map->m_lblk = 4294967295
			// blocks = 1
			// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
			// (cur_logical >= map->m_lblk + blocks) => true
			if (cur_logical >= map->m_lblk + blocks)
				break;
----------------------------------------------------------------------

NOTE: Mounting with the nodelalloc option will avoid this codepath,
and thus avoid this hang.
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0c9169cc
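
The wraparound at the heart of commit 0c9169cc is easy to reproduce in userspace. The sketch below uses `lblk_t` as a stand-in for the 32-bit `ext4_lblk_t` and compares the check quoted in the commit message with one overflow-safe alternative that widens to 64 bits before adding. The widened form is an assumption made for illustration only and is not necessarily how the patch itself fixes the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lblk_t;   /* stand-in for the 32-bit ext4_lblk_t */

/* The check as quoted in the commit message: the sum wraps in 32 bits. */
bool past_end_wrapping(lblk_t cur, lblk_t start, lblk_t len)
{
    return cur >= (lblk_t)(start + len);
}

/* One overflow-safe alternative: do the addition in 64 bits. */
bool past_end_widened(lblk_t cur, lblk_t start, lblk_t len)
{
    return (uint64_t)cur >= (uint64_t)start + len;
}
```

With `cur` and `start` both 4294967295 and `len` equal to 1, the wrapping check compares against 0 and reports that the end has been passed, so the block is never processed; the widened check compares against 4294967296 and does not.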

ext4: don't bump up LONG_MAX nr_to_write by a factor of 8 · b443e733

Authored by Eric Sandeen on Oct 27, 2010

I'm uneasy with lots of stuff going on in ext4_da_writepages(),
but bumping nr_to_write from LONG_MAX to -8 clearly isn't
making anything better, so avoid the multiplier in that case.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b443e733
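
The arithmetic behind commit b443e733 can be checked with a short C sketch. Signed overflow is undefined in C, so `wrapped_product()` does the multiplication in unsigned arithmetic to show the two's-complement bit pattern that results. `bump_nr_to_write()` is a hypothetical guarded helper; the saturation branch is an extra assumption of this sketch and is not something the commit message describes.

```c
#include <assert.h>
#include <limits.h>

/*
 * What an unguarded multiply would produce, computed in unsigned
 * arithmetic so the wraparound is well defined in C.
 */
unsigned long wrapped_product(long nr_to_write, unsigned long factor)
{
    return (unsigned long)nr_to_write * factor;
}

/* Hypothetical guarded helper: leave an "unlimited" budget alone. */
long bump_nr_to_write(long nr_to_write, long factor)
{
    assert(factor > 0);
    if (nr_to_write <= 0 || nr_to_write == LONG_MAX)
        return nr_to_write;
    if (nr_to_write > LONG_MAX / factor)
        return LONG_MAX;          /* saturate instead of overflowing */
    return nr_to_write * factor;
}
```

`LONG_MAX * 8` reduced modulo the width of `long` has the same bit pattern as -8, which is where the surprising value in the commit message comes from.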

ext4: stop looping in ext4_num_dirty_pages when max_pages reached · 659c6009

Authored by Eric Sandeen on Oct 27, 2010

Today we simply break out of the inner loop when we have accumulated
max_pages; this keeps scanning forward and doing pagevec_lookup_tag()
in the while (!done) loop, which potentially does a lot of work
with no net effect.

When we have accumulated max_pages, just clean up and return.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

659c6009
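
The control-flow change in commit 659c6009 amounts to leaving both loops once the limit is reached, rather than only the inner one. The sketch below is a simplified userspace model: `BATCH` stands in for a pagevec, the boolean array stands in for tagged pages, and `lookups` counts how many batches were fetched. Unlike the real function it counts all dirty entries rather than a contiguous run, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH 4   /* hypothetical batch size, standing in for a pagevec */

/*
 * Count dirty entries, stopping as soon as max_pages have been seen.
 * *lookups is set to the number of batches that were examined.
 */
size_t num_dirty_pages(const bool *dirty, size_t n, size_t max_pages,
                       size_t *lookups)
{
    size_t num = 0;

    assert(max_pages > 0 && lookups != NULL);
    *lookups = 0;
    for (size_t start = 0; start < n; start += BATCH) {
        size_t end = start + BATCH < n ? start + BATCH : n;

        (*lookups)++;
        for (size_t i = start; i < end; i++) {
            if (!dirty[i])
                continue;
            if (++num >= max_pages)
                return num;   /* leave both loops, not just the inner one */
        }
    }
    return num;
}
```

With twelve dirty entries and a limit of three, only the first batch is examined; a plain `break` in the inner loop would have fetched all three batches for the same result.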

26 Oct, 2010 1 commit

fs: kill block_prepare_write · ebdec241

Authored by Christoph Hellwig on Oct 06, 2010

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions.  Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ebdec241

10 Aug, 2010 4 commits

convert ext4 to ->evict_inode() · 0930fcc1

Authored by Al Viro on Jun 07, 2010

pretty much brute-force...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0930fcc1

remove inode_setattr · 1025774c

Authored by Christoph Hellwig on Jun 04, 2010

Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed the opencoded variant to be trimmed down.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1025774c

introduce __block_write_begin · 6e1db88d

Authored by Christoph Hellwig on Jun 04, 2010

Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated.  Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6e1db88d

sort out blockdev_direct_IO variants · eafdc7d1

Authored by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it, as just opencoding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eafdc7d1

06 Aug, 2010 1 commit

ext4: Fix dirtying of journalled buffers in data=journal mode · 56d35a4c

Authored by Jan Kara on Aug 05, 2010

In data=journal mode, we still use block_write_begin() to prepare
page for writing. This function can occasionally mark buffer dirty
which violates journalling assumptions - when a buffer is part of
a transaction, it should not be dirty, and a buffer can already be part
of a forget list of some transaction when block_write_begin()
gets called. This violation of journalling assumptions then results
in "JBD: Spotted dirty metadata buffer..." warnings.

In fact, temporarily dirtying the buffer while the page is still locked
does not really cause problems for the journalling because we won't write
the buffer until the page gets unlocked. So we just have to make sure
to clear dirty bits before unlocking the page.
Signed-off-by: Jan Kara <jack@suse.cz>

56d35a4c

04 Aug, 2010 1 commit

jbd2: Change j_state_lock to be a rwlock_t · a931da6a

Authored by Theodore Ts'o on Aug 03, 2010

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a931da6a

30 Jul, 2010 1 commit

ext4: drop inode from orphan list if ext4_delete_inode() fails · 45388219

Authored by Theodore Ts'o on Jul 29, 2010

There were some error paths in ext4_delete_inode() which were not
dropping the inode from the orphan list.  This could lead to a BUG_ON
on umount when the orphan list is discovered to be non-empty.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

45388219

27 Jul, 2010 5 commits

ext4: don't print scary messages for allocation failures post-abort · e3570639

Authored by Eric Sandeen on Jul 27, 2010

I often get emails containing the "This should not happen!!" message,
conveniently trimmed to remove things like:

sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00
end_request: I/O error, dev sda, sector 51628400
Aborting journal on device dm-0-8.
EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (dm-0): Remounting filesystem read-only

I don't think there is any value to the verbosity if the reason is
due to a filesystem abort; it just obfuscates the root cause.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e3570639

ext4: fix ext4_get_blocks references · 79e83036

Authored by Eric Sandeen on Jul 27, 2010

ext4_get_blocks got renamed to ext4_map_blocks, but left stale
comments and a prototype littered around.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

79e83036

ext4: Don't error out the fs if the user tries to make a file too big · 0c095c7f

Authored by Theodore Ts'o on Jul 27, 2010

If the user attempts to make a non-extent-mapped file too large,
return EFBIG, but don't call ext4_std_err() which will end up marking
the file system as containing an error.

Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0c095c7f

ext4: move aio completion after unwritten extent conversion · 5b3ff237

Authored by jiayingz@google.com (Jiaying Zhang) on Jul 27, 2010

This patch is to be applied upon Christoph's "direct-io: move aio_complete
into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t,
so that we can call aio_complete from ext4_end_io_nolock() after the extent
conversion has finished.

I have verified with Christoph's aio-dio test that used to fail after a few
runs on an original kernel but now succeeds on the patched kernel.

See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5b3ff237

direct-io: move aio_complete into ->end_io · 552ef802

Authored by Christoph Hellwig on Jul 27, 2010

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed.  That means
the aio_complete call needs to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.

This makes a bit of a mess out of dio_complete and makes the ->end_io
callback prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

552ef802