提交 · 26a4c0c6ccecf6814cf44f951c97222bd795bc1a · openeuler / raspberrypi-kernel

04 4月, 2013 2 次提交

ext4: refactor punch hole code · 26a4c0c6

由 Theodore Ts'o 提交于 4月 03, 2013

Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole()
into ext4_punch_hole(). This saves over 150 lines of code.

This also fixes a potential bug when the punch_hole() code is racing
against indirect-to-extents or extents-to-indirect migation. We are
currently using i_mutex to protect against changes to the inode flag;
specifically, the append-only, immutable, and extents inode flags. So
we need to take i_mutex before deciding whether to use the
extents-specific or indirect-specific punch_hole code.

Also, there was a missing call to ext4_inode_block_unlocked_dio() in
the indirect punch codepath. This was added in commit 02d262df
to block DIO readers racing against the punch operation in the
codepath for extent-mapped inodes, but it was missing for
indirect-block mapped inodes. One of the advantages of refactoring
the code is that it makes such oversights much less likely.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

26a4c0c6

ext4: fix big-endian bugs which could cause fs corruptions · 8cde7ad1

由 Zheng Liu 提交于 4月 03, 2013

When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out.  So
fix it.

[ Also fix a bug found by Dmitry Monakhov where we were missing
  le32_to_cpu() calls in the new indirect punch hole code.

  There are a number of other big endian warnings found by static code
  analyzers, but we'll wait for the next merge window to fix them all
  up.  These fixes are designed to be Obviously Correct by code
  inspection, and easy to demonstrate that it won't make any
  difference (and hence, won't introduce any bugs) on little endian
  architectures such as x86.  --tytso ]
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reported-by: NCAI Qian <caiqian@redhat.com>
Reported-by: NChristian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>

8cde7ad1

13 3月, 2013 1 次提交

ext4: use s_extent_max_zeroout_kb value as number of kb · 4f42f80a

由 Lukas Czerner 提交于 3月 12, 2013

Currently when converting extent to initialized, we have to decide
whether to zeroout part/all of the uninitialized extent in order to
avoid extent tree growing rapidly.

The decision is made by comparing the size of the extent with the
configurable value s_extent_max_zeroout_kb which is in kibibytes units.

However when converting it to number of blocks we currently use it as it
was in bytes. This is obviously bug and it will result in ext4 _never_
zeroout extents, but rather always split and convert parts to
initialized while leaving the rest uninitialized in default setting.

Fix this by using s_extent_max_zeroout_kb as kibibytes.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

4f42f80a

11 3月, 2013 4 次提交

ext4: update reserved space after the 'correction' · 232ec872

由 Lukas Czerner 提交于 3月 10, 2013

Currently in ext4_ext_map_blocks() in delayed allocation writeback
we would update the reservation and after that check whether we claimed
cluster outside of the range of the allocation and if so, we'll give the
block back to the reservation pool.

However this also means that if the number of reserved data block
dropped to zero before the correction, we would release all the metadata
reservation as well, however we might still need it because the we're
not done with the delayed allocation and there might be more blocks to
come. This will result in error messages such as:

EXT4-fs warning (device sdb): ext4_da_update_reserve_space:361: ino 12,
allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks
with reserved 1 data blocks)

This will only happen on bigalloc file system and it can be easily
reproduced using fiemap-tester from xfstests like this:

./src/fiemap-tester -m DHDHDHDHD -S -p0 /mnt/test/file

Or using xfstests such as 225.

Fix this by doing the correction first and updating the reservation
after that so that we do not accidentally decrease
i_reserved_data_blocks to zero.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

232ec872

ext4: fix the wrong number of the allocated blocks in ext4_split_extent() · 3a225670

由 Zheng Liu 提交于 3月 10, 2013

This commit fixes a wrong return value of the number of the allocated
blocks in ext4_split_extent.  When the length of blocks we want to
allocate is greater than the length of the current extent, we return a
wrong number.  Let's see what happens in the following case when we
call ext4_split_extent().

  map: [48, 72]
  ex:  [32, 64, u]

'ex' will be split into two parts:
  ex1: [32, 47, u]
  ex2: [48, 64, w]

'map->m_len' is returned from this function, and the value is 24.  But
the real length is 16.  So it should be fixed.

Meanwhile in this commit we use right length of the allocated blocks
when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents
is called.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: stable@vger.kernel.org

3a225670

ext4: update extent status tree after an extent is zeroed out · adb23551

由 Zheng Liu 提交于 3月 10, 2013

When we try to split an extent, this extent could be zeroed out and mark
as initialized.  But we don't know this in ext4_map_blocks because it
only returns a length of allocated extent.  Meanwhile we will mark this
extent as uninitialized because we only check m_flags.

This commit update extent status tree when we try to split an unwritten
extent.  We don't need to worry about the status of this extent because
we always mark it as initialized.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>

adb23551

ext4: fix wrong m_len value after unwritten extent conversion · cdee7843

由 Zheng Liu 提交于 3月 10, 2013

The ext4_ext_handle_uninitialized_extents() function was assuming the
return value of ext4_ext_map_blocks() is equal to map->m_len.  This
incorrect assumption was harmless until we started use status tree as
a extent cache because we need to update status tree according to
'm_len' value.

Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent
conversion.  It shouldn't cause a bug because we update status tree
according to checking EXT4_MAP_UNWRITTEN flag.  But it should be fixed.

After applied this commit, the following error message from self-testing
infrastructure disappears.

    ...
    kernel: ES len assertation failed for inode: 230 retval 1 !=
    map->m_len 3 in ext4_map_blocks (allocation)
    ...
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>

cdee7843

04 3月, 2013 4 次提交

ext4: remove unnecessary wait for extent conversion in ext4_fallocate() · de99fcce

由 Jan Kara 提交于 3月 04, 2013

Now that we don't merge uninitialized extents anymore,
ext4_fallocate() is free to operate on the inode while there are still
some extent conversions pending - it won't disturb them in any way.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

de99fcce

ext4: add warning to ext4_convert_unwritten_extents_endio · ff95ec22

由 Dmitry Monakhov 提交于 3月 04, 2013

Splitting extents inside endio is a bad thing, but unfortunately it is
still possible.  In fact we are pretty close to the moment when all
related issues will be fixed.  Let's warn developer if it still the
case.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

ff95ec22

ext4: disable merging of uninitialized extents · ec22ba8e

由 Dmitry Monakhov 提交于 3月 04, 2013

Derived from Jan's patch:http://permalink.gmane.org/gmane.comp.file-systems.ext4/36470

Merging of uninitialized extents creates all sorts of interesting race
possibilities when writeback / DIO races with fallocate. Thus
ext4_convert_unwritten_extents_endio() has to deal with a case where
extent to be converted needs to be split out first. That isn't nice
for two reasons:

1) It may need allocation of extent tree block so ENOSPC is possible.
2) It complicates end_io handling code

So we disable merging of uninitialized extents which allows us to simplify
the code. Extents will get merged after they are converted to initialized
ones.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

ec22ba8e

ext4: ext4_split_extent should take care of extent zeroout · 357b66fd

由 Dmitry Monakhov 提交于 3月 04, 2013

When ext4_split_extent_at() ends up doing zeroout & conversion to
initialized instead of split & conversion, ext4_split_extent() gets
confused and can wrongly mark the extent back as uninitialized
resulting in end IO code getting confused from large unwritten extents
and may result in data loss.

The example of problematic behavior is:
			    lblk len              lblk len
  ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
    ext4_split_extent_at() (split [1000,30,uninit] at 1020)
      ext4_ext_insert_extent() -> ENOSPC
      ext4_ext_zeroout()
	 -> extent [1000,30] is now initialized
    ext4_split_extent_at() (split [1000,30,init] at 1010,
			     MARK_UNINIT1 | MARK_UNINIT2)
      -> extent is split and parts marked as uninitialized

Fix the problem by rechecking extent type after the first
ext4_split_extent_at() returns. None of split_flags can not be applied
to initialized extent so this patch also add BUG_ON to prevent similar
issues in future.

TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

357b66fd

23 2月, 2013 1 次提交
- A
  new helper: file_inode(file) · 496ad9aa
  由 Al Viro 提交于 1月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  496ad9aa
18 2月, 2013 6 次提交

ext4: remove single extent cache · 69eb33dc

由 Zheng Liu 提交于 2月 18, 2013

Single extent cache could be removed because we have extent status tree
as a extent cache, and it would be better.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>

69eb33dc

ext4: lookup block mapping in extent status tree · d100eef2

由 Zheng Liu 提交于 2月 18, 2013

After tracking all extent status, we already have a extent cache in
memory.  Every time we want to lookup a block mapping, we can first
try to lookup it in extent status tree to avoid a potential disk I/O.

A new function called ext4_es_lookup_extent is defined to finish this
work.  When we try to lookup a block mapping, we always call
ext4_map_blocks and/or ext4_da_map_blocks.  So in these functions we
first try to lookup a block mapping in extent status tree.

A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
in order not to put a hole into extent status tree because this hole
will be converted to delayed extent in the tree immediately.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>

d100eef2

ext4: track all extent status in extent status tree · f7fec032

由 Zheng Liu 提交于 2月 18, 2013

By recording the phycisal block and status, extent status tree is able
to track the status of every extents.  When we call _map_blocks
functions to lookup an extent or create a new written/unwritten/delayed
extent, this extent will be inserted into extent status tree.

We don't load all extents from disk in alloc_inode() because it costs
too much memory, and if a file is opened and closed frequently it will
takes too much time to load all extent information.  So currently when
we create/lookup an extent, this extent will be inserted into extent
status tree.  Hence, the extent status tree may not comprehensively
contain all of the extents found in the file.

Here a condition we need to take care is that an extent might contains
unwritten and delayed status simultaneously because an extent is delayed
allocated and could be allocated by fallocate.  At this time we need to
keep delayed status because later we need to update delayed reservation
space using it.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>

f7fec032

ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag · a25a4e1a

由 Zheng Liu 提交于 2月 18, 2013

This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
because in later commit ext4_map_blocks needs to use this flag to
determine the extent status.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

a25a4e1a

ext4: rename and improbe ext4_es_find_extent() · be401363

由 Zheng Liu 提交于 2月 18, 2013

This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent
and improve this function.  First, we split input and output parameter.
Second, this function never return the first block of the next delayed
extent after 'es'.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>

be401363

ext4: refine extent status tree · 06b0c886

由 Zheng Liu 提交于 2月 18, 2013

This commit refines the extent status tree code.

1) A prefix 'es_' is added to to the extent status tree structure
members.

2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) Rename extent_status_end() to ext4_es_end()

4) ext4_es_can_be_merged() is define to check whether two extents can
be merged or not.

5) Update and clarified comments.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

06b0c886

09 2月, 2013 1 次提交

ext4: pass context information to jbd2__journal_start() · 9924a92a

由 Theodore Ts'o 提交于 2月 08, 2013

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

   T=/sys/kernel/debug/tracing
   EVENT=$T/events/jbd2/jbd2_handle_stats
   echo "interval > 5" > $EVENT/filter
   echo 1 > $EVENT/enable

   ./run-my-fs-benchmark

   cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms.  Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation.  Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9924a92a

29 1月, 2013 1 次提交

ext4: remove explicit WARN_ON when ext4_map_blocks() fails · b06acd38

由 Lukas Czerner 提交于 1月 28, 2013

In two places we call WARN_ON() before we print out the debug message,
however we agreed that the WARN_ON() is unnecessary at those places so
remove them.

Also use ext4_warning() instead of ext4_msg() and printk().
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b06acd38

28 1月, 2013 1 次提交

ext4: add punching hole support for non-extent-mapped files · 8bad6fc8

由 Zheng Liu 提交于 1月 28, 2013

This patch add supports for indirect file support punching hole.  It
is almost the same as ext4_ext_punch_hole.  First, we invalidate all
pages between this hole, and then we try to deallocate all blocks of
this hole.

A recursive function is used to handle deallocation of blocks.  In
this function, it iterates over the entries in inode's i_blocks or
indirect blocks, and try to free the block for each one of them.

After applying this patch, xfstest #255 will not pass w/o extent because
indirect-based file doesn't support unwritten extents.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8bad6fc8

13 1月, 2013 2 次提交

ext4: use unlikely to improve the efficiency of the kernel · aebf0243

由 Wang Shilong 提交于 1月 12, 2013

Because the function 'sb_getblk' seldomly fails to return NULL
value,it will be better to use 'unlikely' to optimize it.
Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

aebf0243

ext4: return ENOMEM if sb_getblk() fails · 860d21e2

由 Theodore Ts'o 提交于 1月 12, 2013

The only reason for sb_getblk() failing is if it can't allocate the
buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
make sure that the file system is marked as being inconsistent if
sb_getblk() fails.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

860d21e2

17 12月, 2012 1 次提交

ext4: fix extent tree corruption caused by hole punch · c36575e6

由 Forrest Liu 提交于 12月 17, 2012

When depth of extent tree is greater than 1, logical start value of
interior node is not correctly updated in ext4_ext_rm_idx.
Signed-off-by: NForrest Liu <forrestl@synology.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NAshish Sangwan <ashishsangwan2@gmail.com>
Cc: stable@vger.kernel.org

c36575e6

11 12月, 2012 4 次提交

ext4: remove unused variable from ext4_ext_in_cache() · 187fd030

由 Zhi Yong Wu 提交于 12月 10, 2012

Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: NZheng Liu <gnehzuil.liu@gmail.com>

187fd030

ext4: let fallocate handle inline data correctly · 0c8d414f

由 Tao Ma 提交于 12月 10, 2012

If we are punching hole in a file, we will return ENOTSUPP.
As for the fallocation of some extents, we will convert the
inline data to a normal extent based file first.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0c8d414f

ext4: let fiemap work with inline data · 94191985

由 Tao Ma 提交于 12月 10, 2012

fiemap is used to find the disk layout of a file, as for inline data,
let us just pretend like a file with just one extent.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

94191985

ext4: add normal write support for inline data · f19d5870

由 Tao Ma 提交于 12月 10, 2012

For a normal write case (not journalled write, not delayed
allocation), we write to the inline if the file is small and convert
it to an extent based file when the write is larger than the max
inline size.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f19d5870

29 11月, 2012 3 次提交

ext4: rationalize ext4_extents.h inclusion · 4a092d73

由 Theodore Ts'o 提交于 11月 28, 2012

Previously, ext4_extents.h was being included at the end of ext4.h,
which was bad for a number of reasons: (a) it was not being included
in the expected place, and (b) it caused the header to be included
multiple times.  There were #ifdef's to prevent this from causing any
problems, but it still was unnecessary.

By moving the function declarations that were in ext4_extents.h to
ext4.h, which is standard practice for where the function declarations
for the rest of ext4.h can be found, we can remove ext4_extents.h from
being included in ext4.h at all, and then we can only include
ext4_extents.h where it is needed in ext4's source files.

It should be possible to move a few more things into ext4.h, and
further reduce the number of source files that need to #include
ext4_extents.h, but that's a cleanup for another day.
Reported-by: NSachin Kamat <sachin.kamat@linaro.org>
Reported-by: NWei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4a092d73

ext4: simple cleanup in fiemap codepath · 06348679

由 Lukas Czerner 提交于 11月 28, 2012

This commit is simple cleanup of fiemap codepath which has not been
included in previous commit to make the changes clearer. In this commit
we rename cbex variable to newex in ext4_fill_fiemap_extents() because
callback is no longer present
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

06348679

ext4: prevent race while walking extent tree for fiemap · 91dd8c11

由 Lukas Czerner 提交于 11月 28, 2012

Currently ext4_ext_walk_space() only takes i_data_sem for read when
searching for the extent at given block with ext4_ext_find_extent().
Then it drops the lock and the extent tree can be changed at will.
However later on we're searching for the 'next' extent, but the extent
tree might already have changed, so the information might not be
accurate.

In fact we can hit BUG_ON(end <= start) if the extent got inserted into
the tree after the one we found and before the block we were searching
for. This has been reproduced by running xfstests 225 in loop on s390x
architecture, but theoretically we could hit this on any other
architecture as well, but probably not as often.

Moreover the extent currently in delayed allocation might be allocated
after we search the extent tree and before we search extent status tree
delayed buffers resulting in those delayed buffers being completely
missed, even though completely written and allocated.

We fix all those problems in several steps:

 1. remove unnecessary callback indirection
 2. rename functions
        ext4_ext_walk_space -> ext4_fill_fiemap_extents
        ext4_ext_fiemap_cb -> ext4_find_delayed_extent
 3. move fiemap_fill_next_extent() into ext4_fill_fiemap_extents()
 4. hold the i_data_sem for:
        ext4_ext_find_extent()
        ext4_ext_next_allocated_block()
        ext4_find_delayed_extent()
 5. call fiemap_fill_next_extent after releasing the i_data_sem
 6. move path reinitialization into the critical section.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

91dd8c11

09 11月, 2012 6 次提交

ext4: reimplement fiemap using extent status tree · b3aff3e3

由 Zheng Liu 提交于 11月 08, 2012

Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b3aff3e3

ext4: reimplement ext4_find_delay_alloc_range on extent status tree · 7d1b1fbc

由 Zheng Liu 提交于 11月 08, 2012

Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7d1b1fbc

ext4: let ext4 maintain extent status tree · 51865fda

由 Zheng Liu 提交于 11月 08, 2012

This patch lets ext4 maintain extent status tree.

Currently it only tracks delay extent status in extent status tree.  When a
delay allocation is issued, the related delay extent will be inserted into
extent status tree.  When a delay extent is written out or invalidated, it will
be removed from this tree.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

51865fda

ext4: fix missing call to trace_ext4_ext_map_blocks_exit · 37794732

由 Zheng Liu 提交于 11月 08, 2012

When ext4_ext_handle_uninitialized_extents(), we will directly return
from ext4_ext_map_blocks().  The trace point of
trace_ext4_ext_map_blocks_exit isn't called, and the user doesn't see
any result.  This patch tries to fix this problem.

Meanwhile in ext4_ext_handle_uninitialized_extents it returns errors
or the number of allocated blocks.  So 'ret' variable can be removed
due to previously modifications.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>

37794732

ext4: print map->m_flags in trace_ext4_ext/ind_map_blocks_exit · 19b303d8

由 Zheng Liu 提交于 11月 08, 2012

When we use trace_ext4_ext/ind_map_blocks_exit, print the value of
map->m_flags in order that we can understand the extent's current
status.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

19b303d8

ext4: print 'flags' in ext4_ext_handle_uninitialized_extents · b5645534

由 Zheng Liu 提交于 11月 08, 2012

In trace_ext4_ext_handle_uninitialized_extents we don't care about the
value of map->m_flags because this value is probably 0, and we prefer
to get the value of flags because we can know how to handle this
extent in this function.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b5645534

10 10月, 2012 1 次提交

ext4: race-condition protection for ext4_convert_unwritten_extents_endio · dee1f973

由 Dmitry Monakhov 提交于 10月 10, 2012

We assumed that at the time we call ext4_convert_unwritten_extents_endio()
extent in question is fully inside [map.m_lblk, map->m_len] because
it was already split during submission.  But this may not be true due to
a race between writeback vs fallocate.

If extent in question is larger than requested we will split it again.
Special precautions should being done if zeroout required because
[map.m_lblk, map->m_len] already contains valid data.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

dee1f973

05 10月, 2012 2 次提交

ext4: serialize fallocate with ext4_convert_unwritten_extents · 60d4616f

由 Dmitry Monakhov 提交于 10月 05, 2012

Fallocate should wait for pended ext4_convert_unwritten_extents()
otherwise following race may happen:

ftruncate( ,12288);
fallocate( ,0, 4096)
io_sibmit( ,0, 4096); /* Write to fallocated area, split extent if needed */
fallocate( ,0, 8192); /* Grow extent and broke assumption about extent */

Later kwork completion will do:
 ->ext4_convert_unwritten_extents (0, 4096)
   ->ext4_map_blocks(handle, inode, &map, EXT4_GET_BLOCKS_IO_CONVERT_EXT);
    ->ext4_ext_map_blocks() /* Will find new extent:  ex = [0,2] !!!!!! */
      ->ext4_ext_handle_uninitialized_extents()
        ->ext4_convert_unwritten_extents_endio()
        /* convert [0,2] extent to initialized, but only[0,1] was written */
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

60d4616f

ext4: fix ext4_flush_completed_IO wait semantics · c278531d

由 Dmitry Monakhov 提交于 10月 05, 2012

BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO goes through three stages
    1) submitted io,
    2) completed io (in i_completed_io_list) conversion pended
    3) finished  io (conversion done)
    And by calling ext4_flush_completed_IO we will flush only
    requests which were in (2) stage, which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten io
      regardless to it's state.
     2) fsync and nolock_dio_read should also wait because there is
        a time window between end_page_writeback() and ext4_add_complete_io()
        As result integrity fsync is broken in case of buffered write
        to fallocated region:
        fsync                                      blkdev_completion
	 ->filemap_write_and_wait_range
                                                   ->ext4_end_bio
                                                     ->end_page_writeback
          <-- filemap_write_and_wait_range return
	 ->ext4_flush_completed_IO
   	 sees empty i_completed_io_list but pended
   	 conversion still exist
                                                     ->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
   wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
   prevent endless wait
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

c278531d