提交 · 19b303d8b5a0e8150a4697c01ca03e75a0a17469 · openeuler / raspberrypi-kernel

09 11月, 2012 2 次提交

ext4: print map->m_flags in trace_ext4_ext/ind_map_blocks_exit · 19b303d8

由 Zheng Liu 提交于 11月 08, 2012

When we use trace_ext4_ext/ind_map_blocks_exit, print the value of
map->m_flags in order that we can understand the extent's current
status.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

19b303d8

ext4: print 'flags' in ext4_ext_handle_uninitialized_extents · b5645534

由 Zheng Liu 提交于 11月 08, 2012

In trace_ext4_ext_handle_uninitialized_extents we don't care about the
value of map->m_flags because this value is probably 0, and we prefer
to get the value of flags because we can know how to handle this
extent in this function.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b5645534

10 10月, 2012 1 次提交

ext4: race-condition protection for ext4_convert_unwritten_extents_endio · dee1f973

由 Dmitry Monakhov 提交于 10月 10, 2012

We assumed that at the time we call ext4_convert_unwritten_extents_endio()
extent in question is fully inside [map.m_lblk, map->m_len] because
it was already split during submission.  But this may not be true due to
a race between writeback vs fallocate.

If extent in question is larger than requested we will split it again.
Special precautions should being done if zeroout required because
[map.m_lblk, map->m_len] already contains valid data.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

dee1f973

05 10月, 2012 2 次提交

ext4: serialize fallocate with ext4_convert_unwritten_extents · 60d4616f

由 Dmitry Monakhov 提交于 10月 05, 2012

Fallocate should wait for pended ext4_convert_unwritten_extents()
otherwise following race may happen:

ftruncate( ,12288);
fallocate( ,0, 4096)
io_sibmit( ,0, 4096); /* Write to fallocated area, split extent if needed */
fallocate( ,0, 8192); /* Grow extent and broke assumption about extent */

Later kwork completion will do:
 ->ext4_convert_unwritten_extents (0, 4096)
   ->ext4_map_blocks(handle, inode, &map, EXT4_GET_BLOCKS_IO_CONVERT_EXT);
    ->ext4_ext_map_blocks() /* Will find new extent:  ex = [0,2] !!!!!! */
      ->ext4_ext_handle_uninitialized_extents()
        ->ext4_convert_unwritten_extents_endio()
        /* convert [0,2] extent to initialized, but only[0,1] was written */
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

60d4616f

ext4: fix ext4_flush_completed_IO wait semantics · c278531d

由 Dmitry Monakhov 提交于 10月 05, 2012

BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO goes through three stages
    1) submitted io,
    2) completed io (in i_completed_io_list) conversion pended
    3) finished  io (conversion done)
    And by calling ext4_flush_completed_IO we will flush only
    requests which were in (2) stage, which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten io
      regardless to it's state.
     2) fsync and nolock_dio_read should also wait because there is
        a time window between end_page_writeback() and ext4_add_complete_io()
        As result integrity fsync is broken in case of buffered write
        to fallocated region:
        fsync                                      blkdev_completion
	 ->filemap_write_and_wait_range
                                                   ->ext4_end_bio
                                                     ->end_page_writeback
          <-- filemap_write_and_wait_range return
	 ->ext4_flush_completed_IO
   	 sees empty i_completed_io_list but pended
   	 conversion still exist
                                                     ->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
   wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
   prevent endless wait
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

c278531d

01 10月, 2012 2 次提交

ext4: fix ext_remove_space for punch_hole case · 6f2080e6

由 Dmitry Monakhov 提交于 9月 30, 2012

Inode is allowed to have empty leaf only if it this is blockless inode.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

6f2080e6

ext4: punch_hole should wait for DIO writers · 02d262df

由 Dmitry Monakhov 提交于 9月 30, 2012

punch_hole is the place where we have to wait for all existing writers
(writeback, aio, dio), but currently we simply flush pended end_io request
which is not sufficient. Other issue is that punch_hole performed w/o i_mutex
held which obviously result in dangerous data corruption due to
write-after-free.

This patch performs following changes:
- Guard punch_hole with i_mutex
- Recheck inode flags under i_mutex
- Block all new dio readers in order to prevent information leak caused by
  read-after-free pattern.
- punch_hole now wait for all writers in flight
  NOTE: XXX write-after-free race is still possible because new dirty pages
  may appear due to mmap(), and currently there is no easy way to stop
  writeback while punch_hole is in progress.

[ Fixed error return from ext4_ext_punch_hole() to make sure that we
  release i_mutex before returning EPERM or ETXTBUSY -- Ted ]
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

02d262df

29 9月, 2012 3 次提交

ext4: completed_io locking cleanup · 28a535f9

由 Dmitry Monakhov 提交于 9月 29, 2012

Current unwritten extent conversion state-machine is very fuzzy.
- For unknown reason it performs conversion under i_mutex. What for?
  My diagnosis:
  We already protect extent tree with i_data_sem, truncate and punch_hole
  should wait for DIO, so the only data we have to protect is end_io->flags
  modification, but only flush_completed_IO and end_io_work modified this
  flags and we can serialize them via i_completed_io_lock.

  Currently all these games with mutex_trylock result in the following deadlock
   truncate:                          kworker:
    ext4_setattr                       ext4_end_io_work
    mutex_lock(i_mutex)
    inode_dio_wait(inode)  ->BLOCK
                             DEADLOCK<- mutex_trylock()
                                        inode_dio_done()
  #TEST_CASE1_BEGIN
  MNT=/mnt_scrach
  unlink $MNT/file
  fallocate -l $((1024*1024*1024)) $MNT/file
  aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
  sleep 2
  truncate -s 0 $MNT/file
  #TEST_CASE1_END

Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286

This patch makes state machine simple and clean:

(1) xxx_end_io schedule final extent conversion simply by calling
    ext4_add_complete_io(), which append it to ei->i_completed_io_list
    NOTE1: because of (2A) work should be queued only if
    ->i_completed_io_list was empty, otherwise the work is scheduled already.

(2) ext4_flush_completed_IO is responsible for handling all pending
    end_io from ei->i_completed_io_list
    Flushing sequence consists of following stages:
    A) LOCKED: Atomically drain completed_io_list to local_list
    B) Perform extents conversion
    C) LOCKED: move converted io's to to_free list for final deletion
       	     This logic depends on context which we was called from.
    D) Final end_io context destruction
    NOTE1: i_mutex is no longer required because end_io->flags modification
    is protected by ei->ext4_complete_io_lock

Full list of changes:
- Move all completion end_io related routines to page-io.c in order to improve
  logic locality
- Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
- remove EXT4_IO_END_FSYNC
- Improve SMP scalability by removing useless i_mutex which does not
  protect io->flags anymore.
- Reduce lock contention on i_completed_io_lock by optimizing list walk.
- Rename ext4_end_io_nolock to end4_end_io and make it static
- Check flush completion status to ext4_ext_punch_hole(). Because it is
  not good idea to punch blocks from corrupted inode.

Changes since V3 (in request to Jan's comments):
  Fall back to active flush_completed_IO() approach in order to prevent
  performance issues with nolocked DIO reads.
Changes since V2:
  Fix use-after-free caused by race truncate vs end_io_work
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

28a535f9

ext4: fix unwritten counter leakage · 82e54229

由 Dmitry Monakhov 提交于 9月 28, 2012

ext4_set_io_unwritten_flag() will increment i_unwritten counter, so
once we mark end_io with EXT4_END_IO_UNWRITTEN we have to revert it back
on error path.

 - add missed error checks to prevent counter leakage
 - ext4_end_io_nolock() will clear EXT4_END_IO_UNWRITTEN flag to signal
   that conversion finished.
 - add BUG_ON to ext4_free_end_io() to prevent similar leakage in future.

Visible effect of this bug is that unaligned aio_stress may deadlock
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

82e54229

ext4: ext4_inode_info diet · f45ee3a1

由 Dmitry Monakhov 提交于 9月 28, 2012

Generic inode has unused i_private pointer which may be used as cur_aio_dio
storage.

TODO: If cur_aio_dio will be passed as an argument to get_block_t this allow
      to have concurent AIO_DIO requests.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f45ee3a1

27 9月, 2012 2 次提交

ext4: convert to use leXX_add_cpu() · ba39ebb6

由 Wei Yongjun 提交于 9月 27, 2012

Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ba39ebb6

ext4: remove unused function ext4_ext_check_cache · 63fedaf1

由 Lukas Czerner 提交于 9月 26, 2012

Remove unused function ext4_ext_check_cache() and merge the code back to
the ext4_ext_in_cache().
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

63fedaf1

20 9月, 2012 1 次提交

ext4: speed up truncate/unlink by not using bforget() unless needed · 18888cf0

由 Andrey Sidorov 提交于 9月 19, 2012

Do not iterate over data blocks scanning for bh's to forget as they're
never exist. This improves time taken by unlink / truncate syscall.
Tested by continuously truncating file that is being written by dd.
Another test is rm -rf of linux tree while tar unpacks it. With
ordered data mode condition unlikely(!tbh) was always met in
ext4_free_blocks. With journal data mode tbh was found only few times,
so optimisation is also possible.

Unlinking fallocated 60G file after doing sync && echo 3 >
/proc/sys/vm/drop_caches && time rm --help

X86 before (linux 3.6-rc4):
# time rm -f test1
real    0m2.710s
user    0m0.000s
sys     0m1.530s

X86 after:
# time rm -f test1
real    0m0.644s
user    0m0.003s
sys     0m0.060s

MIPS before (linux 2.6.37):
# time rm -f test1
real    0m 4.93s
user    0m 0.00s
sys     0m 4.61s

MIPS after:
# time rm -f test1
real    0m 0.16s
user    0m 0.00s
sys     0m 0.06s
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrey Sidorov <qrxd43@motorola.com>

18888cf0

19 8月, 2012 2 次提交

ext4: fix trivial typo in comment · 30cb27d6

由 Wang Sheng-Hui 提交于 8月 18, 2012

Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

30cb27d6

ext4: no need to add inode to orphan list during hole punch · e3d2e433

由 Ashish Sangwan 提交于 8月 18, 2012

While performing punch hole for an inode, i_disksize is not changed.
So, there is no need to add the inode to orphan list.
Signed-off-by: NAshish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@gmail.com>
Acked-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e3d2e433

17 8月, 2012 3 次提交

ext4: make the zero-out chunk size tunable · 67a5da56

由 Zheng Liu 提交于 8月 17, 2012

Currently in ext4 the length of zero-out chunk is set to 7 file system
blocks.  But if an inode has uninitailized extents from using
fallocate to preallocate space, and the workload issues many random
writes, this can cause a fragmented extent tree that will
unnecessarily grow the extent tree.

So create a new sysfs tunable, extent_max_zeroout_kb, which controls
the maximum size where blocks will be zeroed out instead of creating a
new uninitialized extent.  The default of this has been sent to 32kb.

CC: Zach Brown <zab@zabbo.net>
CC: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

67a5da56

ext4: collapse a single extent tree block into the inode if possible · ecb94f5f

由 Theodore Ts'o 提交于 8月 17, 2012

If an inode has more than 4 extents, but then later some of the
extents are merged together, we can optimize the file system by moving
the extents up into the inode, and discarding the extent tree block.
This is important, because if there are a large number of inodes with
an external extent tree blocks where the contents could fit in the
inode, this can significantly increase the fsck time of the file
system.

Google-Bug-Id: 6801242
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ecb94f5f

ext4: fix kernel BUG on large-scale rm -rf commands · 89a4e48f

由 Theodore Ts'o 提交于 8月 17, 2012

Commit 968dee77: "ext4: fix hole punch failure when depth is greater
than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
crashes when users ran run "rm -rf" on large directory hierarchy on
ext4 filesystems on RAID devices:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028

    Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
    Call Trace:
     [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110
     [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
     [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0
     [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
     [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
     [<ffffffff811a1312>] evict+0xa2/0x1a0
     [<ffffffff811a1513>] iput+0x103/0x1f0
     [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
     [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40
     [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
     [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
    Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00

    RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0

This could be reproduced as follows:

The problem in commit 968dee77 was that caused the variable 'i' to
be left uninitialized if the truncate required more space than was
available in the journal.  This resulted in the function
ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
ext4_ext_remove_space() to restart the truncate operation after
starting a new jbd2 handle.
Reported-by: NMaciej Żenczykowski <maze@google.com>
Reported-by: NMarti Raudsepp <marti@juffo.org>
Tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

89a4e48f

23 7月, 2012 1 次提交

ext4: fix hole punch failure when depth is greater than 0 · 968dee77

由 Ashish Sangwan 提交于 7月 22, 2012

Whether to continue removing extents or not is decided by the return
value of function ext4_ext_more_to_rm() which checks 2 conditions:
a) if there are no more indexes to process.
b) if the number of entries are decreased in the header of "depth -1".

In case of hole punch, if the last block to be removed is not part of
the last extent index than this index will not be deleted, hence the
number of valid entries in the extent header of "depth - 1" will
remain as it is and ext4_ext_more_to_rm will return 0 although the
required blocks are not yet removed.

This patch fixes the above mentioned problem as instead of removing
the extents from the end of file, it starts removing the blocks from
the particular extent from which removing blocks is actually required
and continue backward until done.
Signed-off-by: NAshish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@gmail.com>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org

968dee77

10 7月, 2012 1 次提交

ext4: fix out-of-date comments in extents.c · e7bcf823

由 HaiboLiu 提交于 7月 09, 2012

In this patch, ext4_ext_try_to_merge has been change to merge 
an extent both left and right.  So we need to update the comment
in here.
Signed-off-by: NHaiboLiu <HaiboLiu6@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e7bcf823

01 7月, 2012 1 次提交

ext4: honor O_(D)SYNC semantic in ext4_fallocate() · f4e95b33

由 Zheng Liu 提交于 6月 30, 2012

Ext4 must make sure the transaction to be commited to the disk when
user opens a file with O_(D)SYNC flag and do a fallocate(2) call.

This problem had been reported by Christoph Hellwig in this thread:
http://www.spinics.net/lists/linux-btrfs/msg13621.htmlReported-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f4e95b33

01 6月, 2012 1 次提交

ext4: hole-punch use truncate_pagecache_range · 5e44f8c3

由 Hugh Dickins 提交于 6月 01, 2012

When truncating a file, we unmap pages from userspace first, as that's
usually more efficient than relying, page by page, on the fallback in
truncate_inode_page() - particularly if the file is mapped many times.

Do the same when punching a hole: 3.4 added truncate_pagecache_range()
to do the unmap and trunc, so use it in ext4_ext_punch_hole(), instead
of calling truncate_inode_pages_range() directly.
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5e44f8c3

29 5月, 2012 1 次提交

ext4: fix format flag in ext4_ext_binsearch_idx() · 4a3c3a51

由 Zheng Liu 提交于 5月 28, 2012

fix ext_debug format flag in ext4_ext_binsearch_idx().
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4a3c3a51

30 4月, 2012 2 次提交

ext4: verify and calculate checksums for extent tree blocks · 7ac5990d

由 Darrick J. Wong 提交于 4月 29, 2012

Calculate and verify the checksum for each extent tree block.  The
checksum is located in the space immediately after the last possible
ext4_extent in the block.  The space is is typically the last 4-8
bytes in the block.
Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7ac5990d

ext4: create a new BH_Verified flag to avoid unnecessary metadata validation · f8489128

由 Darrick J. Wong 提交于 4月 29, 2012

Create a new BH_Verified flag to indicate that we've verified all the
data in a buffer_head for correctness.  This allows us to bypass
expensive verification steps when they are not necessary without
missing them when they are.
Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f8489128

17 4月, 2012 1 次提交

ext4: address scalability issue by removing extent cache statistics · 9cd70b34

由 Theodore Ts'o 提交于 4月 16, 2012

Andi Kleen and Tim Chen have reported that under certain circumstances
the extent cache statistics are causing scalability problems due to
cache line bounces.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

9cd70b34

13 4月, 2012 1 次提交

ext4: fix endianness breakage in ext4_split_extent_at() · af1584f5

由 Al Viro 提交于 4月 12, 2012

->ee_len is __le16, so assigning cpu_to_le32() to it is going to do
Bad Things(tm) on big-endian hosts...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

af1584f5

22 3月, 2012 1 次提交

ext4: remove restrictive checks for EOFBLOCKS_FL · afcff5d8

由 Lukas Czerner 提交于 3月 21, 2012

We are going to remove the EOFBLOCKS_FL flag in the future, so this is
the first part of the removal. We can not remove it entirely just now,
since the e2fsck is still checking for it and it might cause headache to
some people. Instead, remove the restrictive checks now and the rest
later, when the new e2fsck code is out and common enough.

This is also needed because punch hole already breaks the EOFBLOCKS_FL
semantics, so it might cause the some troubles. So simply remove it.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

afcff5d8

20 3月, 2012 4 次提交

T
ext4: change some printk() calls to use ext4_msg() instead · 92b97816
由 Theodore Ts'o 提交于 3月 19, 2012
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
92b97816

ext4: give more helpful error message in ext4_ext_rm_leaf() · dc1841d6

由 Lukas Czerner 提交于 3月 19, 2012

The error message produced by the ext4_ext_rm_leaf() when we are
removing blocks which accidentally ends up inside the existing extent,
is not very helpful, because we would like to also know which extent did
we collide with.

This commit changes the error message to get us also the information
about the extent we are colliding with.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

dc1841d6

ext4: remove unused code from ext4_ext_map_blocks() · 7877191c

由 Lukas Czerner 提交于 3月 19, 2012

Since the commit 'Rewrite punch hole to use ext4_ext_remove_space()'
reworked the punch hole implementation to use ext4_ext_remove_space()
instead of ext4_ext_map_blocks(), we can remove the code which is no
longer needed from the ext4_ext_map_blocks().
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7877191c

ext4: rewrite punch hole to use ext4_ext_remove_space() · 5f95d21f

由 Lukas Czerner 提交于 3月 19, 2012

This commit rewrites ext4 punch hole implementation to use
ext4_ext_remove_space() instead of its home gown way of doing this via
ext4_ext_map_blocks(). There are several reasons for changing this.

Firstly it is quite non obvious that punching hole needs to
ext4_ext_map_blocks() to punch a hole, especially given that this
function should map blocks, not unmap it. It also required a lot of new
code in ext4_ext_map_blocks().

Secondly the design of it is not very effective. The reason is that we
are trying to punch out blocks in ext4_ext_punch_hole() in opposite
direction than in ext4_ext_rm_leaf() which causes the ext4_ext_rm_leaf()
to iterate through the whole tree from the end to the start to find the
requested extent for every extent we are going to punch out.

And finally the current implementation does not use the existing code,
but bring a lot of new code, which is IMO unnecessary since there
already is some infrastructure we can use. Specifically
ext4_ext_remove_space().

This commit changes ext4_ext_remove_space() to accept 'end' parameter so
we can not only truncate to the end of file, but also remove the space
in the middle of the file (punch a hole). Moreover, because the last
block to punch out, might be in the middle of the extent, we have to
split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either
remove the whole fist part of split extent, or change its size.

ext4_ext_remove_space() is then used to actually remove the space
(extents) from within the hole, instead of ext4_ext_map_blocks().

Note that this also fix the issue with punch hole, where we would forget
to remove empty index blocks from the extent tree, resulting in double
free block error and file system corruption. This is simply because we
now use different code path, where this problem does not exist.

This has been tested with fsx running for several days and xfstests,
plus xfstest #251 with '-o discard' run on the loop image (which
converts discard requestes into punch hole to the backing file). All of
it on 1K and 4K file system block size.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5f95d21f

12 3月, 2012 1 次提交

ext4: check for zero length extent · 31d4f3a2

由 Theodore Ts'o 提交于 3月 11, 2012

Explicitly test for an extent whose length is zero, and flag that as a
corrupted extent.

This avoids a kernel BUG_ON assertion failure.

Tested: Without this patch, the file system image found in
tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
kernel panic.  With this patch, an ext4 file system error is noted
instead, and the file system is marked as being corrupted.

https://bugzilla.kernel.org/show_bug.cgi?id=42859Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

31d4f3a2

09 1月, 2012 1 次提交

ext2/3/4: delete unneeded includes of module.h · 302bf2f3

由 Paul Gortmaker 提交于 1月 04, 2012

Delete any instances of include module.h that were not strictly
required.  In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NJan Kara <jack@suse.cz>

302bf2f3

29 12月, 2011 1 次提交

ext4: add missing spaces to debugging printk's · 88635ca2

由 Zheng Liu 提交于 12月 28, 2011

Fix ext4_debug format in ext4_ext_handle_uninitialized_extents() and
ext4_end_io_dio().
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

88635ca2

19 12月, 2011 2 次提交

ext4: optimize ext4_find_delalloc_range() in nodelalloc mode · 8c48f7e8

由 Robin Dong 提交于 12月 18, 2011

We found performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" will cost about 2 seconds to finish, but if we mke2fs without
"bigalloc", "dd" will only cost less than 1 second.

The reason is: when using ext4 with "nodelalloc", it will call
ext4_find_delalloc_cluster() nearly everytime it call
ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
all pages in cluster because no buffer is "delayed".  A cluster has
256 pages (1MB cluster), so it will scan 256 * 256k pags when creating
a 1G file. That severely hurts the performance.

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.
Signed-off-by: NRobin Dong <sanbai@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8c48f7e8

ext4: remove unused local variable · 14d7f3ef

由 Curt Wohlgemuth 提交于 12月 18, 2011

In get_implied_cluster_alloc(), rr_cluster_end was being
defined and set, but was never used.  Removed this.
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

14d7f3ef

14 12月, 2011 1 次提交

ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized · 5b5ffa49

由 Yongqiang Yang 提交于 12月 13, 2011

If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
than ee_block + ee_len.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

5b5ffa49

13 12月, 2011 1 次提交

ext4: Fix crash due to getting bogus eh_depth value on big-endian systems · b4611abf

由 Paul Mackerras 提交于 12月 12, 2011

Commit 1939dd84 ("ext4: cleanup ext4_ext_grow_indepth code") added a
reference to ext4_extent_header.eh_depth, but forget to pass the value
read through le16_to_cpu.  The result is a crash on big-endian
machines, such as this crash on a POWER7 server:

attempt to access beyond end of device
sda8: rw=0, want=776392648163376, limit=168558560
Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb
Faulting instruction address: 0xc0000000001f5f38
cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0]
    pc: c0000000001f5f38: .__brelse+0x18/0x60
    lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80
    sp: c000001bd1aaef70
   msr: 9000000000009032
   dar: 6b6b6b6b6b6b6bcb
 dsisr: 40000000
  current = 0xc000001bd15b8010
  paca    = 0xc00000000ffe4600
    pid   = 19911, comm = flush-8:0
enter ? for help
[c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80
[c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0
[c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0
[c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710
[c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310
[c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490
[c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430
[c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670
[c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90
[c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530
[c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300
[c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140
[c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0
[c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300
[c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340
[c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0
[c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70

This is due to getting ext_depth(inode) == 0x101 and therefore running
off the end of the path array in ext4_ext_drop_refs into following
unallocated structures.

This fixes it by adding the necessary le16_to_cpu.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b4611abf

02 11月, 2011 1 次提交

ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined · bf52c6f7

由 Yongqiang Yang 提交于 11月 01, 2011

The variable 'block' is removed by commit 750c9c47, so use the
replacement ex_ee_block instead.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bf52c6f7