提交 · 7b7a8665edd8db733980389b098530f9e4f630b2 · openanolis / cloud-kernel

04 9月, 2013 1 次提交

direct-io: Implement generic deferred AIO completions · 7b7a8665

由 Christoph Hellwig 提交于 9月 04, 2013

Add support to the core direct-io code to defer AIO completions to user
context using a workqueue.  This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara.  I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7b7a8665

17 8月, 2013 1 次提交

jbd2: Fix oops in jbd2_journal_file_inode() · a361293f

由 Jan Kara 提交于 8月 16, 2013

Commit 0713ed0c added
jbd2_journal_file_inode() call into ext4_block_zero_page_range().
However that function gets called from truncate path and thus inode
needn't have jinode attached - that happens in ext4_file_open() but
the file needn't be ever open since mount. Calling
jbd2_journal_file_inode() without jinode attached results in the oops.

We fix the problem by attaching jinode to inode also in ext4_truncate()
and ext4_punch_hole() when we are going to zero out partial blocks.
Reported-by: Nmajianpeng <majianpeng@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a361293f

30 7月, 2013 1 次提交

ext4: add WARN_ON to check the length of allocated blocks · 44fb851d

由 Zheng Liu 提交于 7月 29, 2013

In commit 921f266b: ext4: add self-testing infrastructure to do a
sanity check, some sanity checks were added in map_blocks to make sure
'retval == map->m_len'.

Enable these checks by default and report any assertion failures using
ext4_warning() and WARN_ON() since they can help us to figure out some
bugs that are otherwise hard to hit.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

44fb851d

16 7月, 2013 1 次提交

ext4: call ext4_es_lru_add() after handling cache miss · 63b99968

由 Theodore Ts'o 提交于 7月 16, 2013

If there are no items in the extent status tree, ext4_es_lru_add() is
a no-op.  So it is not sufficient to call ext4_es_lru_add() before we
try to lookup an entry in the extent status tree.  We also need to
call it at the end of ext4_ext_map_blocks(), after items have been
added to the extent status tree.

This could lead to inodes with that have extent status trees but which
are not in the LRU list, which means they won't get considered for
eviction by the es_shrinker.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: stable@vger.kernel.org

63b99968

13 7月, 2013 1 次提交

ext4: fix spelling errors and a comment in extent_status tree · bdafe42a

由 Theodore Ts'o 提交于 7月 13, 2013

Replace "assertation" with "assertion" in lots and lots of debugging
messages.

Correct the comment stating when ext4_es_insert_extent() is used.  It
was no doubt tree at one point, but it is no longer true...
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>

bdafe42a

06 7月, 2013 1 次提交

ext4: silence warning in ext4_writepages() · 27d7c4ed

由 Jan Kara 提交于 7月 05, 2013

The loop in mpage_map_and_submit_extent() is guaranteed to always run
at least once since the caller of mpage_map_and_submit_extent() makes
sure map->m_len > 0. So make that explicit using do-while instead of
pure while which also silences the compiler warning about
uninitialized 'err' variable.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>

27d7c4ed

01 7月, 2013 6 次提交

ext4: fix up error handling for mpage_map_and_submit_extent() · cb530541

由 Theodore Ts'o 提交于 7月 01, 2013

The function mpage_released_unused_page() must only be called once;
otherwise the kernel will BUG() when the second call to
mpage_released_unused_page() tries to unlock the pages which had been
unlocked by the first call.

Also restructure the error handling so that we only give up on writing
the dirty pages in the case of ENOSPC where retrying the allocation
won't help.  Otherwise, a transient failure, such as a kmalloc()
failure in calling ext4_map_blocks() might cause us to give up on
those pages, leading to a scary message in /var/log/messages plus data
loss.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

cb530541

ext4: only zero partial blocks in ext4_zero_partial_blocks() · e1be3a92

由 Lukas Czerner 提交于 7月 01, 2013

Currently if we pass range into ext4_zero_partial_blocks() which covers
entire block we would attempt to zero it even though we should only zero
unaligned part of the block.

Fix this by checking whether the range covers the whole block skip
zeroing if so.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e1be3a92

ext4: check error return from ext4_write_inline_data_end() · 42c832de

由 Theodore Ts'o 提交于 7月 01, 2013

The function ext4_write_inline_data_end() can return an error.  So we
need to assign it to a signed integer variable to check for an error
return (since copied is an unsigned int).
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: stable@vger.kernel.org

42c832de

ext4: delete unnecessary C statements · 353eefd3

由 jon ernst 提交于 7月 01, 2013

Comparing unsigned variable with 0 always returns false.
err = 0 is duplicated and unnecessary.

[ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ]
Signed-off-by: N"Jon Ernst" <jonernst07@gmx.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

353eefd3

ext4: pass inode pointer instead of file pointer to punch hole · aeb2817a

由 Ashish Sangwan 提交于 7月 01, 2013

No need to pass file pointer when we can directly pass inode pointer.
Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

aeb2817a

ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77

由 Zheng Liu 提交于 7月 01, 2013

Now we maintain an proper in-order LRU list in ext4 to reclaim entries
from extent status tree when we are under heavy memory pressure.  For
keeping this order, a spin lock is used to protect this list.  But this
lock burns a lot of CPU time.  We can use the following steps to trigger
it.

  % cd /dev/shm
  % dd if=/dev/zero of=ext4-img bs=1M count=2k
  % mkfs.ext4 ext4-img
  % mount -t ext4 -o loop ext4-img /mnt
  % cd /mnt
  % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
  % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
  % perf record -a -g
  % perf report

This commit tries to fix this problem.  Now a new member called
i_touch_when is added into ext4_inode_info to record the last access
time for an inode.  Meanwhile we never need to keep a proper in-order
LRU list.  So this can avoid to burns some CPU time.  When we try to
reclaim some entries from extent status tree, we use list_sort() to get
a proper in-order list.  Then we traverse this list to discard some
entries.  In ext4_sb_info, we use s_es_last_sorted to record the last
time of sorting this list.  When we traverse the list, we skip the inode
that is newer than this time, and move this inode to the tail of LRU
list.  When the head of the list is newer than s_es_last_sorted, we will
sort the LRU list again.

In this commit, we break the loop if s_extent_cache_cnt == 0 because
that means that all extents in extent status tree have been reclaimed.

Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
changed to save a local variable in these functions.
Reported-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d3922a77

07 6月, 2013 1 次提交

ext4: use ext4_da_writepages() for all modes · 20970ba6

由 Theodore Ts'o 提交于 6月 06, 2013

Rename ext4_da_writepages() to ext4_writepages() and use it for all
modes. We still need to iterate over all the pages in the case of
data=journalling, but in the case of nodelalloc/data=ordered (which is
what file systems mounted using ext3 backwards compatibility will use)
this will allow us to use a much more efficient I/O submission path.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

20970ba6

05 6月, 2013 10 次提交

ext4: remove ext4_ioend_wait() · 5dc23bdd

由 Jan Kara 提交于 6月 04, 2013

Now that we clear PageWriteback after extent conversion, there's no
need to wait for io_end processing in ext4_evict_inode().  Running
AIO/DIO keeps file reference until aio_complete() is called so
ext4_evict_inode() cannot be called.  For io_end structures resulting
from buffered IO waiting is happening because we wait for
PageWriteback in truncate_inode_pages().
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5dc23bdd

ext4: don't wait for extent conversion in ext4_punch_hole() · c724585b

由 Jan Kara 提交于 6月 04, 2013

We don't have to wait for extent conversion in ext4_punch_hole() as
buffered IO for the punched range has been flushed and waited upon
(thus all extent conversions for that range have completed).  Also we
wait for all DIO to finish using inode_dio_wait() so there cannot be
any extent conversions pending due to direct IO.

Also remove ext4_flush_unwritten_io() since it's unused now.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c724585b

ext4: remove wait for unwritten extent conversion from ext4_truncate() · a115f749

由 Jan Kara 提交于 6月 04, 2013

Since PageWriteback bit is now cleared after extents are converted
from unwritten to written ones, we have full exclusion of writeback
path from truncate (truncate_inode_pages() waits for PageWriteback
bits to get cleared on all invalidated pages).  Exclusion from DIO
path is achieved by inode_dio_wait() call in ext4_setattr().  So
there's no need to wait for extent convertion in ext4_truncate()
anymore.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a115f749

ext4: protect extent conversion after DIO with i_dio_count · e8340395

由 Jan Kara 提交于 6月 04, 2013

Make sure extent conversion after DIO happens while i_dio_count is
still elevated so that inode_dio_wait() waits until extent conversion
is done.  This removes the need for explicit waiting for extent
conversion in some cases.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e8340395

ext4: use transaction reservation for extent conversion in ext4_end_io · 6b523df4

由 Jan Kara 提交于 6月 04, 2013

Later we would like to clear PageWriteback bit only after extent
conversion from unwritten to written extents is performed.  However it
is not possible to start a transaction after PageWriteback is set
because that violates lock ordering (and is easy to deadlock).  So we
have to reserve a transaction before locking pages and sending them
for IO and later we use the transaction for extent conversion from
ext4_end_io().
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

6b523df4

ext4: remove buffer_uninit handling · 3613d228

由 Jan Kara 提交于 6月 04, 2013

There isn't any need for setting BH_Uninit on buffers anymore.  It was
only used to signal we need to mark io_end as needing extent
conversion in add_bh_to_extent() but now we can mark the io_end
directly when mapping extent.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

3613d228

ext4: restructure writeback path · 4e7ea81d

由 Jan Kara 提交于 6月 04, 2013

There are two issues with current writeback path in ext4.  For one we
don't necessarily map complete pages when blocksize < pagesize and
thus needn't do any writeback in one iteration.  We always map some
blocks though so we will eventually finish mapping the page.  Just if
writeback races with other operations on the file, forward progress is
not really guaranteed. The second problem is that current code
structure makes it hard to associate all the bios to some range of
pages with one io_end structure so that unwritten extents can be
converted after all the bios are finished.  This will be especially
difficult later when io_end will be associated with reserved
transaction handle.

We restructure the writeback path to a relatively simple loop which
first prepares extent of pages, then maps one or more extents so that
no page is partially mapped, and once page is fully mapped it is
submitted for IO. We keep all the mapping and IO submission
information in mpage_da_data structure to somewhat reduce stack usage.
Resulting code is somewhat shorter than the old one and hopefully also
easier to read.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4e7ea81d

ext4: better estimate credits needed for ext4_da_writepages() · fffb2739

由 Jan Kara 提交于 6月 04, 2013

We limit the number of blocks written in a single loop of
ext4_da_writepages() to 64 when inode uses indirect blocks.  That is
unnecessary as credit estimates for mapping logically continguous run
of blocks is rather low even for inode with indirect blocks.  So just
lift this limitation and properly calculate the number of necessary
credits.

This better credit estimate will also later allow us to always write
at least a single page in one iteration.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fffb2739

ext4: improve writepage credit estimate for files with indirect blocks · fa55a0ed

由 Jan Kara 提交于 6月 04, 2013

ext4_ind_trans_blocks() wrongly used 'chunk' argument to decide whether
blocks mapped are logically contiguous. That is wrong since the argument
informs whether the blocks are physically contiguous. As the blocks
mapped are always logically contiguous and that's all
ext4_ind_trans_blocks() cares about, just remove the 'chunk' argument.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fa55a0ed

ext4: stop messing with nr_to_write in ext4_da_writepages() · 39bba40b

由 Jan Kara 提交于 6月 04, 2013

Writeback code got better in how it submits IO and now the number of
pages requested to be written is usually higher than original 1024.
The number is now dynamically computed based on observed throughput
and is set to be about 0.5 s worth of writeback.  E.g. on ordinary
SATA drive this ends up somewhere around 10000 as my testing shows.
So remove the unnecessary smarts from ext4_da_writepages().
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

39bba40b

04 6月, 2013 1 次提交

ext4: use io_end for multiple bios · 97a851ed

由 Jan Kara 提交于 6月 04, 2013

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Bugs in ENOMEM handling found by Linux File System Verification project
(linuxtesting.org) and fixed by Alexey Khoroshilov
<khoroshilov@ispras.ru>.

CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

97a851ed

01 6月, 2013 1 次提交

ext4: fix overflow when counting used blocks on 32-bit architectures · 8af8eecc

由 Jan Kara 提交于 5月 31, 2013

The arithmetics adding delalloc blocks to the number of used blocks in
ext4_getattr() can easily overflow on 32-bit archs as we first multiply
number of blocks by blocksize and then divide back by 512. Make the
arithmetics more clever and also use proper type (unsigned long long
instead of unsigned long).
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

8af8eecc

28 5月, 2013 5 次提交

ext4: remove unused discard_partial_page_buffers · c121ffd0

由 Lukas Czerner 提交于 5月 27, 2013

The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

c121ffd0

ext4: use ext4_zero_partial_blocks in punch_hole · a87dd18c

由 Lukas Czerner 提交于 5月 27, 2013

We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.

We also introduce function ext4_prepare_punch_hole() to do come common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

a87dd18c

Revert "ext4: fix fsx truncate failure" · eb3544c6

由 Lukas Czerner 提交于 5月 27, 2013

This reverts commit 189e868f.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zero block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidate the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover the problem described in the commit is fixed more properly with
commit

15291164
	jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

eb3544c6

ext4: Call ext4_jbd2_file_inode() after zeroing block · 0713ed0c

由 Lukas Czerner 提交于 5月 27, 2013

In data=ordered mode we should call ext4_jbd2_file_inode() so that crash
after the truncate transaction has committed does not expose stall data
in the tail of the block.

Thanks Jan Kara for pointing that out.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

0713ed0c

Revert "ext4: remove no longer used functions in inode.c" · d863dc36

由 Lukas Czerner 提交于 5月 27, 2013

This reverts commit ccb4d7af.

This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which has been previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce those function and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

d863dc36

22 5月, 2013 3 次提交

ext4: use ->invalidatepage() length argument · ca99fdd2

由 Lukas Czerner 提交于 5月 21, 2013

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>

ca99fdd2

jbd2: change jbd2_journal_invalidatepage to accept length · 259709b0

由 Lukas Czerner 提交于 5月 21, 2013

invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>

259709b0

mm: change invalidatepage prototype to accept length · d47992f8

由 Lukas Czerner 提交于 5月 21, 2013

Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>

d47992f8

12 5月, 2013 1 次提交

ext4: revert "ext4: use io_end for multiple bios" · a549984b

由 Theodore Ts'o 提交于 5月 11, 2013

This reverts commit 4eec708d.

Multiple users have reported crashes which is apparently caused by
this commit.  Thanks to Dmitry Monakhov for bisecting it.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>

a549984b

08 5月, 2013 1 次提交

aio: don't include aio.h in sched.h · a27bb332

由 Kent Overstreet 提交于 5月 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a27bb332

23 4月, 2013 1 次提交

ext4: fix type-widening bug in inode table readahead code · 0d606e2c

由 Theodore Ts'o 提交于 4月 23, 2013

Due to a missing cast, the high 32-bits of a 64-bit block number used
when calculating the readahead block for inode tables can get lost.
This means we can end up fetching the wrong blocks for readahead for
file systems > 16TB.

Linus found this when experimenting with an enhacement to the sparse
static code checker which checks for missing widening casts before
binary "not" operators.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0d606e2c

22 4月, 2013 1 次提交

ext4: mark metadata blocks using bh flags · 13fca323

由 Theodore Ts'o 提交于 4月 21, 2013

This allows metadata writebacks which are issued via block device
writeback to be sent with the current write request flags.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

13fca323

12 4月, 2013 1 次提交

ext4: use io_end for multiple bios · 4eec708d

由 Jan Kara 提交于 4月 11, 2013

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>

4eec708d

10 4月, 2013 2 次提交

ext4: fix big-endian bug in metadata checksum calculations · 171a7f21

由 Dmitry Monakhov 提交于 4月 09, 2013

Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

171a7f21

ext4: introduce reserved space · 27dd4385

由 Lukas Czerner 提交于 4月 09, 2013

Currently in ENOSPC condition when writing into unwritten space, or
punching a hole, we might need to split the extent and grow extent tree.
However since we can not allocate any new metadata blocks we'll have to
zero out unwritten part of extent or punched out part of extent, or in
the worst case return ENOSPC even though use actually does not allocate
any space.

Also in delalloc path we do reserve metadata and data blocks for the
time we're going to write out, however metadata block reservation is
very tricky especially since we expect that logical connectivity implies
physical connectivity, however that might not be the case and hence we
might end up allocating more metadata blocks than previously reserved.
So in future, metadata reservation checks should be removed since we can
not assure that we do not under reserve.

And this is where reserved space comes into the picture. When mounting
the file system we slice off a little bit of the file system space (2%
or 4096 clusters, whichever is smaller) which can be then used for the
cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.

The number of reserved clusters can be set via sysfs, however it can
never be bigger than number of free clusters in the file system.

Note that this patch fixes the failure of xfstest 274 as expected.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>

27dd4385

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功