Commits · 72b8e0f9fa8aee7e623808af1a5f33b70ebcb2c7 · openeuler / Kernel

03 Apr, 2015 4 commits

ext4: remove unused header files · 72b8e0f9

Authored by Sheng Yong on Apr 02, 2015

Remove unused header files and header files which are included in
ext4.h.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

72b8e0f9

ext4: fix comments in ext4_can_extents_be_merged() · 4255c224

Authored by Xiaoguang Wang on Apr 02, 2015

Since commit a9b82415, we are allowed to merge unwritten extents,
so these comments are wrong; remove them.
Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4255c224

ext4: fix transposition typo in format string · 80cfb71e

Authored by Rasmus Villemoes on Apr 02, 2015

According to C99, %*.s means the same as %*.0s, in other words, print as
many spaces as the field width argument says and effectively ignore the
string argument. That is certainly not what was meant here. The kernel's
printf implementation, however, treats it as if the . was not there,
i.e. as %*s. I don't know if de->name is nul-terminated or not, but in
any case I'm guessing the intention was to use de->name_len as precision
instead of field width.

[ Note: this is debugging code which is commented out, so this is not
  a security issue; a developer would have to explicitly enable
  INLINE_DIR_DEBUG before this would be an issue. ]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

80cfb71e
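
The difference between the two format strings discussed in 80cfb71e can be reproduced in userspace. The sketch below is only an illustration under assumptions: `struct demo_dirent` and both helper functions are hypothetical stand-ins, not ext4 code, and the behaviour shown is that of a C99-conforming `snprintf()`, not the kernel's printf implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a directory entry whose name is NOT
 * NUL-terminated; only the first name_len bytes of name[] are valid.
 */
struct demo_dirent {
    unsigned char name_len;
    char name[8];
};

/* Transposed form: name_len is the field width, the precision is 0. */
static int format_name_transposed(char *out, size_t outsz,
                                  const struct demo_dirent *de)
{
    return snprintf(out, outsz, "%*.s", (int)de->name_len, de->name);
}

/* Intended form: name_len is the precision, so at most name_len bytes
 * of the name are read and printed. */
static int format_name(char *out, size_t outsz,
                       const struct demo_dirent *de)
{
    return snprintf(out, outsz, "%.*s", (int)de->name_len, de->name);
}
```

With a three-byte name, the transposed form yields three spaces in a conforming C library, while the intended form yields the name itself without reading past `name_len`.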

ext4: fix bh leak on error paths in ext4_rename() and ext4_cross_rename() · 7071b715

Authored by Konstantin Khlebnikov on Apr 02, 2015

Release references to buffer-heads if ext4_journal_start() fails.

Fixes: 5b61de75 ("ext4: start handle at least possible moment when renaming files")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

7071b715

17 Feb, 2015 1 commit

ext4: add DAX functionality · 923ae0ff

Authored by Ross Zwisler on Feb 16, 2015

This is a port of the DAX functionality found in the current version of
ext2.

[matthew.r.wilcox@intel.com: heavily tweaked]
[akpm@linux-foundation.org: remap_pages went away]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

923ae0ff

15 Feb, 2015 1 commit

ext4: fix indirect punch hole corruption · 6f30b7e3

Authored by Omar Sandoval on Feb 14, 2015

Commit 4f579ae7 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are bugs in several corner cases. This fixes 5
distinct bugs:

1. When there is at least one entire level of indirection between the
start and end of the punch range and the end of the punch range is the
first block of its level, we can't return early; we have to free the
intervening levels.

2. When the end is at a higher level of indirection than the start and
ext4_find_shared returns a top branch for the end, we still need to free
the rest of the shared branch it returns; we can't decrement partial2.

3. When a punch happens within one level of indirection, we need to
converge on an indirect block that contains the start and end. However,
because the branches returned from ext4_find_shared do not necessarily
start at the same level (e.g., the partial2 chain will be shallower if
the last block occurs at the beginning of an indirect group), the walk
of the two chains can end up "missing" each other and freeing a bunch of
extra blocks in the process. This mismatch can be handled by first
making sure that the chains are at the same level, then walking them
together until they converge.

4. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the start, we must free it,
but only if the end does not occur within that branch.

5. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the end, then we shouldn't
free the block referenced by the end of the returned chain (this mirrors
the different levels case).
Signed-off-by: Omar Sandoval <osandov@osandov.com>

6f30b7e3
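
For orientation, the userspace request that reaches this kernel code is `fallocate()` with `FALLOC_FL_PUNCH_HOLE`. The sketch below is a minimal, hedged illustration: the helper name and the `/tmp` path template are assumptions, and it does not reproduce any of the five bugs, which additionally require a file with indirect (non-extent) block mapping.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create an unlinked temporary file of `size` bytes, then punch a hole
 * covering [off, off + len) while keeping the file size unchanged.
 * Returns 0 on success or a negative errno value.
 */
static int punch_hole_demo(off_t size, off_t off, off_t len)
{
    char path[] = "/tmp/punch-demo-XXXXXX";  /* assumed location */
    int fd = mkstemp(path);
    int ret = 0;

    if (fd < 0)
        return -errno;
    unlink(path);  /* the file goes away once fd is closed */

    if (ftruncate(fd, size) < 0 ||
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) < 0)
        ret = -errno;

    close(fd);
    return ret;
}
```

On a file system without hole-punching support the call is expected to fail with `EOPNOTSUPP`, so callers should treat that result as "not supported" and not as corruption.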

13 Feb, 2015 4 commits

ext4: ignore journal checksum on remount; don't fail · 2d5b86e0

Authored by Eric Sandeen on Feb 12, 2015

As of v3.18, ext4 started rejecting a remount which changes the
journal_checksum option.

Prior to that, it was simply ignored; the problem here is that
if someone has this in their fstab for the root fs, now the box
fails to boot properly, because remount of root with the new options
will fail, and the box proceeds with a readonly root.

I think it is a little nicer behavior to accept the option, but
warn that it's being ignored, rather than failing the mount,
but that might be a subjective matter...
Reported-by: Cónräd <conradsand.arma@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2d5b86e0

ext4: remove duplicate remount check for JOURNAL_CHECKSUM change · b94a8b36

Authored by Eric Sandeen on Feb 12, 2015

rejection of, changing journal_checksum during remount.  One suffices.

While we're at it, remove old comment about the "check" option
which has been deprecated for some time now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b94a8b36

ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize · 0572639f

Authored by Xiaoguang Wang on Feb 12, 2015

Commits 90a80202 and d6320cbf by Jan Kara fixed this issue only partially:
the mmap data corruption still exists in nodelalloc mode. Fix it.
Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

0572639f

ext4: support read-only images · 2cb5cc8b

Authored by Darrick J. Wong on Feb 12, 2015

Add a rocompat feature, "readonly", to mark a FS image as read-only.
The feature prevents the kernel and e2fsprogs from changing the image;
the flag can be toggled by tune2fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2cb5cc8b

11 Feb, 2015 1 commit

mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub · d83a08db

Authored by Kirill A. Shutemov on Feb 10, 2015

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d83a08db

05 Feb, 2015 2 commits

ext4: add optimization for the lazytime mount option · a26f4992

Authored by Theodore Ts'o on Feb 02, 2015

Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.

Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME.  We can eventually make this go away once util-linux has
added support.

Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a26f4992

vfs: add support for a lazytime mount option · 0ae45f63

Authored by Theodore Ts'o on Feb 02, 2015

Add a new mount option which enables a new "lazytime" mode.  This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode.  The on-disk times will only get
updated (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random writes to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table.  The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives.  Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0ae45f63
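
The lazytime policy described in 0ae45f63 can be modelled in a few lines. This is a toy sketch under assumptions, not the VFS implementation: `struct demo_inode`, `demo_touch()` and `demo_flush()` are hypothetical names, and `dirty_time` only loosely models the I_DIRTY_TIME flag mentioned in the commits above.

```c
#include <stdbool.h>

/* Hypothetical in-memory inode model; not the kernel's struct inode. */
struct demo_inode {
    long mem_mtime;   /* timestamp held in memory */
    long disk_mtime;  /* timestamp last written to disk */
    bool dirty_time;  /* a timestamp update is pending in memory */
    int disk_writes;  /* number of simulated inode-table writes */
};

/* Under lazytime, a write only updates the in-memory timestamp. */
static void demo_touch(struct demo_inode *ino, long now)
{
    ino->mem_mtime = now;
    ino->dirty_time = true;
}

/*
 * Write a pending timestamp to "disk". In the commit message this
 * happens on a non-time inode change, on fsync()/syncfs()/sync(), or
 * just before the inode is evicted from memory.
 */
static void demo_flush(struct demo_inode *ino)
{
    if (!ino->dirty_time)
        return;
    ino->disk_mtime = ino->mem_mtime;
    ino->dirty_time = false;
    ino->disk_writes++;
}
```

In this model, a thousand calls to `demo_touch()` followed by one `demo_flush()` cost a single simulated inode-table write, which is the effect the commit message describes for random writes to a preallocated file.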

30 Jan, 2015 1 commit

ext4: Use generic helpers for quotaon and quotaoff · 1fa5efe3

Authored by Jan Kara on Oct 08, 2014

Ext4 can just use the generic helpers provided by quota code for turning
quotas on and off when quota files are stored as system inodes. The only
difference is the feature test in ext4_quota_on_sysfile() but the same
is achieved in dquot_quota_enable() by checking whether usage tracking
for the corresponding quota type is enabled (which can happen only if
quota feature is set).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

1fa5efe3

27 Jan, 2015 1 commit

ext4: change to use setup_timer() instead of init_timer() · 04ecddb7

Authored by Jan Mrazek on Jan 26, 2015

Signed-off-by: Jan Mrazek <email@honzamrazek.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

04ecddb7

21 Jan, 2015 1 commit

fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info · de1414a6

Authored by Christoph Hellwig on Jan 14, 2015

Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case.  Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

de1414a6

20 Jan, 2015 1 commit

ext4: reserve codepoints used by the ext4 encryption feature · 3edc18d8

Authored by Theodore Ts'o on Jan 19, 2015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3edc18d8

03 Jan, 2015 2 commits

ext4: remove spurious KERN_INFO from ext4_warning call · 363307e6

Authored by Jakub Wilk on Jan 02, 2015

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

363307e6

Revert "ext4: fix suboptimal seek_{data,hole} extents traversial" · ad7fefb1

Authored by Theodore Ts'o on Jan 02, 2015

This reverts commit 14516bb7.

This was causing regression test failures with generic/285 with an ext3
filesystem using CONFIG_EXT4_USE_FOR_EXT23.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ad7fefb1

27 Dec, 2014 1 commit

ext4: prevent online resize with backup superblock · 011fa994

Authored by Theodore Ts'o on Dec 26, 2014

Prevent BUG or corrupted file systems after the following:

mkfs.ext4 /dev/vdc 100M
mount -t ext4 -o sb=40961 /dev/vdc /vdc
resize2fs /dev/vdc

We previously prevented online resizing using the old resize ioctl.
Move the code to ext4_resize_begin(), so the check applies for all of
the resize ioctl's.
Reported-by: Maxim Malkov <malkov@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

011fa994

17 Dec, 2014 1 commit

move_extent_per_page(): get rid of unused w_flags · b1bc6d7f

Authored by Al Viro on Dec 17, 2014

... and comparing get_fs() with KERNEL_DS used only to initialize that
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b1bc6d7f

06 Dec, 2014 1 commit

ext4: ext4_da_convert_inline_data_to_extent drop locked page after error · 50db71ab

Authored by Dmitry Monakhov on Dec 05, 2014

Testcase:
xfstests generic/270
MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"

Call Trace:
 [<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
 [<ffffffff81145260>] pagecache_get_page+0x65/0x15a
 [<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
 [<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
 [<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
 [<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
 [<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
 [<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
 [<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
 [<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
 [<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
 [<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
 [<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
 [<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
 [<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
 [<ffffffff81185656>] ? file_start_write+0x2a/0x2c
 [<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
 [<ffffffff811858ce>] new_sync_write+0x8a/0xb2
 [<ffffffff81186e7b>] vfs_write+0xb5/0x14d
 [<ffffffff81186ffb>] SyS_write+0x5c/0x8c
 [<ffffffff816f2529>] system_call_fastpath+0x12/0x17
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

50db71ab

03 Dec, 2014 3 commits

ext4: fix suboptimal seek_{data,hole} extents traversial · 14516bb7

Authored by Dmitry Monakhov on Dec 02, 2014

Scanning an inode block by block is wasteful; that technique is only
applicable to old indirect-mapped files, and it takes a significant
amount of time for really large files. Let's reuse ext4_fiemap, which
already traverses the inode tree in the most optimal manner.

TESTCASE:
ftruncate64(fd, 0);
ftruncate64(fd, 1ULL << 40);
/* lseek will spin very long time */
lseek64(fd, 0, SEEK_DATA);
lseek64(fd, 0, SEEK_HOLE);

Original report: https://lkml.org/lkml/2014/10/16/620
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

14516bb7
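
The TESTCASE in 14516bb7 can be fleshed out into a compilable helper. The following is a sketch under assumptions: the function name and the `/tmp` path template are hypothetical, and the value returned depends on whether the underlying file system reports holes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create an unlinked temporary file that consists of a single hole of
 * `size` bytes and return the offset that lseek(SEEK_HOLE) reports
 * when searching from offset 0, or a negative errno value on failure.
 */
static off_t first_hole_in_sparse_file(off_t size)
{
    char path[] = "/tmp/seek-demo-XXXXXX";  /* assumed location */
    int fd = mkstemp(path);
    off_t pos;

    if (fd < 0)
        return -errno;
    unlink(path);  /* the file goes away once fd is closed */

    if (ftruncate(fd, size) < 0) {
        pos = -errno;
    } else {
        pos = lseek(fd, 0, SEEK_HOLE);
        if (pos < 0)
            pos = -errno;
    }
    close(fd);
    return pos;
}
```

A file system that tracks holes is expected to return 0 here; one that does not may treat the whole file as data and report only the implicit hole at end of file, that is, `size`.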

ext4: ext4_inline_data_fiemap should respect callers argument · d952d69e

Authored by Dmitry Monakhov on Dec 02, 2014

Currently ext4_inline_data_fiemap ignores the requested arguments (start
and len), which may lead to an endless loop if start != 0.  Also fix
incorrect extent length determination.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d952d69e

ext4: prevent fsreentrance deadlock for inline_data · 5cc28a9e

Authored by Dmitry Monakhov on Dec 02, 2014

ext4_da_convert_inline_data_to_extent() invokes
grab_cache_page_write_begin().  grab_cache_page_write_begin performs
memory allocation, so fs-reentrance should be prohibited because we
are inside a journal transaction.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

5cc28a9e

26 Nov, 2014 15 commits

ext4: forbid journal_async_commit in data=ordered mode · d4f76107

Authored by Jan Kara on Nov 25, 2014

Option journal_async_commit breaks guarantees of data=ordered mode as it
sends only a single cache flush after writing a transaction commit
block. Thus even though the transaction including the commit block is
fully stored on persistent storage, file data may still linger in drive
caches and will be lost on power failure. Since all checksums match on
journal recovery, we replay the transaction thus possibly exposing stale
user data.

To fix this data exposure issue, remove the possibility to use
journal_async_commit in data=ordered mode.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d4f76107

ext4: Remove an unnecessary check for NULL before iput() · bfcba2d0

Authored by Markus Elfring on Nov 25, 2014

The iput() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

bfcba2d0
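
The pattern behind bfcba2d0 is the same one that makes `free(NULL)` safe. The sketch below uses hypothetical names (`struct demo_obj`, `demo_put()`, `demo_cleanup()`) and is not the kernel's iput(); it only shows why a guard at the call site is redundant when the release function already tolerates NULL.

```c
#include <stddef.h>

/* Hypothetical reference-counted object; a stand-in, not struct inode. */
struct demo_obj {
    int refcount;
};

/* Drop one reference. Like iput(), a NULL argument is a no-op. */
static void demo_put(struct demo_obj *obj)
{
    if (!obj)
        return;
    obj->refcount--;
}

/* Call site after the cleanup: no "if (obj)" guard around the calls. */
static void demo_cleanup(struct demo_obj *a, struct demo_obj *b)
{
    demo_put(a);
    demo_put(b);
}
```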

ext4: remove unneeded code in ext4_unlink · 31fc006b

Authored by Namjae Jeon on Nov 25, 2014

Setting retval to zero is not needed in ext4_unlink.
Remove unneeded code.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

31fc006b

ext4: don't count external journal blocks as overhead · b003b524

Authored by Eric Sandeen on Nov 25, 2014

This was fixed for ext3 with:

e6d8fb34 ext3: Count internal journal as bsddf overhead in ext3_statfs

but was never fixed for ext4.

With a large external journal and no used disk blocks, df comes
out negative without this, as journal blocks are added to the
overhead & subtracted from used blocks unconditionally.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b003b524
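
The arithmetic behind b003b524 can be sketched as follows. All names and fields here are hypothetical simplifications, not the ext4 statfs code: the point is only that journal blocks belong in the overhead when the journal lives on the same device.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs; all values are in blocks. */
struct demo_fs {
    long total_blocks;         /* blocks on the data device */
    long metadata_blocks;      /* bitmaps, inode tables, and so on */
    long journal_blocks;       /* size of the journal */
    bool journal_is_internal;  /* journal lives on the data device */
};

/* Overhead to subtract from the size reported to df. */
static long demo_overhead(const struct demo_fs *fs)
{
    long overhead = fs->metadata_blocks;

    /* An external journal occupies no blocks on this device. */
    if (fs->journal_is_internal)
        overhead += fs->journal_blocks;
    return overhead;
}
```

With a 1000-block device and a 4000-block external journal, adding the journal unconditionally would make the overhead exceed the device size, which matches the negative df output described in the commit message.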

ext4: remove never taken branch from ext4_ext_shift_path_extents() · 733ded2a

Authored by Jan Kara on Nov 25, 2014

path[depth].p_hdr can never be NULL for a path passed to us (and even if
it could, EXT_LAST_EXTENT() would make something != NULL from it). So
just remove the branch.

Coverity-id: 1196498
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

733ded2a

ext4: create nojournal_checksum mount option · c6d3d56d

Authored by Darrick J. Wong on Nov 25, 2014

Create a mount option to disable journal checksumming (because the
metadata_csum feature turns it on by default now), and fix remount not
to allow changing the journal checksumming option, since changing the
mount options has no effect on the journal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c6d3d56d

ext4: update comments regarding ext4_delete_inode() · 58d86a50

Authored by Wang Shilong on Nov 25, 2014

ext4_delete_inode() was renamed a long time ago; update the
comments accordingly.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

58d86a50

ext4: cleanup GFP flags inside resize path · 4fdb5543

Authored by Dmitry Monakhov on Nov 25, 2014

We must use GFP_NOFS instead of GFP_KERNEL inside ext4_mb_add_groupinfo
and ext4_calculate_overhead() because they are called from inside a
journal transaction. Call trace:

ioctl
 ->ext4_group_add
   ->journal_start
   ->ext4_setup_new_descs
     ->ext4_mb_add_groupinfo -> GFP_KERNEL
   ->ext4_flex_group_add
     ->ext4_update_super
       ->ext4_calculate_overhead  -> GFP_KERNEL
   ->journal_stop
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4fdb5543

ext4: introduce aging to extent status tree · 2be12de9

Authored by Jan Kara on Nov 25, 2014

Introduce simple aging to the extent status tree. Each extent has a
REFERENCED bit which gets set when the extent is used. The shrinker then
skips entries with the referenced bit set and clears the bit. Thus
frequently used extents have higher chances of staying in memory.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2be12de9
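
The aging scheme in 2be12de9 resembles a second-chance policy. Here is a minimal sketch, assuming a flat array in place of the extent status tree; `struct demo_extent`, `demo_use()` and `demo_shrink()` are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached-extent entry; not struct extent_status. */
struct demo_extent {
    bool present;     /* entry is still cached */
    bool referenced;  /* set on use, cleared by the shrinker */
};

/* A lookup hit marks the entry as recently used. */
static void demo_use(struct demo_extent *es)
{
    es->referenced = true;
}

/*
 * One shrinker pass: entries with the referenced bit set get a second
 * chance (the bit is cleared); all other cached entries are reclaimed.
 * Returns the number of entries reclaimed.
 */
static size_t demo_shrink(struct demo_extent *tab, size_t n)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tab[i].present)
            continue;
        if (tab[i].referenced) {
            tab[i].referenced = false;
            continue;
        }
        tab[i].present = false;
        reclaimed++;
    }
    return reclaimed;
}
```

An entry that is used between two passes survives the first pass and is only reclaimed by a later one if it has not been used again in the meantime.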

ext4: cleanup flag definitions for extent status tree · 624d0f1d

Authored by Jan Kara on Nov 25, 2014

Currently flags for extent status tree are defined twice, once shifted
and once without being shifted. Consolidate these definitions into one
place and make some computations automatic to make adding flags less
error prone. Compiler should be clever enough to figure out these are
constants and generate the same code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

624d0f1d
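
One way to define such flags only once is to enumerate the bit positions and derive the shift, mask and flag values from them. The sketch below illustrates that idea under assumptions: every `DEMO_`-prefixed name is hypothetical, a 64-bit word is assumed, and the actual ext4 definitions may differ.

```c
#include <stdint.h>

/* Bit positions are listed once; DEMO_ES_FLAGS counts them. */
enum {
    DEMO_ES_WRITTEN_B,
    DEMO_ES_UNWRITTEN_B,
    DEMO_ES_DELAYED_B,
    DEMO_ES_HOLE_B,
    DEMO_ES_FLAGS
};

/* The flags occupy the top DEMO_ES_FLAGS bits of a 64-bit word. */
#define DEMO_ES_SHIFT   (64 - DEMO_ES_FLAGS)
#define DEMO_ES_MASK    (~UINT64_C(0) << DEMO_ES_SHIFT)

/* Unshifted flag values, derived from the bit positions. */
#define DEMO_WRITTEN    (UINT64_C(1) << DEMO_ES_WRITTEN_B)
#define DEMO_UNWRITTEN  (UINT64_C(1) << DEMO_ES_UNWRITTEN_B)
#define DEMO_DELAYED    (UINT64_C(1) << DEMO_ES_DELAYED_B)
#define DEMO_HOLE       (UINT64_C(1) << DEMO_ES_HOLE_B)

/* Pack a block number and unshifted flags into one word. */
static uint64_t demo_pack(uint64_t pblk, uint64_t flags)
{
    return (pblk & ~DEMO_ES_MASK) | (flags << DEMO_ES_SHIFT);
}

/* Recover the unshifted flags from a packed word. */
static uint64_t demo_flags(uint64_t packed)
{
    return packed >> DEMO_ES_SHIFT;
}
```

Adding a flag then means adding one enumerator before `DEMO_ES_FLAGS` and one derived macro; the shift and mask follow automatically, and all of these remain compile-time constants.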

ext4: limit number of scanned extents in status tree shrinker · dd475925

Authored by Jan Kara on Nov 25, 2014

Currently we scan extent status trees of inodes until we reclaim nr_to_scan
extents. This can however require a lot of scanning when there are lots
of delayed extents (as those cannot be reclaimed).

Change shrinker to work as shrinkers are supposed to and *scan* only
nr_to_scan extents regardless of how many extents we actually
reclaimed. We however need to be careful and avoid scanning each status
tree from the beginning - that could lead to a situation where we would
not be able to reclaim anything at all when first nr_to_scan extents in
the tree are always unreclaimable. We remember with each inode offset
where we stopped scanning and continue from there when we next come
across the inode.

Note that we also need to update places calling __es_shrink() manually
to pass reasonable nr_to_scan to have a chance of reclaiming anything and
not just 1.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dd475925
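
The bounded scan with a remembered offset from dd475925 can be sketched with a fixed-size array standing in for one inode's status tree. Everything named `demo_` or `DEMO_` is a hypothetical simplification, not the kernel's `__es_shrink()` logic.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_EXTENTS 8

/* Hypothetical per-inode state; not the kernel's data structures. */
struct demo_tree {
    bool cached[DEMO_NR_EXTENTS];       /* extent is in the cache */
    bool reclaimable[DEMO_NR_EXTENTS];  /* false models a delayed extent */
    size_t resume;                      /* where the last scan stopped */
};

/*
 * Scan at most nr_to_scan entries, starting where the previous call
 * stopped, and reclaim the reclaimable ones. The scan budget is spent
 * whether or not anything is reclaimed. Returns the number reclaimed.
 */
static size_t demo_scan(struct demo_tree *t, size_t nr_to_scan)
{
    size_t reclaimed = 0;

    while (nr_to_scan-- > 0) {
        size_t i = t->resume;

        t->resume = (t->resume + 1) % DEMO_NR_EXTENTS;
        if (t->cached[i] && t->reclaimable[i]) {
            t->cached[i] = false;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

If the first four entries are unreclaimable, a first call with a budget of four reclaims nothing, but because the offset is remembered, the next call continues with the remaining entries and does not rescan the same four.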

ext4: move handling of list of shrinkable inodes into extent status code · b0dea4c1

Authored by Jan Kara on Nov 25, 2014

Currently callers adding extents to extent status tree were responsible
for adding the inode to the list of inodes with freeable extents. This
is error prone and puts list handling in unnecessarily many places.

Just add inode to the list automatically when the first non-delay extent
is added to the tree and remove inode from the list when the last
non-delay extent is removed.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b0dea4c1

ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca

Authored by Zheng Liu on Nov 25, 2014

In this commit we discard the lru algorithm for inodes with extent
status tree because it takes significant effort to maintain a lru list
in extent status tree shrinker and the shrinker can take a long time to
scan this lru list in order to reclaim some objects.

We replace the lru ordering with a simple round-robin.  After that we
never need to keep a lru list.  That means that the list needn't be
sorted if the shrinker can not reclaim any objects in the first round.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

edaa53ca

ext4: cache extent hole in extent status tree for ext4_da_map_blocks() · 2f8e0a7c

Authored by Zheng Liu on Nov 25, 2014

Currently extent status tree doesn't cache extent hole when a write
looks up in extent tree to make sure whether a block has been allocated
or not.  In this case, we don't put extent hole in extent cache because
later this extent might be removed and a new delayed extent might be
added back.  But it will cause a defect when we do a lot of writes.  If
we don't put extent hole in extent cache, the following writes also need
to access extent tree to look at whether or not a block has been
allocated.  It brings a cache miss.  This commit fixes this defect.
Also if the inode doesn't have any extent, this extent hole will be
cached as well.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2f8e0a7c

ext4: fix block reservation for bigalloc filesystems · cbd7584e

Authored by Jan Kara on Nov 25, 2014

For bigalloc filesystems we have to check whether newly requested inode
block isn't already part of a cluster for which we already have delayed
allocation reservation. This check happens in ext4_ext_map_blocks() and
that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if
ext4_da_map_blocks() finds in extent cache information about the block,
we don't call into ext4_ext_map_blocks() and thus we always end up
getting new reservation even if the space for cluster is already
reserved. This results in overreservation and premature ENOSPC reports.

Fix the problem by checking for existing cluster reservation already in
ext4_da_map_blocks(). That simplifies the logic and actually allows us
to get rid of the EXT4_MAP_FROM_CLUSTER flag completely.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

cbd7584e
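
The idea in cbd7584e, charging a cluster only once no matter how many of its blocks are written, can be modelled like this. The cluster size, table size and all `demo_` names are assumptions made for illustration; this is not the logic of ext4_da_map_blocks() itself.

```c
#include <stdbool.h>

#define DEMO_CLUSTER_BITS 4   /* 16 blocks per cluster (assumed) */
#define DEMO_NR_CLUSTERS  64  /* size of the modelled range (assumed) */

/* Hypothetical reservation state; not the kernel's data structures. */
struct demo_resv {
    bool reserved[DEMO_NR_CLUSTERS];  /* cluster already has a reservation */
    unsigned int reserved_clusters;   /* total clusters reserved */
};

/*
 * Reserve space for a delayed-allocation block. A new cluster is
 * charged only when the block's cluster has no reservation yet.
 * Returns false if the block lies outside the modelled range.
 */
static bool demo_reserve_block(struct demo_resv *r, unsigned long lblk)
{
    unsigned long cluster = lblk >> DEMO_CLUSTER_BITS;

    if (cluster >= DEMO_NR_CLUSTERS)
        return false;
    if (!r->reserved[cluster]) {
        r->reserved[cluster] = true;
        r->reserved_clusters++;
    }
    return true;
}
```

Writing blocks 0, 5 and 15 charges one cluster in this model; charging one per block would be the overreservation that leads to the premature ENOSPC reports mentioned above.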

openeuler / Kernel synced successfully more than 1 year ago
