Commits · 4bdfc873ba34e425d6532581b4127b960274272a · openeuler / raspberrypi-kernel

12 Apr, 2015 5 commits

ext4 crypto: insert encrypted filenames into a leaf directory block · 4bdfc873

Authored by Michael Halcrow on Apr 12, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: teach ext4_htree_store_dirent() to store decrypted filenames · 2f61830a

Authored by Theodore Ts'o on Apr 12, 2015

For encrypted directories, we need to pass in a separate parameter for
the decrypted filename, since the directory entry contains the
encrypted filename.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: filename encryption facilities · d5d0e8c7

Authored by Michael Halcrow on Apr 12, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: add encryption key management facilities · 88bd6ccd

Authored by Michael Halcrow on Apr 12, 2015

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: add ext4 encryption facilities · b30ab0e0

Authored by Michael Halcrow on Apr 12, 2015

On encrypt, we will re-assign the buffer_heads to point to a bounce
page rather than the control_page (which is the original page to write
that contains the plaintext). The block I/O occurs against the bounce
page.  On write completion, we re-assign the buffer_heads to the
original plaintext page.

On decrypt, we will attach a read completion callback to the bio
struct. This read completion will decrypt the read contents in-place
prior to setting the page up-to-date.

The current encryption mode, AES-256-XTS, lacks cryptographic
integrity. AES-256-GCM is planned, but we will need to devise a
mechanism for handling the integrity data.
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

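The bounce-page flow in this commit message can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: `toy_bh`, `toy_write()`, `toy_read()` and the XOR "cipher" are hypothetical stand-ins, not kernel code, and the XOR step is not AES-256-XTS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16 /* tiny "page", for illustration only */

/* Toy stand-in for a real cipher: XOR with a key byte. NOT AES-256-XTS. */
static void toy_crypt(uint8_t *buf, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/* Hypothetical model of a buffer head: it only records the page to write. */
struct toy_bh {
	uint8_t *data;
};

/*
 * Write path: encrypt into a bounce page, point the buffer at it for the
 * "I/O", then point it back at the plaintext page on completion.
 */
static void toy_write(struct toy_bh *bh, uint8_t *disk, uint8_t key)
{
	uint8_t bounce[PAGE_SZ];
	uint8_t *plain = bh->data;

	memcpy(bounce, plain, PAGE_SZ);
	toy_crypt(bounce, PAGE_SZ, key);

	bh->data = bounce;               /* I/O goes against the bounce page */
	memcpy(disk, bh->data, PAGE_SZ); /* stands in for the block write */
	bh->data = plain;                /* write completion: restore */
}

/* Read path: read ciphertext into the page, then decrypt it in place. */
static void toy_read(uint8_t *page, const uint8_t *disk, uint8_t key)
{
	memcpy(page, disk, PAGE_SZ);   /* stands in for the block read */
	toy_crypt(page, PAGE_SZ, key); /* what the read completion would do */
}
```

The point of the bounce page is that the plaintext page is never modified by the write, so it remains usable by readers while the I/O is in flight.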

11 Apr, 2015 3 commits

ext4 crypto: add encryption policy and password salt support · 9bd8212f

Authored by Michael Halcrow on Apr 11, 2015

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>

ext4 crypto: export ext4_empty_dir() · e875a2dd

Authored by Michael Halcrow on Apr 11, 2015

Required for future encryption xattr changes.
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: reserve codepoints used by the ext4 encryption feature · f542fbe8

Authored by Theodore Ts'o on Apr 11, 2015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

08 Apr, 2015 1 commit

ext4 crypto: add ext4_mpage_readpages() · f64e02fe

Authored by Theodore Ts'o on Apr 08, 2015

This takes code from fs/mpage.c and optimizes it for ext4.  Its
primary purpose is to allow us to more easily add encryption to ext4's
read path in an efficient manner.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

17 Feb, 2015 1 commit

ext4: add DAX functionality · 923ae0ff

Authored by Ross Zwisler on Feb 16, 2015

This is a port of the DAX functionality found in the current version of
ext2.

[matthew.r.wilcox@intel.com: heavily tweaked]
[akpm@linux-foundation.org: remap_pages went away]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Feb, 2015 1 commit

ext4: support read-only images · 2cb5cc8b

Authored by Darrick J. Wong on Feb 12, 2015

Add a rocompat feature, "readonly", to mark a FS image as read-only.
The feature prevents the kernel and e2fsprogs from changing the image;
the flag can be toggled by tune2fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

20 Jan, 2015 1 commit

ext4: reserve codepoints used by the ext4 encryption feature · 3edc18d8

Authored by Theodore Ts'o on Jan 19, 2015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

03 Dec, 2014 1 commit

ext4: ext4_inline_data_fiemap should respect callers argument · d952d69e

Authored by Dmitry Monakhov on Dec 02, 2014

Currently ext4_inline_data_fiemap ignores the requested arguments (start
and len), which may lead to an endless loop if start != 0.  Also fix the
incorrect extent length determination.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

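Respecting the caller's `start` and `len` amounts to intersecting the request with the inline data region. The following is a minimal sketch of that arithmetic only; `toy_inline_fiemap()` and `struct toy_extent` are hypothetical names, and this is not the kernel's implementation.

```c
#include <stdint.h>

struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Intersect the inline data region [0, inline_size) with the caller's
 * request [start, start + len). Returns 1 and fills *out if they overlap,
 * or 0 if there is nothing to report.
 */
static int toy_inline_fiemap(uint64_t inline_size, uint64_t start,
			     uint64_t len, struct toy_extent *out)
{
	if (len == 0 || start >= inline_size)
		return 0;
	out->start = start;
	out->len = inline_size - start; /* bytes left after start */
	if (out->len > len)
		out->len = len;         /* never report more than asked */
	return 1;
}
```

Returning "nothing to report" when `start` lies beyond the inline data is what lets a caller that advances `start` terminate instead of looping.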

26 Nov, 2014 4 commits

ext4: limit number of scanned extents in status tree shrinker · dd475925

Authored by Jan Kara on Nov 25, 2014

Currently we scan extent status trees of inodes until we reclaim nr_to_scan
extents. This can however require a lot of scanning when there are lots
of delayed extents (as those cannot be reclaimed).

Change shrinker to work as shrinkers are supposed to and *scan* only
nr_to_scan extents regardless of how many extents we actually
reclaim. However, we need to be careful to avoid scanning each status
tree from the beginning, since that could lead to a situation where we
would not be able to reclaim anything at all when the first nr_to_scan
extents in the tree are always unreclaimable. With each inode we
remember the offset where we stopped scanning and continue from there
when we next come across the inode.

Note that we also need to update places calling __es_shrink() manually
to pass reasonable nr_to_scan to have a chance of reclaiming anything and
not just 1.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

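The "scan budget plus remembered offset" idea can be sketched with a flat array standing in for one inode's extent status tree. This is an illustrative model under that assumption; `struct toy_tree` and `toy_shrink()` are hypothetical and much simpler than the real data structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one inode's extent status tree. */
struct toy_tree {
	const bool *reclaimable; /* reclaimable[i]: can entry i be freed? */
	bool *freed;             /* freed[i]: entry i has been reclaimed */
	size_t nr;               /* number of entries */
	size_t resume;           /* where the previous scan stopped */
};

/*
 * Scan at most nr_to_scan entries, starting where the last scan stopped,
 * and return how many were reclaimed. The budget counts entries scanned,
 * not entries reclaimed.
 */
static size_t toy_shrink(struct toy_tree *t, size_t nr_to_scan)
{
	size_t reclaimed = 0;

	for (size_t scanned = 0; scanned < nr_to_scan && t->nr > 0; scanned++) {
		size_t i = t->resume;

		if (t->reclaimable[i] && !t->freed[i]) {
			t->freed[i] = true;
			reclaimed++;
		}
		t->resume = (i + 1) % t->nr; /* remember the offset */
	}
	return reclaimed;
}
```

With six entries of which only entries 3 and 4 are reclaimable, a first call with a budget of 3 reclaims nothing but leaves `resume` at 3, so a second call with the same budget reaches the reclaimable entries instead of rescanning the unreclaimable prefix.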

ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca

Authored by Zheng Liu on Nov 25, 2014

In this commit we discard the lru algorithm for inodes with extent
status tree because it takes significant effort to maintain a lru list
in extent status tree shrinker and the shrinker can take a long time to
scan this lru list in order to reclaim some objects.

We replace the lru ordering with a simple round-robin.  After that we
never need to keep a lru list.  That means that the list needn't be
sorted if the shrinker can not reclaim any objects in the first round.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: cache extent hole in extent status tree for ext4_da_map_blocks() · 2f8e0a7c

Authored by Zheng Liu on Nov 25, 2014

Currently extent status tree doesn't cache extent hole when a write
looks up in extent tree to make sure whether a block has been allocated
or not.  In this case, we don't put extent hole in extent cache because
later this extent might be removed and a new delayed extent might be
added back.  But it will cause a defect when we do a lot of writes.  If
we don't put extent hole in extent cache, the following writes also need
to access extent tree to look at whether or not a block has been
allocated.  It brings a cache miss.  This commit fixes this defect.
Also if the inode doesn't have any extent, this extent hole will be
cached as well.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix block reservation for bigalloc filesystems · cbd7584e

Authored by Jan Kara on Nov 25, 2014

For bigalloc filesystems we have to check whether newly requested inode
block isn't already part of a cluster for which we already have delayed
allocation reservation. This check happens in ext4_ext_map_blocks() and
that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if
ext4_da_map_blocks() finds in extent cache information about the block,
we don't call into ext4_ext_map_blocks() and thus we always end up
getting new reservation even if the space for cluster is already
reserved. This results in overreservation and premature ENOSPC reports.

Fix the problem by checking for existing cluster reservation already in
ext4_da_map_blocks(). That simplifies the logic and actually allows us
to get rid of the EXT4_MAP_FROM_CLUSTER flag completely.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

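The overreservation problem comes down to taking one reservation per block instead of one per cluster. A minimal sketch of the per-cluster check follows, assuming 16 blocks per cluster and a small fixed-size table; `struct toy_inode` and `toy_da_reserve()` are hypothetical and do not mirror the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLUSTER_BITS 4  /* assumed: 16 blocks per cluster */
#define TOY_MAX_CLUSTERS 64 /* assumed table size for the sketch */

/* Hypothetical per-inode state: which clusters already hold a reservation. */
struct toy_inode {
	bool reserved[TOY_MAX_CLUSTERS];
	unsigned int reserved_clusters;
};

/*
 * Reserve space for a delayed-allocation block. A new reservation is taken
 * only if the block's cluster is not already reserved. Returns 0 on
 * success and -1 if the block is out of range for this sketch.
 */
static int toy_da_reserve(struct toy_inode *inode, uint32_t lblk)
{
	uint32_t cluster = lblk >> TOY_CLUSTER_BITS;

	if (cluster >= TOY_MAX_CLUSTERS)
		return -1;
	if (!inode->reserved[cluster]) {
		inode->reserved[cluster] = true;
		inode->reserved_clusters++;
	}
	return 0;
}
```

Blocks 0 and 5 share cluster 0, so reserving both leaves `reserved_clusters` at 1; only block 16 and above starts a second cluster.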

21 Nov, 2014 1 commit

ext4: kill ext4_kvfree() · b93b41d4

Authored by Al Viro on Nov 20, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

10 Nov, 2014 1 commit

ext4: Convert to private i_dquot field · 96c7e0d9

Authored by Jan Kara on Sep 29, 2014

CC: linux-ext4@vger.kernel.org
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

14 Oct, 2014 1 commit

ext4: check s_chksum_driver when looking for bg csum presence · 813d32f9

Authored by Darrick J. Wong on Oct 14, 2014

Convert the ext4_has_group_desc_csum predicate to look for a checksum
driver instead of the metadata_csum flag and change the bg checksum
calculation function to look for GDT_CSUM before taking the crc16
path.

Without this patch, if we mount with ^uninit_bg,^metadata_csum and
later metadata_csum gets turned on by accident, the block group
checksum functions will incorrectly assume that checksumming is
enabled (metadata_csum) but that crc16 should be used
(!s_chksum_driver).  This is totally wrong, so fix the predicate
and the checksum formula selection.

(Granted, if the metadata_csum feature bit gets enabled on a live FS
then something underhanded is going on, but we could at least avoid
writing garbage into the on-disk fields.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: stable@vger.kernel.org

13 Oct, 2014 1 commit

ext4: Replace open coded mdata csum feature to helper function · 9aa5d32b

Authored by Dmitry Monakhov on Oct 13, 2014

Besides improving code readability, this replacement also protects
against errors caused by direct EXT4_SB(sb)->s_es manipulation, which
may result in an attempt to use uninitialized csum machinery.

#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum  $IMG
# Provoke a metadata update, likely resulting in an OOPS
touch $MNT/test
umount $MNT
#Testcase_END

# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)

https://bugzilla.kernel.org/show_bug.cgi?id=82201
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

06 Oct, 2014 1 commit

ext4: add ext4_iget_normal() which is to be used for dir tree lookups · f4bb2981

Authored by Theodore Ts'o on Oct 05, 2014

If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes.  This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.

In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and its blocks released, after which point Much Hilarity Ensues.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

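The rule described here, that directory entries must not resolve to reserved metadata inodes, can be sketched as a simple range check. The constants below are assumptions for illustration (root inode 2, first regular inode 11), and `toy_check_dir_ino()` is a hypothetical helper, not the body of ext4_iget_normal().

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones depend on the superblock. */
#define TOY_ROOT_INO   2u
#define TOY_FIRST_INO 11u

/*
 * Return 0 if a directory entry may refer to inode number ino, or -EIO if
 * ino is a reserved metadata inode (anything below the first regular
 * inode other than the root directory).
 */
static int toy_check_dir_ino(uint32_t ino)
{
	if (ino < TOY_FIRST_INO && ino != TOY_ROOT_INO)
		return -EIO;
	return 0;
}
```

Under these assumptions a directory entry pointing at a low-numbered inode such as 8 is rejected, while the root directory and ordinary inodes pass.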

11 Sep, 2014 1 commit

ext4: don't use MAXQUOTAS value · a2d4a646

Authored by Jan Kara on Sep 11, 2014

The MAXQUOTAS value defines the maximum number of quota types the VFS
supports. This isn't necessarily the number of types ext4 supports.
Although ext4 will support project quotas, use an ext4 private
definition for consistency with other filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

05 Sep, 2014 2 commits

ext4: renumber EXT4_EX_* flags to avoid flag aliasing problems · d26e2c4d

Authored by Theodore Ts'o on Sep 04, 2014

Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: drop the EXT4_STATE_DELALLOC_RESERVED flag · 754cfed6

Authored by Theodore Ts'o on Sep 04, 2014

Having done a full regression test, we can now drop the
DELALLOC_RESERVED state flag.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

02 Sep, 2014 5 commits

ext4: track extent status tree shrinker delay statictics · eb68d0e2

Authored by Zheng Liu on Sep 01, 2014

This commit adds some statistics to the extent status tree shrinker.  The
purpose of adding these is to collect more details when we
encounter a stall caused by the extent status tree shrinker.  Here we
count the following statistics:
  stats:
    the number of all objects on all extent status trees
    the number of reclaimable objects on lru list
    cache hits/misses
    the last sorted interval
    the number of inodes on lru list
  average:
    scan time for shrinking some objects
    the number of shrunk objects
  maximum:
    the inode that has max nr. of objects on lru list
    the maximum scan time for shrinking some objects

The output looks like below:
  $ cat /proc/fs/ext4/sda1/es_shrinker_info
  stats:
    28228 objects
    6341 reclaimable objects
    5281/631 cache hits/misses
    586 ms last sorted interval
    250 inodes on lru list
  average:
    153 us scan time
    128 shrunk objects
  maximum:
    255 inode (255 objects, 198 reclaimable)
    125723 us max scan time

If the lru list has never been sorted, the following line will not be
printed:
    586 ms last sorted interval
If there is an empty lru list, the following lines also will not be
printed:
    250 inodes on lru list
  ...
  maximum:
    255 inode (255 objects, 198 reclaimable)
    0 us max scan time

Meanwhile in this commit a new trace point is defined to print some
details in __ext4_es_shrink().

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: rename ext4_ext_find_extent() to ext4_find_extent() · ed8a1a76

Authored by Theodore Ts'o on Sep 01, 2014

Make the function name less redundant.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code · dfe50809

Authored by Theodore Ts'o on Sep 01, 2014

Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(),
ext4_split_extent(), ext4_convert_unwritten_extents_endio().

This requires fixing all of their callers, and potentially those of
ext4_ext_find_extent(), to free the struct ext4_ext_path object in case
of an error, and there are interlocking dependencies all the way up to
ext4_ext_map_blocks(), ext4_swap_extents(), and
ext4_ext_remove_space().

Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it
is no longer necessary.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: teach ext4_ext_find_extent() to free path on error · 705912ca

Authored by Theodore Ts'o on Sep 01, 2014

Right now, there are places where it is all too easy to leak memory
on an error path, via a usage like this:

	struct ext4_ext_path *path = NULL;

	while (...) {
		...
		path = ext4_ext_find_extent(inode, block, path, 0);
		if (IS_ERR(path)) {
			/* oops, if path was non-NULL before the call to
			   ext4_ext_find_extent, we've leaked it!  :-(  */
			...
			return PTR_ERR(path);
		}
		...
	}

Unfortunately, there are some code paths where we are doing the following
instead:

	path = ext4_ext_find_extent(inode, block, orig_path, 0);

and where it's important that we _not_ free orig_path in the case
where ext4_ext_find_extent() returns an error.

So change the function signature of ext4_ext_find_extent() so that it
takes a struct ext4_ext_path ** for its third argument, and by
default, on an error, it will free the struct ext4_ext_path, and then
zero out the struct ext4_ext_path * pointer.  In order to avoid
causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
ext4_ext_find_extent() to use the original behavior of forcing the
caller to deal with freeing the original path pointer on the error
case.

The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
allows for a gentle transition and makes the patches easier to verify.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

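The ownership rule described in this commit message (pass a pointer to the path pointer; on error the callee frees the path and clears the pointer unless a no-free flag is set) can be sketched in userspace. This is a simplified model: `toy_find_extent()`, `struct toy_path` and `TOY_NOFREE_ON_ERR` are hypothetical, the function returns an int rather than an ERR_PTR-encoded pointer, and `fail` is a test hook that forces the error path.

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_NOFREE_ON_ERR 0x1 /* hypothetical flag, mirrors the idea only */

struct toy_path {
	int depth;
};

/*
 * Look up a block, reusing *ppath if it is non-NULL. On failure the old
 * path is freed and *ppath is set to NULL, unless TOY_NOFREE_ON_ERR is
 * set, in which case the caller keeps ownership of its original path.
 */
static int toy_find_extent(struct toy_path **ppath, int flags, int fail)
{
	struct toy_path *path = *ppath;

	if (!path) {
		path = calloc(1, sizeof(*path));
		if (!path)
			return -ENOMEM;
	}
	if (fail) {
		if (flags & TOY_NOFREE_ON_ERR) {
			/* only drop what this call allocated itself */
			if (path != *ppath)
				free(path);
		} else {
			free(path);
			*ppath = NULL; /* caller cannot double-free or leak */
		}
		return -EIO;
	}
	path->depth++;
	*ppath = path;
	return 0;
}
```

With the default behaviour a caller's loop can simply return on error, because the pointer it holds is already NULL; with the flag set it must still free the original path itself.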

ext4: fix accidental flag aliasing in ext4_map_blocks flags · bd30d702

Authored by Theodore Ts'o on Sep 01, 2014

Commit b8a86845 introduced an accidental flag aliasing between
EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.

Fortunately, this didn't introduce any untoward side effects --- we
got lucky.  Nevertheless, fix this and leave a warning to hopefully
prevent this from happening in the future.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

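One way to guard against two flag families sharing a bit is a compile-time check. The sketch below uses made-up flag names and values (the `TOY_*` macros are assumptions, not the kernel's definitions) and shows the technique only; it is not what the commit itself added.

```c
/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define TOY_GET_BLOCKS_CREATE  0x0001
#define TOY_GET_BLOCKS_CONVERT 0x0100
#define TOY_GET_BLOCKS_ALL     (TOY_GET_BLOCKS_CREATE | TOY_GET_BLOCKS_CONVERT)

#define TOY_EX_NOCACHE         0x40000000
#define TOY_EX_FORCE_CACHE     0x20000000
#define TOY_EX_ALL             (TOY_EX_NOCACHE | TOY_EX_FORCE_CACHE)

/* Fail the build if the two flag families ever share a bit (C11). */
_Static_assert((TOY_GET_BLOCKS_ALL & TOY_EX_ALL) == 0,
	       "EX flags alias GET_BLOCKS flags");

/* Runtime helper with the same meaning, handy in tests. */
static int toy_flags_alias(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}
```

Keeping one family in the low bits and the other in the high bits, as assumed here, makes an accidental overlap less likely when new flags are appended.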

31 Aug, 2014 2 commits

ext4: refactor ext4_move_extents code base · fcf6b1b7

Authored by Dmitry Monakhov on Aug 30, 2014

ext4_move_extents is too complex to review. It duplicates almost
every function available in the rest of the codebase. It has a useless
artificial restriction, orig_offset == donor_offset. But in fact the
logic of ext4_move_extents is very simple:

Iterate over extents one by one (similar to ext4_fill_fiemap_extents)
   -> Iterate over each page covered by the extent (similar to generic_perform_write)
     -> Swap the extents covered by the page (can be shared with IOC_MOVE_DATA)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use ext4_ext_next_allocated_block instead of mext_next_extent · f8fb4f41

Authored by Dmitry Monakhov on Aug 30, 2014

This allows us to make mext_next_extent static and potentially get rid
of it.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

30 Aug, 2014 2 commits

ext4: convert ext4_bread() to use the ERR_PTR convention · 1c215028

Authored by Theodore Ts'o on Aug 29, 2014

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: convert ext4_getblk() to use the ERR_PTR convention · 10560082

Authored by Theodore Ts'o on Aug 29, 2014

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

24 Aug, 2014 2 commits

ext4: move i_size,i_disksize update routines to helper function · 4631dbf6

Authored by Dmitry Monakhov on Aug 23, 2014

Cc: stable@vger.kernel.org # needed for bug fix patches
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: propagate errors up to ext4_find_entry()'s callers · 36de9286

Authored by Theodore Ts'o on Aug 23, 2014

If we run into some kind of error, such as ENOMEM, while calling
ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error
gets propagated up to ext4_find_entry() and then to its callers.  This
way, transient errors such as ENOMEM can get propagated to the VFS.
This is important so that the system calls return the appropriate
error, and also so that in the case of ext4_lookup(), we return an
error instead of a NULL inode, since that will result in a negative
dentry cache entry that will stick around long past the OOM condition
which caused a transient ENOMEM error.

Google-Bug-Id: #17142205
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

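The distinction this commit relies on, "the name does not exist" versus "the lookup failed", can be sketched with a lookup that reports the two outcomes separately. The names below (`toy_find_entry()`, `struct toy_entry`, the `oom` test hook) are hypothetical, and the directory is just an array.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_entry {
	const char *name;
	int ino;
};

/*
 * Look up name. Returns 0 and sets *out on success, returns 0 with
 * *out == NULL when the name does not exist, and returns a negative errno
 * (here -ENOMEM, forced by the oom test hook) on a transient failure.
 */
static int toy_find_entry(const struct toy_entry *dir, size_t n,
			  const char *name, int oom,
			  const struct toy_entry **out)
{
	*out = NULL;
	if (oom)
		return -ENOMEM; /* must not be reported as "not found" */
	for (size_t i = 0; i < n; i++) {
		if (strcmp(dir[i].name, name) == 0) {
			*out = &dir[i];
			return 0;
		}
	}
	return 0;
}
```

A caller that collapsed the error case into "not found" would cache a negative result for a name that actually exists, which is the failure mode the commit message describes.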

29 Jul, 2014 1 commit

ext4: check inline directory before converting · 40b163f1

Authored by Darrick J. Wong on Jul 28, 2014

Before converting an inline directory to a regular directory, check
the directory entries to make sure they're not obviously broken.
This helps us to avoid a BUG_ON if one of the dirents is trashed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

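A sanity pass over directory entries before conversion might look like the sketch below. The entry layout, the 4-byte alignment rule and the names `struct toy_dirent` and `toy_check_dirents()` are assumptions made for illustration; the real on-disk format and checks are more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified entry header; the field layout is an assumption. */
struct toy_dirent {
	uint32_t inode;
	uint16_t rec_len;  /* total size of this entry, header included */
	uint8_t  name_len;
	uint8_t  file_type;
};

#define TOY_HDR_LEN sizeof(struct toy_dirent)

/*
 * Walk the entries in buf and return 0 if every one is plausible, or -1
 * at the first entry whose rec_len is too small, misaligned, too short
 * for its name, or runs past the end of the buffer.
 */
static int toy_check_dirents(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		struct toy_dirent de;

		if (len - off < TOY_HDR_LEN)
			return -1;
		memcpy(&de, buf + off, TOY_HDR_LEN);
		if (de.rec_len < TOY_HDR_LEN || de.rec_len % 4 != 0 ||
		    de.rec_len < TOY_HDR_LEN + de.name_len ||
		    de.rec_len > len - off)
			return -1;
		off += de.rec_len;
	}
	return 0;
}
```

Rejecting a zero or oversized `rec_len` up front is what keeps the later conversion code from walking off the buffer or looping forever on a trashed entry.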

15 Jul, 2014 3 commits

ext4: make ext4_has_inline_data() as a inline function · 83447ccb

Authored by Zheng Liu on Jul 15, 2014

ext4_has_inline_data() is now used in widespread code paths, so we need
to make it an inline function to avoid burning CPU cycles.

Change in text size:

         text     data      bss     dec     hex filename
before: 326110    19258    5528  350896   55ab0 fs/ext4/ext4.o
after:  326227    19258    5528  351013   55b25 fs/ext4/ext4.o

I use the following script to measure the CPU usage.

  #!/bin/bash

  shm_base='/dev/shm'
  img=${shm_base}/ext4-img
  mnt=/mnt/loop

  e2fsprgs_base=$HOME/e2fsprogs
  mkfs=${e2fsprgs_base}/misc/mke2fs
  fsck=${e2fsprgs_base}/e2fsck/e2fsck

  sudo umount $mnt
  dd if=/dev/zero of=$img bs=4k count=3145728
  ${mkfs} -t ext4 -O inline_data -F $img
  sudo mount -t ext4 -o loop $img $mnt

  # start testing...
  testdir="${mnt}/testdir"
  mkdir $testdir
  cd $testdir

  echo "start testing..."
  for ((cnt=0;cnt<100;cnt++)); do

  for ((i=0;i<5;i++)); do
  	for ((j=0;j<5;j++)); do
  		for ((k=0;k<5;k++)); do
  			for ((l=0;l<5;l++)); do
  				mkdir -p $i/$j/$k/$l
  				echo "$i-$j-$k-$l" > $i/$j/$k/$l/testfile
  			done
  		done
  	done
  done

  ls -R $testdir > /dev/null
  rm -rf $testdir/*

  done

The result of `perf top -G -U` is as below.

vanilla:
 13.92%  [ext4]  [k] ext4_do_update_inode
  9.36%  [ext4]  [k] __ext4_get_inode_loc
  4.07%  [ext4]  [k] ftrace_define_fields_ext4_writepages
  3.83%  [ext4]  [k] __ext4_handle_dirty_metadata
  3.42%  [ext4]  [k] ext4_get_inode_flags
  2.71%  [ext4]  [k] ext4_mark_iloc_dirty
  2.46%  [ext4]  [k] ftrace_define_fields_ext4_direct_IO_enter
  2.26%  [ext4]  [k] ext4_get_inode_loc
  2.22%  [ext4]  [k] ext4_has_inline_data
  [...]

After applying the patch, we no longer see ext4_has_inline_data() because
it has been inlined and perf cannot sample it.  This does not prove that
CPU cycles are saved, but at least the function call overhead is
eliminated.  So IMHO we'd better inline this function.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix punch hole on files with indirect mapping · 4f579ae7

Authored by Lukas Czerner on Jul 15, 2014

Currently punch hole code on files with direct/indirect mapping has some
problems which may lead to a data loss. For example (from Jan Kara):

fallocate -n -p 10240000 4096

will punch the range 10240000 - 12632064 instead of the range 10240000 -
10244096.

Also the code is a bit weird and it's not using infrastructure provided
by indirect.c, but rather creating its own way.

This patch fixes the issues and makes the operation run 4 times
faster in my testing (punching out a 60GB file). It uses an approach
similar to the one in ext4_ind_truncate(), which takes advantage of
the ext4_free_branches() function.

Also rename the ext4_free_hole_blocks() to something more sensible, like
the equivalent we have for extent mapped files. Call it
ext4_ind_remove_space().

This has been tested mostly with fsx and some xfstests which test
punch hole but do not require unwritten extents, which are not
supported with direct/indirect mapping. No problems showed up even with
1024k block size.

CC: stable@vger.kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

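The expected range in the fallocate example is plain arithmetic: a punch of 4096 bytes at offset 10240000 should end at 10244096. The sketch below works that out and also shows which whole blocks fall inside the range for a given block size. The helpers `toy_punch_range()` and `toy_punch_blocks()` are hypothetical and only illustrate the arithmetic, not the indirect-block walk the patch implements.

```c
#include <stdint.h>

/* Half-open range [start, end), in bytes or in blocks. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/* Byte range covered by a punch of len bytes at offset. */
static struct toy_range toy_punch_range(uint64_t offset, uint64_t len)
{
	struct toy_range r = { offset, offset + len };
	return r;
}

/* Blocks lying entirely inside the byte range r, for block size blksz. */
static struct toy_range toy_punch_blocks(struct toy_range r, uint64_t blksz)
{
	struct toy_range b;

	b.start = (r.start + blksz - 1) / blksz; /* round start up */
	b.end = r.end / blksz;                   /* round end down */
	if (b.end < b.start)
		b.end = b.start;                 /* no whole block inside */
	return b;
}
```

For the example above with 4096-byte blocks, the byte range is [10240000, 10244096) and the only whole block inside it is block 2500.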

ext4: remove metadata reservation checks · 71d4f7d0

Authored by Theodore Ts'o on Jul 15, 2014

Commit 27dd4385 ("ext4: introduce reserved space") reserves 2% of
the file system space to make sure metadata allocations will always
succeed.  Given that, tracking the reservation of metadata blocks is
no longer necessary.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>