Commits · a49058fab2912296f068759490ac69ba43b43861 · openeuler / raspberrypi-kernel

05 Sep, 2014 2 commits

ext4: renumber EXT4_EX_* flags to avoid flag aliasing problems · d26e2c4d

Authored by Theodore Ts'o on Sep 04, 2014

Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: drop the EXT4_STATE_DELALLOC_RESERVED flag · 754cfed6

Authored by Theodore Ts'o on Sep 04, 2014

Having done a full regression test, we can now drop the
DELALLOC_RESERVED state flag.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

02 Sep, 2014 5 commits

ext4: track extent status tree shrinker delay statictics · eb68d0e2

Authored by Zheng Liu on Sep 01, 2014

This commit adds some statistics to the extent status tree shrinker.  The
purpose is to collect more details when we encounter a stall caused by
the extent status tree shrinker.  Here we count the following
statistics:
  stats:
    the number of all objects on all extent status trees
    the number of reclaimable objects on lru list
    cache hits/misses
    the last sorted interval
    the number of inodes on lru list
  average:
    scan time for shrinking some objects
    the number of shrunk objects
  maximum:
    the inode that has max nr. of objects on lru list
    the maximum scan time for shrinking some objects

The output looks like below:
  $ cat /proc/fs/ext4/sda1/es_shrinker_info
  stats:
    28228 objects
    6341 reclaimable objects
    5281/631 cache hits/misses
    586 ms last sorted interval
    250 inodes on lru list
  average:
    153 us scan time
    128 shrunk objects
  maximum:
    255 inode (255 objects, 198 reclaimable)
    125723 us max scan time

If the lru list has never been sorted, the following line will not be
printed:
    586 ms last sorted interval
If there is an empty lru list, the following lines also will not be
printed:
    250 inodes on lru list
  ...
  maximum:
    255 inode (255 objects, 198 reclaimable)
    0 us max scan time

Meanwhile in this commit a new trace point is defined to print some
details in __ext4_es_shrink().

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: rename ext4_ext_find_extent() to ext4_find_extent() · ed8a1a76

Authored by Theodore Ts'o on Sep 01, 2014

Make the function name less redundant.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code · dfe50809

Authored by Theodore Ts'o on Sep 01, 2014

Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(),
ext4_split_extent(), ext4_convert_unwritten_extents_endio().

This requires fixing all of their callers to allow for
ext4_ext_find_extent() potentially freeing the struct ext4_ext_path
object in case of an error, and there are interlocking dependencies all
the way up to ext4_ext_map_blocks(), ext4_swap_extents(), and
ext4_ext_remove_space().

Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it
is no longer necessary.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: teach ext4_ext_find_extent() to free path on error · 705912ca

Authored by Theodore Ts'o on Sep 01, 2014

Right now, there are places where it is all too easy to leak memory
on an error path, via a usage like this:

	struct ext4_ext_path *path = NULL;

	while (...) {
		...
		path = ext4_ext_find_extent(inode, block, path, 0);
		if (IS_ERR(path)) {
			/* oops, if path was non-NULL before the call to
			   ext4_ext_find_extent, we've leaked it!  :-(  */
			...
			return PTR_ERR(path);
		}
		...
	}

Unfortunately, there are some code paths where we are doing the following
instead:

	path = ext4_ext_find_extent(inode, block, orig_path, 0);

and where it's important that we _not_ free orig_path in the case
where ext4_ext_find_extent() returns an error.

So change the function signature of ext4_ext_find_extent() so that it
takes a struct ext4_ext_path ** for its third argument, and by
default, on an error, it will free the struct ext4_ext_path, and then
zero out the struct ext4_ext_path * pointer.  In order to avoid
causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
ext4_ext_find_extent() to use the original behavior of forcing the
caller to deal with freeing the original path pointer on the error
case.

The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
allows for a gentle transition and makes the patches easier to verify.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
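
The calling convention described in 705912ca can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: `struct path`, `find_extent()` and `NOFREE_ON_ERR` are hypothetical stand-ins, the lookup failure is simulated with a negative block number, and the sketch returns an int error code where the kernel function returns a pointer. It shows how passing the path by address lets the callee free it and clear the caller's pointer on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ext4_ext_path; not the kernel type. */
struct path {
	int depth;
};

/* Hypothetical flag playing the role of EXT4_EX_NOFREE_ON_ERR. */
#define NOFREE_ON_ERR 0x1

/*
 * Look up `block`, reusing *ppath when the caller supplies one.
 * Returns 0 on success or a negative errno.  Unless NOFREE_ON_ERR is
 * set, a failure frees the caller's path and resets *ppath to NULL.
 */
int find_extent(long block, struct path **ppath, int flags)
{
	struct path *path = *ppath;

	if (!path) {
		path = calloc(1, sizeof(*path));
		if (!path)
			return -ENOMEM;
	}
	if (block < 0) {			/* simulated lookup failure */
		if (path != *ppath) {
			free(path);		/* allocated by this call */
		} else if (!(flags & NOFREE_ON_ERR)) {
			free(path);
			*ppath = NULL;
		}
		return -EIO;
	}
	path->depth = 1;
	*ppath = path;
	return 0;
}
```

A caller that loops over lookups can then simply return on error, because the pointer it holds is either still valid (with `NOFREE_ON_ERR`) or already NULL.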

ext4: fix accidental flag aliasing in ext4_map_blocks flags · bd30d702

Authored by Theodore Ts'o on Sep 01, 2014

Commit b8a86845 introduced an accidental flag aliasing between
EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.

Fortunately, this didn't introduce any untoward side effects --- we
got lucky.  Nevertheless, fix this and leave a warning to hopefully
keep this from happening in the future.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
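
One way to guard against the kind of aliasing fixed in bd30d702 is a compile-time check. The sketch below is an assumption-laden illustration, not what the commit does (the commit message only mentions leaving a warning): the flag names and values are hypothetical, and `_Static_assert` is a swapped-in technique that fails the build if two flag namespaces share a bit.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for illustration. */
#define GET_BLOCKS_CREATE	0x0001
#define GET_BLOCKS_UNWRIT_EXT	0x0002
#define GET_BLOCKS_CONVERT	0x0004
#define GET_BLOCKS_MASK		0x00ff	/* bits reserved for GET_BLOCKS_* */

#define EX_NOCACHE		0x0100
#define EX_FORCE_CACHE		0x0200
#define EX_MASK			0xff00	/* bits reserved for EX_* */

/* Fail the build if the two flag namespaces ever share a bit. */
_Static_assert((GET_BLOCKS_MASK & EX_MASK) == 0,
	       "GET_BLOCKS_* and EX_* flag bits overlap");

/* Runtime helper: nonzero if two flag words have any bit in common. */
int flags_alias(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}
```

With both sets of flags passed in the same `flags` argument, reserving disjoint masks makes an accidental overlap a build error rather than a latent bug.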

31 Aug, 2014 2 commits

ext4: refactor ext4_move_extents code base · fcf6b1b7

Authored by Dmitry Monakhov on Aug 30, 2014

ext4_move_extents is too complex to review. It duplicates almost
every function available in the rest of the codebase. It has a useless
artificial restriction, orig_offset == donor_offset. But in fact the logic
of ext4_move_extents is very simple:

Iterate extents one by one (similar to ext4_fill_fiemap_extents)
   ->Iterate each page covered by the extent (similar to generic_perform_write)
     ->swap extents covered by the page (can be shared with IOC_MOVE_DATA)

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use ext4_ext_next_allocated_block instead of mext_next_extent · f8fb4f41

Authored by Dmitry Monakhov on Aug 30, 2014

This allows us to make mext_next_extent static and potentially get rid
of it.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

30 Aug, 2014 2 commits

ext4: convert ext4_bread() to use the ERR_PTR convention · 1c215028

Authored by Theodore Ts'o on Aug 29, 2014

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: convert ext4_getblk() to use the ERR_PTR convention · 10560082

Authored by Theodore Ts'o on Aug 29, 2014

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

24 Aug, 2014 2 commits

ext4: move i_size,i_disksize update routines to helper function · 4631dbf6

Authored by Dmitry Monakhov on Aug 23, 2014

Cc: stable@vger.kernel.org # needed for bug fix patches
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: propagate errors up to ext4_find_entry()'s callers · 36de9286

Authored by Theodore Ts'o on Aug 23, 2014

If we run into some kind of error, such as ENOMEM, while calling
ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error
gets propagated up to ext4_find_entry() and then to its callers.  This
way, transient errors such as ENOMEM can get propagated to the VFS.
This is important so that the system calls return the appropriate
error, and also so that in the case of ext4_lookup(), we return an
error instead of a NULL inode, since that will result in a negative
dentry cache entry that will stick around long past the OOM condition
which caused a transient ENOMEM error.

Google-Bug-Id: #17142205
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
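
The distinction drawn in 36de9286 between "entry not found" and "lookup failed" can be sketched with a userspace re-creation of the ERR_PTR helpers. This is a minimal sketch under assumptions: `find_entry()`, `lookup()`, the `table` contents and the `simulate_enomem` switch are hypothetical, and only the ERR_PTR/IS_ERR/PTR_ERR idea mirrors the kernel convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the kernel-style error pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct entry {
	const char *name;
};

static struct entry table[] = { { "a" }, { "b" } };

/* Hypothetical lookup: NULL means "not found", ERR_PTR means "failed". */
struct entry *find_entry(const char *name, int simulate_enomem)
{
	size_t i;

	if (simulate_enomem)
		return ERR_PTR(-ENOMEM);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	return NULL;
}

/* Caller: 0 if found, -ENOENT if absent, other errno on a real failure. */
int lookup(const char *name, int simulate_enomem)
{
	struct entry *e = find_entry(name, simulate_enomem);

	if (IS_ERR(e))
		return (int)PTR_ERR(e);	/* propagate, do not report "absent" */
	return e ? 0 : -ENOENT;
}
```

Keeping the two outcomes separate is what prevents a transient ENOMEM from being recorded as a negative result.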

29 Jul, 2014 1 commit

ext4: check inline directory before converting · 40b163f1

Authored by Darrick J. Wong on Jul 28, 2014

Before converting an inline directory to a regular directory, check
the directory entries to make sure they're not obviously broken.
This helps us to avoid a BUG_ON if one of the dirents is trashed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

15 Jul, 2014 3 commits

ext4: make ext4_has_inline_data() as a inline function · 83447ccb

Authored by Zheng Liu on Jul 15, 2014

ext4_has_inline_data() is now used in many code paths, so make it an
inline function to avoid burning CPU cycles on the call.

Change in text size:

         text     data      bss     dec     hex filename
before: 326110    19258    5528  350896   55ab0 fs/ext4/ext4.o
after:  326227    19258    5528  351013   55b25 fs/ext4/ext4.o

I use the following script to measure the CPU usage.

  #!/bin/bash

  shm_base='/dev/shm'
  img=${shm_base}/ext4-img
  mnt=/mnt/loop

  e2fsprgs_base=$HOME/e2fsprogs
  mkfs=${e2fsprgs_base}/misc/mke2fs
  fsck=${e2fsprgs_base}/e2fsck/e2fsck

  sudo umount $mnt
  dd if=/dev/zero of=$img bs=4k count=3145728
  ${mkfs} -t ext4 -O inline_data -F $img
  sudo mount -t ext4 -o loop $img $mnt

  # start testing...
  testdir="${mnt}/testdir"
  mkdir $testdir
  cd $testdir

  echo "start testing..."
  for ((cnt=0;cnt<100;cnt++)); do

  for ((i=0;i<5;i++)); do
  	for ((j=0;j<5;j++)); do
  		for ((k=0;k<5;k++)); do
  			for ((l=0;l<5;l++)); do
  				mkdir -p $i/$j/$k/$l
  				echo "$i-$j-$k-$l" > $i/$j/$k/$l/testfile
  			done
  		done
  	done
  done

  ls -R $testdir > /dev/null
  rm -rf $testdir/*

  done

The result of `perf top -G -U` is as below.

vanilla:
 13.92%  [ext4]  [k] ext4_do_update_inode
  9.36%  [ext4]  [k] __ext4_get_inode_loc
  4.07%  [ext4]  [k] ftrace_define_fields_ext4_writepages
  3.83%  [ext4]  [k] __ext4_handle_dirty_metadata
  3.42%  [ext4]  [k] ext4_get_inode_flags
  2.71%  [ext4]  [k] ext4_mark_iloc_dirty
  2.46%  [ext4]  [k] ftrace_define_fields_ext4_direct_IO_enter
  2.26%  [ext4]  [k] ext4_get_inode_loc
  2.22%  [ext4]  [k] ext4_has_inline_data
  [...]

After applying the patch, we no longer see ext4_has_inline_data() because it
has been inlined and perf cannot sample it.  This does not by itself prove
that CPU cycles are saved, but at least the function call overhead is
eliminated.  So IMHO we'd better inline this function.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix punch hole on files with indirect mapping · 4f579ae7

Authored by Lukas Czerner on Jul 15, 2014

Currently punch hole code on files with direct/indirect mapping has some
problems which may lead to data loss. For example (from Jan Kara):

fallocate -n -p 10240000 4096

will punch the range 10240000 - 12632064 instead of the range 10240000 -
10244096.

Also the code is a bit weird and it's not using infrastructure provided
by indirect.c, but rather creating its own way.

This patch fixes the issues and makes the operation run 4 times faster
in my testing (punching out a 60GB file). It uses an approach similar to
the one in ext4_ind_truncate(), which takes advantage of the
ext4_free_branches() function.

Also rename ext4_free_hole_blocks() to something more sensible, like
the equivalent we have for extent mapped files. Call it
ext4_ind_remove_space().

This has been tested mostly with fsx and some xfstests which test
punch hole but do not require unwritten extents, which are not
supported with direct/indirect mapping. No problems showed up even with
1024k block size.

CC: stable@vger.kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
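
The range arithmetic behind Jan Kara's example in 4f579ae7 can be checked with a small sketch. It is an illustration only: `punch_blocks()` and `struct block_range` are hypothetical names, a 4096-byte block size (12 block bits) is assumed, and the handling of partial blocks at either end is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range of whole blocks, [first, end). */
struct block_range {
	uint64_t first;
	uint64_t end;
};

/*
 * Compute the whole filesystem blocks fully covered by the byte range
 * [offset, offset + len).  blkbits is log2 of the block size.
 */
struct block_range punch_blocks(uint64_t offset, uint64_t len,
				unsigned int blkbits)
{
	uint64_t bs = (uint64_t)1 << blkbits;
	struct block_range r;

	r.first = (offset + bs - 1) >> blkbits;	/* round start up */
	r.end = (offset + len) >> blkbits;	/* round end down */
	if (r.end < r.first)
		r.end = r.first;		/* no whole block covered */
	return r;
}
```

For offset 10240000 and length 4096 with 4096-byte blocks this covers exactly one block (block 2500), which matches the intended byte range 10240000 - 10244096.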

ext4: remove metadata reservation checks · 71d4f7d0

Authored by Theodore Ts'o on Jul 15, 2014

Commit 27dd4385 ("ext4: introduce reserved space") reserves 2% of
the file system space to make sure metadata allocations will always
succeed.  Given that, tracking the reservation of metadata blocks is
no longer necessary.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

12 May, 2014 3 commits

ext4: make local functions static · c197855e

Authored by Stephen Hemminger on May 12, 2014

I have been running make namespacecheck to look for unneeded globals, and
found these in ext4.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix block bitmap initialization under sparse_super2 · 1beeef1b

Authored by Darrick J. Wong on May 12, 2014

The ext4_bg_has_super() function doesn't know about the new rules for
where backup superblocks go on a sparse_super2 filesystem.  Therefore,
block bitmap initialization doesn't know that it shouldn't reserve
space for backups in groups that are never going to contain backups.
The result of this is e2fsck complaining about the block bitmap being
incorrect (fortunately not in a way that results in cross-linked
files), so fix the whole thing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
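
The backup-superblock rules that 1beeef1b refers to can be modelled roughly as below. This is a sketch under assumptions rather than the kernel function: `struct sb_model` and its field names are hypothetical, and it assumes that sparse_super2 records at most two backup group numbers in the superblock while plain sparse_super places backups in groups 0, 1 and powers of 3, 5 and 7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory view of the relevant superblock fields. */
struct sb_model {
	int sparse_super;	/* sparse_super feature enabled */
	int sparse_super2;	/* sparse_super2 feature enabled */
	uint32_t backup_bgs[2];	/* backup groups under sparse_super2 */
};

/* Return 1 if a == b^k for some k >= 0. */
static int is_power_of(uint32_t a, uint32_t b)
{
	uint64_t num = 1;

	while (num < a)
		num *= b;
	return num == a;
}

/* Return 1 if `group` holds a superblock (primary or backup), else 0. */
int bg_has_super(const struct sb_model *sb, uint32_t group)
{
	if (group == 0)
		return 1;			/* primary superblock */
	if (sb->sparse_super2)
		return group == sb->backup_bgs[0] ||
		       group == sb->backup_bgs[1];
	if (group <= 1 || !sb->sparse_super)
		return 1;
	if (!(group & 1))
		return 0;			/* even groups never qualify */
	return is_power_of(group, 3) || is_power_of(group, 5) ||
	       is_power_of(group, 7);
}
```

Block bitmap initialization would consult a predicate like this so that no space is reserved for a backup in a group that never holds one.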

ext4: fix data integrity sync in ordered mode · 1c8349a1

Authored by Namjae Jeon on May 12, 2014

When we perform a data integrity sync we tag all the dirty pages with
PAGECACHE_TAG_TOWRITE at the start of ext4_da_writepages.  Later we check
for this tag in write_cache_pages_da and create a struct
mpage_da_data containing contiguously indexed pages tagged with this
tag, and sync these pages with a call to mpage_da_map_and_submit.  This
process is done in a while loop until all the PAGECACHE_TAG_TOWRITE
pages are synced. We also do journal start and stop in each iteration.
journal_stop could initiate a journal commit, which would call
ext4_writepage, which in turn will call ext4_bio_write_page even for
delayed OR unwritten buffers. When ext4_bio_write_page is called for
such buffers, it does not sync them, but it does clear the
PAGECACHE_TAG_TOWRITE tag of the corresponding page, and hence these pages
are not synced by the currently running data integrity sync. We
will end up with dirty pages although the sync has completed.

This could cause a potential data loss when the sync call is followed
by a truncate_pagecache call, which is exactly the case in
collapse_range.  (It will cause generic/127 failure in xfstests)

To avoid this issue, we can use set_page_writeback_keepwrite instead of
set_page_writeback, which doesn't clear TOWRITE tag.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

07 May, 2014 1 commit

ext4: switch the guts of ->direct_IO() to iov_iter · 16b1f05d

Authored by Al Viro on Mar 04, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Apr, 2014 1 commit

ext4: add a new spinlock i_raw_lock to protect the ext4's raw inode · 202ee5df

Authored by Theodore Ts'o on Apr 21, 2014

To avoid potential data races, use a spinlock which protects the raw
(on-disk) inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

21 Apr, 2014 2 commits

ext4: rename uninitialized extents to unwritten · 556615dc

Authored by Lukas Czerner on Apr 20, 2014

Currently in ext4 there is quite a mess when it comes to naming
unwritten extents. Sometimes we call it uninitialized and sometimes we
refer to it as unwritten.

The right name for the extent which has been allocated but does not
contain any written data is _unwritten_. Other file systems are
using this name consistently, even the buffer head state refers to it as
unwritten. We need to fix this confusion in ext4.

This commit changes every reference to an uninitialized extent (meaning
allocated but unwritten) to unwritten extent. This includes comments,
function names and variable names. It even covers abbreviation of the
word uninitialized (such as uninit) and some misspellings.

This commit does not change any of the code paths at all. This has been
confirmed by comparing md5sums of the assembly code of each object file
after all the function names were stripped from it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: get rid of EXT4_MAP_UNINIT flag · 090f32ee

Authored by Lukas Czerner on Apr 20, 2014

Currently EXT4_MAP_UNINIT is used in the dioread_nolock case to mark the
cases where we're using dioread_nolock and we're writing into either
an unallocated or an unwritten extent, because we need to make sure that
any DIO write into that inode will wait for the extent conversion.

However, EXT4_MAP_UNINIT is not only an entirely misleading name but also
unnecessary, because we can check for EXT4_MAP_UNWRITTEN in the
dioread_nolock case instead.

This commit removes the EXT4_MAP_UNINIT flag.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

11 Apr, 2014 1 commit

ext4: move ext4_update_i_disksize() into mpage_map_and_submit_extent() · 622cad13

Authored by Theodore Ts'o on Apr 11, 2014

The function ext4_update_i_disksize() is used in only one place, in
the function mpage_map_and_submit_extent().  Move its code to simplify
the code paths, and also move the call to ext4_mark_inode_dirty() into
the i_data_sem's critical region, to be consistent with all of the
other places where we update i_disksize.  That way, we also keep the
raw_inode's i_disksize protected, to avoid the following race:

      CPU #1                                 CPU #2

   down_write(&i_data_sem)
   Modify i_disk_size
   up_write(&i_data_sem)
                                        down_write(&i_data_sem)
                                        Modify i_disk_size
                                        Copy i_disk_size to on-disk inode
                                        up_write(&i_data_sem)
   Copy i_disk_size to on-disk inode

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org

25 Mar, 2014 2 commits

ext4: make ext4_block_zero_page_range static · 94350ab5

Authored by Matthew Wilcox on Mar 24, 2014

It's only called within inode.c, so make it static, remove its prototype
from ext4.h and move it above all of its callers so it doesn't need a
prototype within inode.c.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: optimize Hurd tests when reading/writing inodes · ed3654eb

Authored by Theodore Ts'o on Mar 24, 2014

Set an in-memory superblock flag to indicate whether the file system is
designed to support the Hurd.

Also, add a sanity check to make sure the 64-bit feature is not set
for Hurd file systems, since i_file_acl_high conflicts with a
Hurd-specific field.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

19 Mar, 2014 2 commits

ext4: each filesystem creates and uses its own mb_cache · 9c191f70

Authored by T Makphaibulchoke on Mar 18, 2014

This patch adds new interfaces to create and destroy the cache,
ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and removes
the cache creation and destroy calls from ext4_init_xattr() and
ext4_exit_xattr() in fs/ext4/xattr.c.

fs/ext4/super.c has been changed so that when a filesystem is mounted
a cache is allocated and attached to its ext4_sb_info structure.

fs/mbcache.c has been changed so that only one slab allocator is
allocated and used by all mbcache structures.

Signed-off-by: T. Makphaibulchoke <tmac@hp.com>

ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate · b8a86845

Authored by Lukas Czerner on Mar 18, 2014

Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
functionality as xfs ioctl XFS_IOC_ZERO_RANGE.

It can be used to convert a range of a file to zeros, preferably without
issuing data IO. Blocks should be preallocated for the regions that span
holes in the file, and the entire range is preferably converted to
unwritten extents.

This can also be used to preallocate blocks past EOF in the same way as
with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
size to remain the same.

Also add appropriate tracepoints.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

24 Feb, 2014 1 commit

ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate · 9eb79482

Authored by Namjae Jeon on Feb 23, 2014

This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.

The semantics of this flag are as follows:
1) It collapses the range lying between offset and offset + len by removing
   any data blocks which are present in this range and then updates all the
   logical offsets of extents beyond "offset + len" to nullify the hole
   created by removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collapse range does not work beyond i_size.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
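
The collapse-range semantics listed in 9eb79482 can be modelled on an in-memory buffer. This is a sketch under assumptions, not the ext4 implementation: `collapse_range()` is a hypothetical helper, the byte buffer stands in for file data, `*size` plays the role of i_size, and the exact boundary condition at end of file is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove [offset, offset + len) from the buffer and shift the tail down,
 * so no hole is left behind.  Returns 0 or a negative errno.
 */
int collapse_range(unsigned char *data, size_t *size,
		   size_t offset, size_t len, size_t blocksize)
{
	assert(blocksize > 0);

	if (offset % blocksize || len % blocksize)
		return -EINVAL;		/* rule 3: block-aligned only */
	if (len == 0 || offset > *size || len > *size - offset)
		return -EINVAL;		/* rule 4: not beyond i_size */

	/* Rule 1: data after the range moves down to `offset`. */
	memmove(data + offset, data + offset + len, *size - offset - len);
	*size -= len;
	return 0;
}
```

With a 12-byte buffer "AAAABBBBCCCC" and a block size of 4, collapsing offset 4, length 4 leaves "AAAACCCC" and a size of 8.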

22 Feb, 2014 1 commit

ext4: translate fallocate mode bits to strings · a633f5a3

Authored by Lukas Czerner on Feb 22, 2014

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

17 Feb, 2014 1 commit

ext4: don't leave i_crtime.tv_sec uninitialized · 19ea8060

Authored by Theodore Ts'o on Feb 16, 2014

If the i_crtime field is not present in the inode, don't leave the
field uninitialized.

Fixes: ef7f3835 ("ext4: Add nanosecond timestamps")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

20 Dec, 2013 1 commit

ext4: add explicit casts when masking cluster sizes · f5a44db5

Authored by Theodore Ts'o on Dec 20, 2013

The missing casts can cause the high 32 bits of the 64-bit physical block
numbers to be lost.  Set up new macros which allow us to make sure the
right thing happens, even if at some point we end up supporting larger
logical block numbers.

Thanks to Emese Revfy and the PaX security team for reporting this
issue.

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
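
The class of bug addressed by f5a44db5 can be reproduced in a few lines. This is an illustrative sketch, not the kernel macros: `fsblk_t` and the two function names are hypothetical, and the cluster ratio is assumed to be a power of two. It shows that inverting a 32-bit value before widening produces a mask whose upper 32 bits are zero.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t fsblk_t;	/* stand-in for a 64-bit physical block number */

/* Buggy: ~(ratio - 1) is computed in 32 bits, then zero-extended. */
fsblk_t cluster_start_nocast(fsblk_t pblk, unsigned int ratio)
{
	return pblk & ~(ratio - 1);
}

/* Fixed: widen first, so the inverted mask keeps its high bits set. */
fsblk_t cluster_start_cast(fsblk_t pblk, unsigned int ratio)
{
	return pblk & ~((fsblk_t)ratio - 1);
}
```

For block 0x100000013 and a ratio of 16, the uncast version yields 0x10 while the cast version yields 0x100000010.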

12 Nov, 2013 1 commit

ext4: add prototypes for macro-generated functions · 3f61c0cc

Authored by Andreas Dilger on Nov 11, 2013

It isn't very easy to find the declarations for the functions created
by EXT4_INODE_BIT_FNS() because the names are generated by macros:

    ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag
    ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state

Add explicit declarations for these functions so that grep and tags
can find them.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
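
The pattern described in 3f61c0cc, where a macro generates the functions and explicit prototypes make them searchable, can be sketched as follows. The macro `INODE_BIT_FNS`, the struct and the generated names are simplified, hypothetical stand-ins for the ext4 ones.

```c
#include <assert.h>

/* Hypothetical inode model with two independent bitfields. */
struct inode_model {
	unsigned long flags;
	unsigned long state;
};

/* Generates test/set/clear helpers for one bitfield of the inode. */
#define INODE_BIT_FNS(name, field)					\
int test_inode_##name(const struct inode_model *i, int bit)		\
{ return (int)((i->field >> bit) & 1UL); }				\
void set_inode_##name(struct inode_model *i, int bit)			\
{ i->field |= 1UL << bit; }						\
void clear_inode_##name(struct inode_model *i, int bit)		\
{ i->field &= ~(1UL << bit); }

/*
 * Explicit prototypes: redundant for the compiler, but they give grep
 * and tags a literal occurrence of each generated name.
 */
int test_inode_flag(const struct inode_model *i, int bit);
void set_inode_flag(struct inode_model *i, int bit);
void clear_inode_flag(struct inode_model *i, int bit);
int test_inode_state(const struct inode_model *i, int bit);
void set_inode_state(struct inode_model *i, int bit);
void clear_inode_state(struct inode_model *i, int bit);

INODE_BIT_FNS(flag, flags)
INODE_BIT_FNS(state, state)
```

Searching for `set_inode_state` now finds the prototype even though the definition itself only exists after macro expansion.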

09 Nov, 2013 1 commit

vfs: pull ext4's double-i_mutex-locking into common code · 375e289e

Authored by J. Bruce Fields on Apr 18, 2012

We want to do this elsewhere as well.

Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18 Oct, 2013 1 commit

ext4: add ratelimiting to ext4 messages · efbed4dc

Authored by Theodore Ts'o on Oct 17, 2013

In the case of a storage device that suddenly disappears, or in the
case of significant file system corruption, this can result in a huge
flood of messages being sent to the console.  This can overflow the
file system containing /var/log/messages, or if a serial console is
configured, this can slow down the system so much that a hardware
watchdog can end up triggering, forcing a system reboot.

Google-Bug-Id: 7258357
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
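
The idea behind efbed4dc, allowing a burst of messages per time window and suppressing the rest, can be sketched as below. This is a generic interval/burst limiter written for illustration; `struct ratelimit` and `ratelimit_ok()` are hypothetical names and the commit's actual mechanism may differ. Time is passed in explicitly so the behaviour is deterministic.

```c
#include <assert.h>

/* Hypothetical limiter state. */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages let through in this window */
	int missed;	/* messages suppressed in this window */
};

/* Return 1 if a message may be emitted at time `now`, else 0. */
int ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the counters. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

With an interval of 5 ticks and a burst of 2, calls at ticks 0 and 1 pass, calls at ticks 2 and 3 are suppressed, and a call at tick 5 passes again.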

04 Sep, 2013 1 commit

direct-io: Implement generic deferred AIO completions · 7b7a8665

Authored by Christoph Hellwig on Sep 04, 2013

Add support to the core direct-io code to defer AIO completions to user
context using a workqueue.  This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara.  I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

29 Aug, 2013 3 commits

ext4: mark block group as corrupt on inode bitmap error · 87a39389

Authored by Darrick J. Wong on Aug 28, 2013

If we detect either a discrepancy between the inode bitmap and the
inode counts or the inode bitmap fails to pass validation checks, mark
the block group corrupt and refuse to allocate or deallocate inodes
from the group.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: mark block group as corrupt on block bitmap error · 163a203d

Authored by Darrick J. Wong on Aug 28, 2013

When we notice a block-bitmap corruption (because of device failure or
something else), we should mark this group as corrupt and prevent
further block allocations/deallocations from it. Currently, we end up
generating one error message for every block in the bitmap. This
could potentially make the system unstable, as noticed in some
bugs. With this patch, the error is printed only the first time
and the entire block group is marked as corrupted. This prevents future
allocations/deallocations from it.

Also tested by corrupting the block bitmap and forcefully introducing
the mb_free_blocks error:
(1) create a largefile (2 GB)
    $ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
(2) umount the filesystem. Use dumpe2fs to see which block bitmaps
    are in use by largefile and note their block numbers
(3) use dd to zero out the used block bitmaps
    $ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
(4) mount the FS and delete the largefile.
(5) recreate the largefile. Verify that the new largefile does not
    get any blocks from the groups marked as bad.

Without the patch, we will see an mb_free_blocks error for each bit in
each zeroed-out bitmap at (4). With the patch, we only see the error
once per block group:
[  309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.

Google-Bug-Id: 7258357

[darrick.wong@oracle.com]
Further modifications (by Darrick) to make more obvious that this corruption
bit applies to blocks only.  Set the corruption flag if the block group bitmap
verification fails.

Original-author: Aditya Kali <adityakali@google.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
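
The behaviour described in 163a203d, reporting a corrupt group once and then refusing to allocate from it, can be modelled with a per-group flag. This is a minimal sketch with hypothetical names (`struct group_model`, `GROUP_BBITMAP_CORRUPT`, `mark_group_corrupt()`, `alloc_cluster()`); it is not the ext4 allocator, and a counter stands in for the logged error message.

```c
#include <assert.h>
#include <errno.h>

#define GROUP_BBITMAP_CORRUPT 0x1	/* hypothetical flag bit */

/* Hypothetical per-group state. */
struct group_model {
	unsigned int flags;
	unsigned int free_clusters;
	unsigned int errors_logged;	/* stands in for printed messages */
};

/* Record a bitmap inconsistency; only the first report is "logged". */
void mark_group_corrupt(struct group_model *g)
{
	if (g->flags & GROUP_BBITMAP_CORRUPT)
		return;			/* already reported once */
	g->flags |= GROUP_BBITMAP_CORRUPT;
	g->errors_logged++;
}

/* Allocate one cluster; a group marked corrupt is never used. */
int alloc_cluster(struct group_model *g)
{
	if (g->flags & GROUP_BBITMAP_CORRUPT)
		return -EIO;
	if (g->free_clusters == 0)
		return -ENOSPC;
	g->free_clusters--;
	return 0;
}
```

Marking the group twice still leaves `errors_logged` at 1, which corresponds to the "once per block group" output shown in the log above.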

ext4: fix type declaration of ext4_validate_block_bitmap · dbde0abe

Authored by Darrick J. Wong on Aug 28, 2013

The block_group parameter to ext4_validate_block_bitmap is both used
as an ext4_group_t inside the function and the same type is passed in
by all callers.  We might as well use the typedef consistently instead
of open-coding the 'unsigned int'.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>