Commits · fdc0212e86ca15c5cfed77088af7cc5eb79ccbc7 · openanolis / cloud-kernel

18 Feb, 2013 · 2 commits

ext4: add physical block and status member into extent status tree · fdc0212e

Committed by Zheng Liu on Feb 18, 2013

This commit adds two members to the extent_status structure so that it
can record the physical block and the extent status.  A single field,
es_pblk, records both of them: a physical block number only needs 48
bits, so the status can be stashed in the remaining bits to save some
memory.  The statuses defined for now are written, unwritten, delayed
and hole.

Because a new member is added to the extent status tree, all interfaces
need to be adjusted.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

fdc0212e
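The packing idea described in the message above can be illustrated in plain userspace C. This is only a minimal sketch under stated assumptions: the shift of 60 bits, the flag values, and the names `es_pack()`, `es_pblock()` and `es_status()` are hypothetical and are not taken from the commit, which only says that the status shares a 64-bit field with a 48-bit physical block number.

```c
#include <stdint.h>

/* Assumption: status flags occupy the top 4 bits of the 64-bit field. */
#define ES_STATUS_SHIFT 60
#define ES_PBLK_MASK    ((UINT64_C(1) << ES_STATUS_SHIFT) - 1)

/* Hypothetical flag values for the four statuses named in the message. */
enum {
	ES_HOLE      = 1 << 0,
	ES_DELAYED   = 1 << 1,
	ES_UNWRITTEN = 1 << 2,
	ES_WRITTEN   = 1 << 3,
};

/* Combine a physical block number and a status into one 64-bit value. */
static inline uint64_t es_pack(uint64_t pblk, unsigned int status)
{
	return (pblk & ES_PBLK_MASK) | ((uint64_t)status << ES_STATUS_SHIFT);
}

/* Recover the physical block number by masking off the status bits. */
static inline uint64_t es_pblock(uint64_t es_pblk)
{
	return es_pblk & ES_PBLK_MASK;
}

/* Recover the status by shifting the high bits down. */
static inline unsigned int es_status(uint64_t es_pblk)
{
	return (unsigned int)(es_pblk >> ES_STATUS_SHIFT);
}
```

Because a 48-bit block number never reaches the high bits, packing and unpacking round-trip without loss, and no separate status member is needed.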

ext4: refine extent status tree · 06b0c886

Committed by Zheng Liu on Feb 18, 2013

This commit refines the extent status tree code.

1) A prefix 'es_' is added to the extent status tree structure
members.

2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) Renamed extent_status_end() to ext4_es_end().

4) ext4_es_can_be_merged() is defined to check whether two extents can
be merged or not.

5) Updated and clarified comments.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

06b0c886
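As a rough illustration of what an "end of extent" helper and a merge check might look like, here is a userspace sketch. The struct `es_sketch` and both functions are hypothetical stand-ins, and the merge rule shown (the two extents must be logically adjacent) is an assumption; the real ext4_es_can_be_merged() may check more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent status entry. */
struct es_sketch {
	uint32_t es_lblk;	/* first logical block of the extent */
	uint32_t es_len;	/* length in blocks, assumed >= 1 */
};

/* Last logical block covered by the extent. */
static inline uint32_t es_end_sketch(const struct es_sketch *es)
{
	return es->es_lblk + es->es_len - 1;
}

/*
 * Assumed merge rule: 'right' must start exactly where 'left' stops.
 * The sum is done in 64 bits so it cannot wrap around.
 */
static inline bool es_can_be_merged_sketch(const struct es_sketch *left,
					   const struct es_sketch *right)
{
	return (uint64_t)left->es_lblk + left->es_len == right->es_lblk;
}
```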

15 Feb, 2013 · 4 commits

ext4: use ERR_PTR() abstraction for ext4_append() · 0f70b406

Committed by Theodore Ts'o on Feb 15, 2013

Use the ERR_PTR()/IS_ERR() abstraction instead of passing in a separate
pointer to an integer for the error code, as a code cleanup.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0f70b406
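For readers unfamiliar with the idiom, the following userspace sketch models how an error code can travel inside the returned pointer instead of through a separate `int *` argument. The lowercase helpers and `append_block_sketch()` are hypothetical names written for this illustration; they imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() but are not the kernel implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the top 4095 address values are reserved for error codes. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer falls in the reserved error range. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical allocator: returns a buffer or an encoded error. */
static void *append_block_sketch(size_t size)
{
	void *buf;

	if (size == 0)
		return err_ptr(-EINVAL);
	buf = malloc(size);
	if (!buf)
		return err_ptr(-ENOMEM);
	return buf;
}
```

A caller then tests the result with `is_err()` and reads the code with `ptr_err()`, so the function signature no longer needs an out-parameter for the error.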

ext4: refactor code to read directory blocks into ext4_read_dirblock() · dc6982ff

Committed by Theodore Ts'o on Feb 14, 2013

The code to read in directory blocks and verify their metadata
checksums was replicated in ten different places across
fs/ext4/namei.c, and the code was buggy in subtle ways in a number of
those replicated sites.  In some cases, ext4_error() was called with a
trailing newline.  In others, in particular in empty_dir(), it was
possible to call ext4_dirent_csum_verify() on an index block, which
would trigger false warnings requesting the system administrator to
run e2fsck.

By refactoring the code, we make it more readable, and we shrink the
compiled object file by over 700 bytes and the source by 50 lines of
code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dc6982ff

ext4: add debugging context for warning in ext4_da_update_reserve_space() · 01a523eb

Committed by Theodore Ts'o on Feb 14, 2013

Print some additional debugging context to hopefully help to debug a
warning which is getting triggered by xfstests #74.

Also remove extraneous newlines left over from when printk's were
converted to ext4_warning() and ext4_msg().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01a523eb

ext4: use KERN_WARNING for warning messages · 8de5c325

Committed by Theodore Ts'o on Feb 14, 2013

Some messages printed related to a WARN_ON(1) were printed using
KERN_NOTICE.  Use KERN_WARNING or ext4_warning() instead so that
context related to the WARN_ON() is printed at the same printk warning
level (and ends up in the same log files, etc.).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8de5c325

10 Feb, 2013 · 6 commits

jbd2: use module parameters instead of debugfs for jbd_debug · b6e96d00

Committed by Theodore Ts'o on Feb 09, 2013

There are multiple reasons to move away from debugfs.  First of all,
we are only using it for a single parameter, it is much more
complicated to set up (some 30 lines of code compared to 3), and it is
one more thing that might fail while loading the jbd2 module.

Secondly, as a module parameter it can be specified as a boot option if
jbd2 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/jbd2/parameters/jbd2_debug.  So it is more flexible.

Ultimately we want to move away from using jbd_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b6e96d00

ext4: use module parameters instead of debugfs for mballoc_debug · a0b30c12

Committed by Theodore Ts'o on Feb 09, 2013

There are multiple reasons to move away from debugfs.  First of all,
we are only using it for a single parameter, it is much more
complicated to set up (some 30 lines of code compared to 3), and it is
one more thing that might fail while loading the ext4 module.

Secondly, as a module parameter it can be specified as a boot option if
ext4 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/ext4/parameters/mballoc_debug.  So it is more flexible.

Ultimately we want to move away from using mb_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a0b30c12

ext4: start handle at the last possible moment when creating inodes · 1139575a

Committed by Theodore Ts'o on Feb 09, 2013

In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
until the inode has been successfully allocated.  In order to do this,
we need to start the handle in ext4_new_inode().  So create a new
variant of this function, ext4_new_inode_start_handle(), so the handle
can be created at the last possible minute, before we need to modify
the inode allocation bitmap block.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1139575a

ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd

Committed by Theodore Ts'o on Feb 09, 2013

Operations which modify extended attributes may need extra journal
credits if inline data is used, since there is a chance that some
extended attributes may need to get pushed to an external attribute
block.

Changes to reflect this were made in xattr.c, but they were missed in
fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
credits needed for xattr operations to an inline function defined in
ext4_jbd2.h, and use it in acl.c and xattr.c.

Also move the function declarations used in inline.c out of xattr.h
(where they were non-obviously hidden, and caused problems since
ext4_jbd2.h needs to use the function ext4_has_inline_data()) and
into ext4.h.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
Reviewed-by: Jan Kara <jack@suse.cz>

95eaefbd

ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf

Committed by Theodore Ts'o on Feb 09, 2013

ext4_unlink() and ext4_rmdir() don't actually release the blocks
associated with the file/directory.  This gets done in a separate jbd2
handle called via ext4_evict_inode().  Thus, we don't need to reserve
lots of journal credits for the truncate.

Note that using too many journal credits is non-optimal because it can
lead to the journal transaction getting closed too early, before it is
strictly necessary.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

64044abf

ext4: fix the number of credits needed for ext4_ext_migrate() · 4b217630

Committed by Theodore Ts'o on Feb 09, 2013

The migration ioctl creates a temporary inode.  Since this inode is
never linked to a directory, we don't need to reserve the journal
credits required for modifying the directory.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

4b217630

09 Feb, 2013 · 6 commits

ext4: start handle at the last possible moment in ext4_rmdir() · 8dcfaad2

Committed by Theodore Ts'o on Feb 09, 2013

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

8dcfaad2

ext4: start handle at the last possible moment in ext4_unlink() · 931b6864

Committed by Theodore Ts'o on Feb 09, 2013

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

931b6864

ext4: grab page before starting transaction handle in write_begin() · 47564bfb

Committed by Theodore Ts'o on Feb 09, 2013

The grab_cache_page_write_begin() function can potentially sleep for a
long time, since it may need to do memory allocation which can block
if the system is under significant memory pressure, and because it may
be blocked on page writeback.  If it does take a long time to grab the
page, it's better that we not hold an active jbd2 handle.

So grab a handle on the page first, and _then_ start the transaction
handle.

This commit fixes the following long transaction handle hold time:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

47564bfb

ext4: pass context information to jbd2__journal_start() · 9924a92a

Committed by Theodore Ts'o on Feb 08, 2013

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way of finding the longer-running handles is:

   T=/sys/kernel/debug/tracing
   EVENT=$T/events/jbd2/jbd2_handle_stats
   echo "interval > 5" > $EVENT/filter
   echo 1 > $EVENT/enable

   ./run-my-fs-benchmark

   cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms (the
filter value of 5 is in jiffies, which matches 20ms if HZ=250).
Having longer-running handles is bad, because a commit started at the
wrong time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation.  Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9924a92a
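The relationship between the filter value, the 20ms threshold and the 311-jiffy example in the message above can be checked with a small conversion helper. This is only a sketch: the tick rate of 250 is an assumption inferred from the numbers in the message (5 jiffies for 20ms, 311 jiffies for "over 1.2 seconds"), and `jiffies_to_ms_sketch()` is a hypothetical name, not the kernel's jiffies conversion routine.

```c
/* Assumption: the traced kernel ticks 250 times per second. */
#define SKETCH_HZ 250UL

/* Convert a tick count to milliseconds at the assumed tick rate. */
static inline unsigned long jiffies_to_ms_sketch(unsigned long jiffies)
{
	return jiffies * 1000UL / SKETCH_HZ;
}
```

Under that assumption, a filter of `interval > 5` corresponds to 20 milliseconds and the 311-jiffy handle in the example was held for 1244 milliseconds. On a kernel built with a different HZ, the same filter value selects a different duration.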

ext4: move the jbd2 wrapper functions out of super.c · 722887dd

Committed by Theodore Ts'o on Feb 08, 2013

Move the jbd2 wrapper functions which start and stop handles out of
super.c, where they don't really logically belong, and into
ext4_jbd2.c.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

722887dd

jbd2: add tracepoints which provide per-handle statistics · 343d9c28

Committed by Theodore Ts'o on Feb 08, 2013

Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed.  These tracepoints
will help us determine which ones are the problematic ones, and to
validate whether changes make things better or worse.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

343d9c28

07 Feb, 2013 · 2 commits

jbd2: revert "jbd2: add COW fields to struct jbd2_journal_handle" · 078d5039

Committed by Theodore Ts'o on Feb 07, 2013

This reverts commit 93737456.

The cow-snapshots effort is no longer active, so remove these extra
fields to shrink down the handle structure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

078d5039

jbd2: track request delay statistics · 9fff24aa

Committed by Theodore Ts'o on Feb 06, 2013

Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real time processes
might keep the kjournald thread from executing.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9fff24aa

05 Feb, 2013 · 1 commit

ext4: optimize mballoc for large allocations · 40ae3487

Committed by Theodore Ts'o on Feb 04, 2013

The ext4 block allocator only maintains buddy bitmaps for chunks which
are less than or equal to one quarter of a block group.  That is, for
a file system with a 1k blocksize, and where the number of blocks in a
block group is 8192 blocks, the largest chunk size tracked by buddy
bitmaps is 2048 blocks.

For a file system with a 4k blocksize, and where the number of blocks
in a block group is 32768 blocks, the largest chunk size tracked by
buddy bitmaps is 8192 blocks.

To work around this limitation, mballoc.c before this commit would
truncate allocation requests to the number of blocks in a block group
minus 10.  Why 10?  Aside from being a completely arbitrary number, it
prevents the block allocation from being a power of two larger than
25% of the block group.  If you try to explicitly fallocate 50% of the
block group size, this will demonstrate the problem; the block
allocation code will scan all of the blocks in the file system with
cr==0 (since the request is for a natural power of two), but then
completely fail for all block groups, since the buddy bitmaps don't
track chunk sizes of 50% of the block group.

To fix this, in these cases we use ext4_mb_complex_scan_group()
instead of ext4_mb_simple_scan_group().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>

40ae3487
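The arithmetic in the message above can be reproduced with a few helper functions. This is a sketch under stated assumptions: the helper names are hypothetical, the blocks-per-group figure assumes one bitmap block with one bit per block (which matches the 8192 and 32768 figures quoted), and `needs_complex_scan_sketch()` only models the condition the message describes rather than the actual test used in mballoc.c.

```c
#include <stdbool.h>

/* Assumption: one bitmap block per group, one bit per block. */
static inline unsigned long blocks_per_group_sketch(unsigned long block_size)
{
	return 8UL * block_size;
}

/* Largest chunk tracked by buddy bitmaps: a quarter of the group. */
static inline unsigned long max_buddy_chunk_sketch(unsigned long group_blocks)
{
	return group_blocks / 4;
}

static inline bool is_power_of_two_sketch(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * A power-of-two request larger than the biggest tracked chunk cannot
 * be satisfied from the buddy bitmaps, which is the problem case the
 * message describes.
 */
static inline bool needs_complex_scan_sketch(unsigned long request,
					     unsigned long group_blocks)
{
	return is_power_of_two_sketch(request) &&
	       request > max_buddy_chunk_sketch(group_blocks);
}
```

With a 4k block size, a request for 16384 blocks (50% of the group) falls into the problem case, while a request for 8192 blocks does not.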

03 Feb, 2013 · 4 commits

ext4: check incompatible mount options while mounting ext2/3 · 8dc0aa8c

Committed by Theodore Ts'o on Feb 02, 2013

Check for incompatible mount options when using the ext4 file system
driver to mount ext2 or ext3 file systems.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8dc0aa8c

ext4: print error when argument of inode_readahead_blk is invalid · e33e60ea

Committed by Jan Kara on Feb 02, 2013

If the argument of inode_readahead_blk is too big, we just bail out
without printing any error.  Fix this since it could confuse users.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e33e60ea

ext4: make mount option parsing loop more logical · 5f3633e3

Committed by Jan Kara on Feb 02, 2013

The loop looking for the correct mount option entry is more logical if
it is rewritten as an empty loop that finds the correct option entry,
followed by the code handling the option.  It also saves one level of
indentation for a lot of code, so we can join a couple of split lines.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5f3633e3

ext4: move several mount options to standard handling loop · 0efb3b23

Committed by Jan Kara on Feb 02, 2013

Several mount options (resuid, resgid, journal_dev, journal_ioprio) are
currently handled before we enter the standard option handling loop.  I
don't see a reason for this, so move them to the normal handling loop
to make things more regular.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0efb3b23

02 Feb, 2013 · 4 commits

ext4: reduce one "if" comparison in ext4_dirhash() · 0e79537d

Committed by Cong Ding on Feb 01, 2013

It is unnecessary to check i<4 after the loop; just do it before the
break.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0e79537d

ext4: fix race in ext4_mb_add_n_trim() · f1167009

Committed by Niu Yawei on Feb 01, 2013

In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

f1167009

ext4: fix smatch warning in move_extent.c's mext_replace_branches() · 87e69873

Committed by Akria Fujita on Feb 01, 2013

Commit 2147b1a6 resulted in a new smatch warning:

> fs/ext4/move_extent.c:693 mext_replace_branches()
> 	 warn: variable dereferenced before check 'dext' (see line 683)

Fix this by adding a check to make sure dext is non-NULL before we
dereference it.

Signed-off-by: Akria Fujita <a-fujita@rs.jp.nec.com>
[ modified by tytso to make sure an ext4_error is called ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

87e69873

ext4: use WARN in ext4_alloc_blocks · 524c19eb

Committed by Julia Lawall on Feb 01, 2013

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

524c19eb

30 Jan, 2013 · 2 commits

jbd2: don't wake kjournald unnecessarily · e7b04ac0

Committed by Eric Sandeen on Jan 30, 2013

Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
transaction has already been requested for commit.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[tytso@mit.edu: reworked check to make it clearer]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e7b04ac0
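The early-exit check described in the message above might look roughly like the following userspace model. Everything here is a hypothetical sketch: `struct journal_sketch`, its fields, and `log_start_commit_sketch()` are invented for illustration, and the wakeup itself is reduced to a counter so the behaviour can be observed. The actual jbd2 code may be structured differently.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the journal state involved. */
struct journal_sketch {
	unsigned int commit_request;	/* tid most recently requested */
	unsigned int running_tid;	/* tid of the running transaction */
	bool has_running;		/* is there a running transaction? */
	unsigned int wakeups;		/* stand-in for waking kjournald */
};

/* Returns true only when a new commit request (and wakeup) was issued. */
static bool log_start_commit_sketch(struct journal_sketch *j,
				    unsigned int target)
{
	/* The target was already requested: no extra wakeup is needed. */
	if (j->commit_request == target)
		return false;

	/* Only the running transaction can be newly requested. */
	if (j->has_running && j->running_tid == target) {
		j->commit_request = target;
		j->wakeups++;
		return true;
	}
	return false;
}
```

Calling the function twice with the same target wakes the commit thread only once, which is the behaviour the commit message aims for.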

ext4: fix possible use-after-free with AIO · 091e26df

Committed by Jan Kara on Jan 29, 2013

A running AIO pins the inode in memory using a file reference.  Once
the AIO is completed using aio_complete(), the file reference is put
and the inode can be freed from memory.  So we have to be sure that
calling aio_complete() is the last thing we do with the inode.

CC: stable@vger.kernel.org
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

091e26df

29 Jan, 2013 · 9 commits

ext4: remove unnecessary NULL pointer check · b1deefc9

Committed by Guo Chao on Jan 28, 2013

brelse() and ext4_journal_force_commit() are both inlined and able
to handle NULL.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b1deefc9

ext4: remove useless assignment in dx_probe() · 41be871f

Committed by Guo Chao on Jan 28, 2013

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

41be871f

ext4: remove unused variable in add_dirent_to_buf() · 2bbbee2a

Committed by Guo Chao on Jan 28, 2013

After commit 978fef91 (create __ext4_insert_dentry for dir entry
insertion), 'reclen' is not used anymore.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

2bbbee2a

ext4: release buffer when checksum failed · d5ac7773

Committed by Guo Chao on Jan 28, 2013

Commit b0336e8d (ext4: calculate and verify checksums of directory
leaf blocks) and commit dbe89444 (ext4: Calculate and verify checksums
for htree nodes) forgot to release the buffer in some places when the
checksum verification failed.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

d5ac7773

ext4: remove explicit WARN_ON when ext4_map_blocks() fails · b06acd38

Committed by Lukas Czerner on Jan 28, 2013

In two places we call WARN_ON() before we print out the debug message;
however, we agreed that the WARN_ON() is unnecessary in those places,
so remove them.

Also use ext4_warning() instead of ext4_msg() and printk().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b06acd38

ext4: remove unused variable flags · cfa72754

Committed by Lukas Czerner on Jan 28, 2013

Remove the unused variable flags from dump_completed_IO().  The code is
only exercised when EXT4FS_DEBUG is defined.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>

cfa72754

ext4: fix ext4_writepage() to achieve data=ordered guarantees · fe386132

Committed by Jan Kara on Jan 28, 2013

Until now, ext4_writepage() skipped writing pages that had any delayed
or unwritten buffers attached.  When blocksize < pagesize this breaks
data=ordered mode guarantees, as we can have a page with one freshly
allocated buffer whose allocation is part of the committing
transaction and another buffer in the page which is delayed or
unwritten.  So fix this problem by calling ext4_bio_writepage()
anyway.  It will submit mapped buffers and leave others alone.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fe386132

ext4: Make ext4_bio_writepage() handle unprepared buffers · 8a850c3f

Committed by Jan Kara on Jan 28, 2013

So far ext4_bio_writepage() unconditionally cleared the dirty bit on
all buffers underlying the page.  That implicitly assumes we can write
all buffers.  So far that is true because callers of
ext4_bio_writepage() make sure all buffers in the page are mapped, but:

a) it's a data corruption bug waiting to happen
b) in data=ordered mode when blocksize < pagesize we do need to write
   pages that may have only some of their dirty buffers mapped.

So change ext4_bio_writepage() to skip buffers that cannot be written,
without clearing their dirty bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a850c3f

ext4: simplify mpage_add_bh_to_extent() · b6a8e62f

Committed by Jan Kara on Jan 28, 2013

The argument b_size of mpage_add_bh_to_extent() was bogus since it was
always == blocksize (which we can easily derive from inode->i_blkbits).
Also, the second branch of the condition:
	if (nrblocks >= EXT4_MAX_TRANS_DATA) {
	} else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) >
						EXT4_MAX_TRANS_DATA) {
	}
was never taken because (b_size >> mpd->inode->i_blkbits) == 1.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b6a8e62f
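The claim that the second branch was dead code can be checked with a small model of the condition. This is a sketch under stated assumptions: `SKETCH_MAX_TRANS_DATA` is a placeholder limit chosen for illustration, and `second_branch_taken_sketch()` is a hypothetical helper that mirrors the shape of the quoted condition with b_size fixed to exactly one block.

```c
#include <stdbool.h>

/* Placeholder limit standing in for EXT4_MAX_TRANS_DATA. */
#define SKETCH_MAX_TRANS_DATA 64UL

/*
 * Mirrors the quoted if / else-if with b_size equal to one block, so
 * that (b_size >> blkbits) == 1.  Reports whether the else-if branch
 * would ever be entered for the given nrblocks.
 */
static inline bool second_branch_taken_sketch(unsigned long nrblocks,
					      unsigned int blkbits)
{
	unsigned long b_size = 1UL << blkbits;

	if (nrblocks >= SKETCH_MAX_TRANS_DATA)
		return false;	/* the first branch handles this case */

	/* Here nrblocks < limit, so nrblocks + 1 can never exceed it. */
	return nrblocks + (b_size >> blkbits) > SKETCH_MAX_TRANS_DATA;
}
```

For integer counts, once the first test has established that nrblocks is below the limit, adding one block can at most reach the limit and never exceed it, so the function returns false for every input.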

openanolis / cloud-kernel · synced successfully more than 1 year ago