Commits · d4e439549127d3ca544482551f5f1af4f114debd · openanolis / cloud-kernel

03 Mar, 2013 · 4 commits

ext4: fix a warning from sparse check for ext4_dir_llseek · d4e43954

Committed by Zheng Liu on Mar 02, 2013

ext4_dir_llseek is only used as a callback function, and no one calls
it directly.  So make it a static function in order to silence a
warning from sparse.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: convert number of blocks to clusters properly · 810da240

Committed by Lukas Czerner on Mar 02, 2013

We're using the EXT4_B2C() macro to convert a number of blocks to a
number of clusters for bigalloc file systems.  However, we should be
using EXT4_NUM_B2C().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
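The two macros differ in rounding.  EXT4_B2C() maps a block number to the cluster that contains it with a plain right shift, whereas EXT4_NUM_B2C() turns a count of blocks into the number of clusters needed to hold them, rounding up.  The sketch below illustrates this in userspace; b2c() and num_b2c() are hypothetical stand-ins, assuming a cluster ratio of 2^cluster_bits blocks, and are not the kernel macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_B2C(): the cluster containing block blk. */
static inline uint64_t b2c(uint64_t blk, unsigned cluster_bits)
{
	return blk >> cluster_bits;
}

/* Hypothetical stand-in for EXT4_NUM_B2C(): clusters needed to hold
 * nblocks blocks, rounding any partial cluster up. */
static inline uint64_t num_b2c(uint64_t nblocks, unsigned cluster_bits)
{
	uint64_t ratio;

	assert(cluster_bits < 64);
	ratio = UINT64_C(1) << cluster_bits;
	return (nblocks + ratio - 1) >> cluster_bits;
}
```

With 16 blocks per cluster (cluster_bits = 4), a 17-block range needs 2 clusters, but the plain shift reports 1, which is the kind of undercount this fix avoids.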

ext4: fix possible memory leak in ext4_remount() · 3e36a163

Committed by Wei Yongjun on Mar 02, 2013

'orig_data' is allocated in ext4_remount() and must be freed before
returning from the error-handling paths; otherwise it is leaked.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
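The usual kernel idiom for this class of leak is a single exit label that every error path jumps to, so the cleanup cannot be skipped.  The sketch below is a hypothetical userspace model, not ext4_remount() itself: remount_sketch(), dup_opts() and free_opts() are illustrative names, and nr_live_copies exists only so that a test can observe whether the copy was freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Number of option copies not yet freed; 0 means nothing leaked. */
static int nr_live_copies;

static char *dup_opts(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p) {
		memcpy(p, s, n);
		nr_live_copies++;
	}
	return p;
}

static void free_opts(char *p)
{
	if (p)
		nr_live_copies--;
	free(p);
}

/* Hypothetical remount-like routine: once orig_data exists, every exit
 * goes through the single "out" label, so it is always freed. */
static int remount_sketch(const char *data, int fail_parse)
{
	char *orig_data = dup_opts(data);
	int err = 0;

	if (!orig_data)
		return -ENOMEM;         /* nothing allocated yet */

	if (fail_parse) {
		err = -EINVAL;
		goto out;               /* a bare "return err" here would leak */
	}
	/* ... apply the new mount options ... */
out:
	free_opts(orig_data);
	return err;
}
```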

jbd2: fix ERR_PTR dereference in jbd2__journal_start · df05c1b8

Committed by Dmitry Monakhov on Mar 02, 2013

If start_this_handle() fails, the handle is set to an ERR_PTR() value
and must not be dereferenced.

paging request at fffffffffffffff6
IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
PGD 200e067 PUD 200f067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
CPU 0 journal commit I/O error

Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW
RIP: 0010:[<ffffffff813c073f>] [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292
RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
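The oops is consistent with that description: RBX holds 0xffffffffffffffe2, which is (long)-30 in 64-bit two's complement (-EROFS on Linux), and the faulting address 0xfffffffffffffff6 is 20 bytes past it, consistent with reading a field at a small offset through an ERR_PTR() value.  The sketch below re-creates the ERR_PTR()/IS_ERR()/PTR_ERR() idiom in userspace and shows the check a caller needs; struct handle and start_sketch() are hypothetical stand-ins, and the real kernel helpers live in <linux/err.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR() idiom: errno values up to
 * MAX_ERRNO are encoded in the top addresses, which are never valid
 * kernel pointers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct handle { int h_ref; };   /* hypothetical stand-in for handle_t */
static struct handle the_handle;

/* Hypothetical start routine: a handle on success, ERR_PTR() on failure. */
static struct handle *start_sketch(int fail)
{
	return fail ? ERR_PTR(-30) : &the_handle;
}

/* A correct caller checks IS_ERR() before touching any field. */
static long use_handle(int fail)
{
	struct handle *h = start_sketch(fail);

	if (IS_ERR(h))
		return PTR_ERR(h);      /* h->h_ref here would fault */
	h->h_ref++;
	return 0;
}
```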

02 Mar, 2013 · 1 commit

ext4: use percpu counter for extent cache count · 1ac6466f

Committed by Theodore Ts'o on Mar 02, 2013

Use a percpu counter rather than atomic types for shrinker accounting.
There's no need for exact accuracy in the shrinker, so this should be
a little cheaper.  The percpu struct is somewhat large, but there was
a big gap before the cache-aligned s_es_lru_lock anyway, and it fits
nicely in there.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
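The trade-off can be illustrated with a simplified, single-threaded model of a per-CPU counter: each CPU accumulates a private delta and only folds it into the shared total once it reaches a batch size, so the cheap read may lag the true value by a bounded amount.  This is a sketch of the idea only; the kernel's percpu_counter API (percpu_counter_add(), percpu_counter_read(), percpu_counter_sum()) handles real per-CPU storage and locking, and NR_CPUS_SKETCH and BATCH here are arbitrary choices.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4
#define BATCH 32

/* Each CPU keeps an unfolded delta; the shared count is approximate. */
struct pcpu_counter_sketch {
	int64_t count;                  /* shared, approximate value */
	int32_t local[NR_CPUS_SKETCH];  /* per-CPU deltas not yet folded */
};

static void pc_add(struct pcpu_counter_sketch *c, int cpu, int32_t amount)
{
	int32_t v = c->local[cpu] + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;          /* rare: touch the shared value */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;      /* common: CPU-local update only */
	}
}

/* Cheap read: may be off by up to NR_CPUS_SKETCH * (BATCH - 1). */
static int64_t pc_read(const struct pcpu_counter_sketch *c)
{
	return c->count;
}

/* Exact read: adds in every CPU's pending delta. */
static int64_t pc_sum(const struct pcpu_counter_sketch *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

A shrinker only needs a rough figure, so reading the shared count without summing the per-CPU deltas is an acceptable approximation there.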

01 Mar, 2013 · 1 commit

ext4: optimize ext4_es_shrink() · 24630774

Committed by Theodore Ts'o on Feb 28, 2013

When the system is under memory pressure, ext4_es_shrink() gets
called very often.  So optimize returning the number of items in the
file system's extent status cache by keeping a per-filesystem count,
instead of calculating it each time by scanning all of the inodes in
the extent status cache.

Also rename the slab used for the extent status cache to
"ext4_extent_status" so it's obvious that the slab in question is
created by ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>

28 Feb, 2013 · 1 commit

ext4: fix extent status tree regression for file systems > 512GB · 8e919d13

Committed by Theodore Ts'o on Feb 27, 2013

This fixes a regression introduced by commit f7fec032.  The
problem was that the extent status flags caused the upper bits of
block numbers to be masked out, so only block numbers smaller than
2**28 survived intact.  Since we didn't test with file systems larger
than 512GB, we didn't notice this during the development cycle.

A typical failure looks like this:

EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block
152052301: comm ls: bad entry in directory: rec_len is smaller than minimal -
offset=0(0), inode=0, rec_len=0, name_len=0

... where 'debugfs -R "stat <172235804>" /dev/sdb1' reports that the
inode has block number 688923213.  When viewed in hex, block number
152052301 (from the syslog) is 0x910224D, while block number 688923213
is 0x2910224D.  Note the missing "0x20000000" in the block number.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Verified-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Dave Jones <davej@redhat.com>
Verified-by: Dave Jones <davej@redhat.com>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
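The two block numbers in the report can be reconciled with a little arithmetic: keeping only the low 28 bits of 688923213 (0x2910224D) yields exactly 152052301 (0x910224D), the value in the syslog.  The sketch below checks that; modelling the bug as an AND with 0x0FFFFFFF is an assumption made for illustration, chosen to match the description above, rather than the exact expression in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of the bug: only the low 28 bits of the physical block
 * number survive, because the status flags occupied the bits above. */
#define LOW28_MASK UINT32_C(0x0FFFFFFF)

static uint32_t truncated_pblk(uint32_t pblk)
{
	return pblk & LOW28_MASK;
}

/* True if a block number would be stored correctly under the bug. */
static int survives_truncation(uint32_t pblk)
{
	return truncated_pblk(pblk) == pblk;
}
```

Every block number below 2**28 passes through unchanged, which is why the problem stayed hidden on smaller file systems.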

23 Feb, 2013 · 2 commits

ext4: fix free clusters calculation in bigalloc filesystem · 304e220f

Committed by Lukas Czerner on Feb 22, 2013

ext4_has_free_clusters() should tell us whether there are enough free
clusters to allocate.  However, the number of free clusters in the file
system is converted to blocks using EXT4_C2B(), which is not only the
wrong use of the macro (we should have used EXT4_NUM_B2C()) but also
the wrong concept entirely, since everything else is in cluster units.

Moreover, when calculating the number of root clusters we should be
using the EXT4_NUM_B2C() macro instead of EXT4_B2C(); otherwise the
result might be off by one.  However, r_blocks_count should always be
a multiple of the cluster ratio, so a plain bit shift is enough here.
We avoid using EXT4_B2C() because it's confusing.

As a result of the first problem, the number of free clusters is much
bigger than it should be, and ext4_has_free_clusters() would return 1
even when there are not really enough free clusters available.

Fix this by removing the EXT4_C2B() conversion of free clusters and
using a bit shift when calculating the number of root clusters.  This
bug affects a number of xfstests tests covering ENOSPC handling.  With
this patch most of the ENOSPC problems with bigalloc file systems
disappear, especially the errors caused by delayed allocation not
having enough space when the actual allocation is finally requested.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
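The unit mismatch can be shown with a deliberately simplified model of the check.  has_free_clusters() and has_free_clusters_buggy() below are hypothetical and leave out the other counters the real function accounts for; they only contrast comparing everything in clusters with scaling free clusters up to blocks first.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of the check: may nclusters be
 * allocated without dipping into the root-reserved pool?  Everything
 * is compared in clusters; cluster_bits = log2(blocks per cluster). */
static int has_free_clusters(uint64_t free_clusters, uint64_t r_blocks,
			     unsigned cluster_bits, uint64_t nclusters)
{
	/* r_blocks is assumed to be a multiple of the cluster ratio,
	 * so a plain shift converts it exactly. */
	uint64_t root_clusters = r_blocks >> cluster_bits;

	return free_clusters >= root_clusters + nclusters;
}

/* The bug pattern: free clusters scaled up to blocks, then compared
 * against quantities that are still in clusters. */
static int has_free_clusters_buggy(uint64_t free_clusters, uint64_t r_blocks,
				   unsigned cluster_bits, uint64_t nclusters)
{
	uint64_t free_units = free_clusters << cluster_bits;   /* wrong unit */
	uint64_t root_clusters = r_blocks >> cluster_bits;

	return free_units >= root_clusters + nclusters;
}
```

With 10 free clusters, 64 root-reserved blocks (4 clusters at 16 blocks per cluster) and a request for 8 clusters, the correct check says no, while the buggy variant sees 160 "free" units and says yes.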

ext4: no need to remove extent if len is 0 in ext4_es_remove_extent() · d4381472

Committed by Eryu Guan on Feb 22, 2013

If len is 0, no extent needs to be removed, so return immediately.
Otherwise it could trigger the following BUG_ON() in
ext4_es_remove_extent():

	end = lblk + len - 1;
	BUG_ON(end < lblk);

This can be reproduced by an unprivileged user with a simple
truncate(1) command:

	truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile

The same is true for __es_insert_extent().

The patched kernel passed the xfstests regression tests.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
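The arithmetic behind the BUG_ON() is 32-bit unsigned wrap-around.  With 4 KiB blocks, the truncate size above is (2**32 - 1) blocks, so the first block to drop is logical block 0xFFFFFFFF; assuming, as the reproducer suggests, that the length passed in is then 0, end = lblk + len - 1 becomes 0xFFFFFFFE, which is less than lblk.  The sketch models this with hypothetical functions and a uint32_t stand-in for ext4_lblk_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lblk_sketch_t;   /* stand-in for 32-bit ext4_lblk_t */

/* Hypothetical removal routine.  Returns 0 on success, or -1 where the
 * kernel code would hit BUG_ON(end < lblk). */
static int remove_extent_sketch(lblk_sketch_t lblk, lblk_sketch_t len)
{
	lblk_sketch_t end;

	if (len == 0)
		return 0;                 /* the fix: nothing to remove */

	end = lblk + len - 1;             /* last block of the range */
	if (end < lblk)
		return -1;                /* range wrapped past 2**32 - 1 */
	/* ... remove [lblk, end] from the tree ... */
	return 0;
}

/* The same computation without the guard: with len == 0 and lblk > 0,
 * end = lblk - 1, which is less than lblk, so the check fires. */
static int wraps_without_guard(lblk_sketch_t lblk, lblk_sketch_t len)
{
	lblk_sketch_t end = lblk + len - 1;

	return end < lblk;
}
```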

19 Feb, 2013 · 1 commit

ext4: fix xattr block allocation/release with bigalloc · 1231b3a1

Committed by Lukas Czerner on Feb 18, 2013

Currently, when a new xattr block is created or released, we call
dquot_alloc_block() or dquot_free_block() respectively, which among
other things increments or decrements the number of blocks assigned
to the inode by one block.

However, this does not work for a bigalloc file system, because we
always allocate/free a whole cluster, so we have to account for that
in dquot_alloc_block() and dquot_free_block() as well.

Use the clusters-to-blocks conversion EXT4_C2B() when passing the
number of blocks to the dquot_alloc/free functions to fix the problem.

The problem has been revealed by xfstests #117 (and possibly others).
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
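On bigalloc, one xattr "block" actually occupies a whole cluster, so the quota charge has to be scaled by the cluster ratio.  The sketch below models that with hypothetical names (c2b(), struct quota_sketch, xattr_block_alloc() and xattr_block_free()); it does not use the real dquot API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_C2B(): clusters -> blocks. */
static inline uint64_t c2b(uint64_t clusters, unsigned cluster_bits)
{
	return clusters << cluster_bits;
}

/* Hypothetical quota model: charges are tracked in blocks. */
struct quota_sketch {
	uint64_t blocks_charged;
};

/* Allocating one xattr block really allocates a whole cluster. */
static void xattr_block_alloc(struct quota_sketch *q, unsigned cluster_bits)
{
	q->blocks_charged += c2b(1, cluster_bits);
}

/* Releasing it gives back the whole cluster's worth of blocks. */
static void xattr_block_free(struct quota_sketch *q, unsigned cluster_bits)
{
	q->blocks_charged -= c2b(1, cluster_bits);
}
```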

18 Feb, 2013 · 9 commits

ext4: reclaim extents from extent status tree · 74cd15cd

Committed by Zheng Liu on Feb 18, 2013

Although extent status is loaded on demand, we also need to reclaim
extents from the tree under heavy memory pressure, because in some
cases a fragmented extent tree makes the status tree consume too much
memory.

Here we maintain an LRU list in the super_block.  When the extent
status of an inode is accessed or changed, the inode is moved to the
tail of the list.  The inode is dropped from this list when it is
cleared.  A counter is added to the inode to count the number of
cached objects in its extent status tree.  Only written/unwritten/hole
extents are counted, because delayed extents are not reclaimed:
fiemap, bigalloc and seek_data/hole need them.  The counter is
increased when a new extent is allocated and decreased when an extent
is freed.

In this commit we use the normal shrinker framework to reclaim memory
from the status tree.  ext4_es_reclaim_extents_count() traverses the
LRU list to count the number of reclaimable extents.  ext4_es_shrink()
tries to reclaim written/unwritten/hole extents from the extent status
tree.  An inode that has been shrunk is moved to the tail of the LRU
list.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
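The LRU scheme described above can be sketched as an intrusive doubly linked list: touched inodes move to the tail, and the shrinker walks from the head (least recently used), reclaims cached extents, and rotates each inode it shrinks to the tail.  All names here are hypothetical and locking is omitted; this illustrates the policy, not the ext4 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each inode counts its reclaimable cached extents
 * and sits on a per-filesystem LRU list (head = least recently used). */
struct inode_sketch {
	int nr_reclaimable;
	struct inode_sketch *prev, *next;
};

struct lru_sketch {
	struct inode_sketch *head, *tail;
};

static void lru_del(struct lru_sketch *l, struct inode_sketch *i)
{
	if (i->prev)
		i->prev->next = i->next;
	else
		l->head = i->next;
	if (i->next)
		i->next->prev = i->prev;
	else
		l->tail = i->prev;
	i->prev = i->next = NULL;
}

static void lru_add_tail(struct lru_sketch *l, struct inode_sketch *i)
{
	i->prev = l->tail;
	i->next = NULL;
	if (l->tail)
		l->tail->next = i;
	else
		l->head = i;
	l->tail = i;
}

/* Extent status accessed or changed: move the inode to the tail. */
static void lru_touch(struct lru_sketch *l, struct inode_sketch *i)
{
	lru_del(l, i);          /* caller guarantees i is on the list */
	lru_add_tail(l, i);
}

/* Shrinker: walk from the head, reclaim up to nr_to_scan extents, and
 * rotate every inode that was shrunk to the tail.  Returns the number
 * reclaimed.  Each inode is visited at most once per call. */
static int lru_shrink(struct lru_sketch *l, int nr_to_scan)
{
	struct inode_sketch *i = l->head;
	struct inode_sketch *last = l->tail;
	int freed = 0;

	while (i && freed < nr_to_scan) {
		struct inode_sketch *next = i->next;
		int take = i->nr_reclaimable;

		if (take > nr_to_scan - freed)
			take = nr_to_scan - freed;
		i->nr_reclaimable -= take;
		freed += take;
		if (take > 0)
			lru_touch(l, i);
		if (i == last)
			break;
		i = next;
	}
	return freed;
}
```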

ext4: adjust some functions for reclaiming extents from extent status tree · bdedbb7b

Committed by Zheng Liu on Feb 18, 2013

This commit changes some interfaces in the extent status tree because
we need the inode to count the cached objects in an extent status tree.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: remove single extent cache · 69eb33dc

Committed by Zheng Liu on Feb 18, 2013

The single extent cache can be removed because the extent status tree
now serves as an extent cache, and a better one.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: lookup block mapping in extent status tree · d100eef2

Committed by Zheng Liu on Feb 18, 2013

Now that all extent status is tracked, we already have an extent cache
in memory.  Every time we want to look up a block mapping, we can first
try to look it up in the extent status tree to avoid a potential disk I/O.

A new function called ext4_es_lookup_extent is defined to do this
work.  When we try to look up a block mapping, we always call
ext4_map_blocks and/or ext4_da_map_blocks.  So in these functions we
first try to look up the block mapping in the extent status tree.

A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
to avoid putting a hole into the extent status tree, because this hole
would be converted to a delayed extent in the tree immediately.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: track all extent status in extent status tree · f7fec032

Committed by Zheng Liu on Feb 18, 2013

By recording the physical block and status, the extent status tree is
able to track the status of every extent.  When we call the _map_blocks
functions to look up an extent or create a new written/unwritten/delayed
extent, this extent will be inserted into the extent status tree.

We don't load all extents from disk in alloc_inode() because it costs
too much memory, and if a file is opened and closed frequently it will
take too much time to load all extent information.  So currently when
we create/look up an extent, this extent will be inserted into the
extent status tree.  Hence, the extent status tree may not
comprehensively contain all of the extents found in the file.

One case we need to take care of is that an extent might have
unwritten and delayed status simultaneously, because an extent can be
delayed-allocated and also allocated by fallocate.  In that case we
need to keep the delayed status, because later we need it to update
the delayed reservation space.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag · a25a4e1a

Committed by Zheng Liu on Feb 18, 2013

This commit lets ext4_ext_map_blocks return the EXT4_MAP_UNWRITTEN
flag, because in a later commit ext4_map_blocks needs to use this flag
to determine the extent status.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: rename and improve ext4_es_find_extent() · be401363

Committed by Zheng Liu on Feb 18, 2013

This commit renames ext4_es_find_extent to ext4_es_find_delayed_extent
and improves this function.  First, we split the input and output
parameters.  Second, this function no longer returns the first block
of the next delayed extent after 'es'.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: add physical block and status member into extent status tree · fdc0212e

Committed by Zheng Liu on Feb 18, 2013

This commit adds two members to the extent_status structure so that it
can record the physical block and the extent status.  es_pblk is used
to record both of them, because a physical block number only needs 48
bits, so the extent status can be stashed in the remaining bits to save
some memory.  Written, unwritten, delayed and hole are now defined as
statuses.

Because a new member is added to the extent status tree, all
interfaces need to be adjusted.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
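Since a physical block number needs at most 48 bits, the spare high bits of a 64-bit word can carry the status.  The sketch below assumes the four status flags sit in bits 60 to 63; the exact bit positions and all names are assumptions for illustration.  Note that the mask is built from a 64-bit constant; compare the block-number truncation described in the extent status regression fix above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: block number in the low 48 bits, status flags
 * in bits 60..63 of the same 64-bit word. */
#define ES_SHIFT_SKETCH     60
#define ES_WRITTEN_SKETCH   (UINT64_C(1) << (ES_SHIFT_SKETCH + 0))
#define ES_UNWRITTEN_SKETCH (UINT64_C(1) << (ES_SHIFT_SKETCH + 1))
#define ES_DELAYED_SKETCH   (UINT64_C(1) << (ES_SHIFT_SKETCH + 2))
#define ES_HOLE_SKETCH      (UINT64_C(1) << (ES_SHIFT_SKETCH + 3))
#define ES_FLAGS_MASK       (~UINT64_C(0) << ES_SHIFT_SKETCH)

static uint64_t es_pack(uint64_t pblk, uint64_t status)
{
	assert(pblk < (UINT64_C(1) << 48));     /* block fits in 48 bits */
	assert((status & ~ES_FLAGS_MASK) == 0);  /* only flag bits set */
	return pblk | status;
}

/* Recover the block number by clearing the flag bits. */
static uint64_t es_pblock(uint64_t packed)
{
	return packed & ~ES_FLAGS_MASK;
}

/* Recover the status by keeping only the flag bits. */
static uint64_t es_status(uint64_t packed)
{
	return packed & ES_FLAGS_MASK;
}
```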

ext4: refine extent status tree · 06b0c886

Committed by Zheng Liu on Feb 18, 2013

This commit refines the extent status tree code.

1) A prefix 'es_' is added to the extent status tree structure
members.

2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) Rename extent_status_end() to ext4_es_end()

4) ext4_es_can_be_merged() is defined to check whether two extents can
be merged or not.

5) Updated and clarified comments.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
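The message does not spell out what makes two extents mergeable.  The sketch below shows one plausible rule, as an assumption rather than ext4's exact conditions, and it already uses the physical block and status fields that fdc0212e (listed above) adds: the first extent must end where the second begins, both must have the same status, the combined length must still fit in 32 bits, and for written/unwritten extents the physical blocks must be contiguous too.  The struct and enum are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum es_status_sketch { ES_WRITTEN, ES_UNWRITTEN, ES_DELAYED, ES_HOLE };

struct es_sketch {
	uint32_t lblk;                  /* first logical block */
	uint32_t len;                   /* number of blocks */
	uint64_t pblk;                  /* first physical block, if mapped */
	enum es_status_sketch status;
};

/* One plausible merge rule (an assumption, not ext4's exact code). */
static int es_can_be_merged_sketch(const struct es_sketch *a,
				   const struct es_sketch *b)
{
	if (a->status != b->status)
		return 0;
	if ((uint64_t)a->lblk + a->len != b->lblk)
		return 0;               /* not logically adjacent */
	if ((uint64_t)a->len + b->len > UINT32_MAX)
		return 0;               /* merged length would overflow */
	if (a->status == ES_WRITTEN || a->status == ES_UNWRITTEN)
		return a->pblk + a->len == b->pblk;
	return 1;                       /* delayed/hole: no physical blocks */
}
```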

15 Feb, 2013 · 4 commits

ext4: use ERR_PTR() abstraction for ext4_append() · 0f70b406

Committed by Theodore Ts'o on Feb 15, 2013

Use ERR_PTR()/IS_ERR() abstraction instead of passing in a separate
pointer to an integer for the error code, as a code cleanup.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: refactor code to read directory blocks into ext4_read_dirblock() · dc6982ff

Committed by Theodore Ts'o on Feb 14, 2013

The code to read in directory blocks and verify their metadata
checksums was replicated in ten different places across
fs/ext4/namei.c, and the code was buggy in subtle ways in a number of
those replicated sites.  In some cases, ext4_error() was called with a
trailing newline.  In others, particularly in empty_dir(), it was
possible to call ext4_dirent_csum_verify() on an index block, which
would trigger false warnings requesting the system administrator to
run e2fsck.

By refactoring the code, we make it more readable, while also
shrinking the compiled object file by over 700 bytes and the source by
50 lines of code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: add debugging context for warning in ext4_da_update_reserve_space() · 01a523eb

Committed by Theodore Ts'o on Feb 14, 2013

Print some additional debugging context to hopefully help to debug a
warning which is getting triggered by xfstests #74.

Also remove extraneous newlines left over from when printk()s were
converted to ext4_warning() and ext4_msg().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: use KERN_WARNING for warning messages · 8de5c325

Committed by Theodore Ts'o on Feb 14, 2013

Some messages related to a WARN_ON(1) were printed using
KERN_NOTICE.  Use KERN_WARNING or ext4_warning() instead so that
context related to the WARN_ON() is printed at the same printk warning
level (and ends up in the same log files, etc.).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Feb, 2013 · 6 commits

jbd2: use module parameters instead of debugfs for jbd_debug · b6e96d00

Committed by Theodore Ts'o on Feb 09, 2013

There are multiple reasons to move away from debugfs.  First of all,
we are only using it for a single parameter, it is much more
complicated to set up (some 30 lines of code compared to 3), and it is
one more thing that might fail while loading the jbd2 module.

Secondly, as a module parameter it can be specified as a boot option if
jbd2 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/jbd2/parameters/jbd2_debug.  So it is more flexible.

Ultimately we want to move away from using jbd_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: use module parameters instead of debugfs for mballoc_debug · a0b30c12

Committed by Theodore Ts'o on Feb 09, 2013

There are multiple reasons to move away from debugfs.  First of all,
we are only using it for a single parameter, it is much more
complicated to set up (some 30 lines of code compared to 3), and it is
one more thing that might fail while loading the ext4 module.

Secondly, as a module parameter it can be specified as a boot option if
ext4 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/ext4/parameters/mballoc_debug.  So it is more flexible.

Ultimately we want to move away from using mb_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: start handle at the last possible moment when creating inodes · 1139575a

Committed by Theodore Ts'o on Feb 09, 2013

In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
until the inode has been successfully allocated.  In order to do this,
we need to start the handle in ext4_new_inode().  So create a new
variant of this function, ext4_new_inode_start_handle(), so the handle
can be created at the last possible minute, before we need to modify
the inode allocation bitmap block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd

Committed by Theodore Ts'o on Feb 09, 2013

Operations which modify extended attributes may need extra journal
credits if inline data is used, since there is a chance that some
extended attributes may need to get pushed to an external attribute
block.

Changes to reflect this were made in xattr.c, but they were missed in
fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
credits needed for xattr operations to an inline function defined in
ext4_jbd2.h, and use it in acl.c and xattr.c.

Also move the function declarations used in inline.c out of xattr.h
(where they were non-obviously hidden, and caused problems since
ext4_jbd2.h needs to use the function ext4_has_inline_data()) and into
ext4.h.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf

Committed by Theodore Ts'o on Feb 09, 2013

ext4_unlink() and ext4_rmdir() don't actually release the blocks
associated with the file/directory.  This gets done in a separate jbd2
handle called via ext4_evict_inode().  Thus, we don't need to reserve
lots of journal credits for the truncate.

Note that using too many journal credits is non-optimal because it can
lead to the journal transaction getting closed too early, before it is
strictly necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: fix the number of credits needed for ext4_ext_migrate() · 4b217630

Committed by Theodore Ts'o on Feb 09, 2013

The migration ioctl creates a temporary inode.  Since this inode is
never linked to a directory, we don't need to reserve journal credits
required for modifying the directory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

09 Feb, 2013 · 6 commits

ext4: start handle at the last possible moment in ext4_rmdir() · 8dcfaad2

Committed by Theodore Ts'o on Feb 09, 2013

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: start handle at the last possible moment in ext4_unlink() · 931b6864

Committed by Theodore Ts'o on Feb 09, 2013

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: grab page before starting transaction handle in write_begin() · 47564bfb

Committed by Theodore Ts'o on Feb 09, 2013

The grab_cache_page_write_begin() function can potentially sleep for a
long time, since it may need to do memory allocation which can block
if the system is under significant memory pressure, and because it may
be blocked on page writeback.  If it does take a long time to grab the
page, it's better that we not hold an active jbd2 handle.

So grab a handle on the page first, and _then_ start the transaction
handle.

This commit fixes the following long transaction handle hold time:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: pass context information to jbd2__journal_start() · 9924a92a

Committed by Theodore Ts'o on Feb 08, 2013

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

   T=/sys/kernel/debug/tracing
   EVENT=$T/events/jbd2/jbd2_handle_stats
   echo "interval > 5" > $EVENT/filter
   echo 1 > $EVENT/enable

   ./run-my-fs-benchmark

   cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms.  Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation.  Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
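The numbers in this message imply HZ=250 on the traced machine (one jiffy = 4 ms): that is what makes the "interval > 5" filter select handles held longer than 20 ms, and 311 jiffies come to 1244 ms.  The sketch below does that conversion; HZ_SKETCH is an assumption inferred from the text, and in-kernel code would use jiffies_to_msecs() instead.

```c
#include <assert.h>

/* Assumption inferred from the text: HZ = 250, so one jiffy is 4 ms. */
#define HZ_SKETCH 250

/* Convert a jiffies interval from the trace into milliseconds. */
static unsigned long jiffies_to_ms_sketch(unsigned long j)
{
	return j * 1000UL / HZ_SKETCH;
}

/* Threshold N for an "interval > N" filter that selects handles held
 * longer than ms milliseconds (ms assumed to be a multiple of 4). */
static unsigned long ms_to_jiffies_floor(unsigned long ms)
{
	return ms * HZ_SKETCH / 1000UL;
}
```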

ext4: move the jbd2 wrapper functions out of super.c · 722887dd

Committed by Theodore Ts'o on Feb 08, 2013

Move the jbd2 wrapper functions which start and stop handles out of
super.c, where they don't really logically belong, and into
ext4_jbd2.c.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: add tracepoints which provide per-handle statistics · 343d9c28

Committed by Theodore Ts'o on Feb 08, 2013

Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed.  These tracepoints
will help us determine which handles are the problematic ones, and
validate whether changes make things better or worse.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07 Feb, 2013 · 2 commits

jbd2: revert "jbd2: add COW fields to struct jbd2_journal_handle" · 078d5039

Committed by Theodore Ts'o on Feb 07, 2013

This reverts commit 93737456.

The cow-snapshots effort is no longer active, so remove these extra
fields to shrink down the handle structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

jbd2: track request delay statistics · 9fff24aa

Committed by Theodore Ts'o on Feb 06, 2013

Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real-time processes
might prevent the kjournald thread from executing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

05 Feb, 2013 · 1 commit

ext4: optimize mballoc for large allocations · 40ae3487

Committed by Theodore Ts'o on Feb 04, 2013

The ext4 block allocator only maintains buddy bitmaps for chunks which
are less than or equal to one quarter of a block group.  That is, for
a file system with a 1k blocksize, and where the number of blocks in a
block group is 8192 blocks, the largest chunk size tracked by buddy
bitmaps is 2048 blocks.

For a file system with a 4k blocksize, and where the number of blocks
in a block group is 32768 blocks, the largest chunk size tracked by
buddy bitmaps is 8192 blocks.

To work around this limitation, mballoc.c before this commit would
truncate allocation requests to the number of blocks in a block group
minus 10.  Why 10?  Aside from being a completely arbitrary number, it
prevents the allocation request from being a power of two larger than
25% of the block group.  If you try to explicitly fallocate 50% of the
block group size, this will demonstrate the problem; the block
allocation code will scan all of the blocks in the file system with
cr==0 (since the request is for a natural power of two), but then
completely fail for all block groups, since the buddy bitmaps don't
track chunk sizes of 50% of the block group.

To fix this, in these cases we use ext4_mb_complex_scan_group()
instead of ext4_mb_simple_scan_group().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>
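The limit above is simple arithmetic: one quarter of 8192 blocks is 2048, and one quarter of 32768 is 8192.  The sketch below encodes an assumed version of the resulting decision, using the buddy-based simple scan only for power-of-two requests that the buddy bitmaps can represent and falling back to the complex scan otherwise; the helper names are hypothetical and this is not the actual mballoc logic.

```c
#include <assert.h>

/* Hypothetical model: buddy bitmaps track free chunks up to one
 * quarter of a block group. */
static unsigned long max_buddy_chunk(unsigned long blocks_per_group)
{
	return blocks_per_group / 4;
}

static int is_power_of_two(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Assumed decision rule: the simple (buddy-based) scan only helps for
 * power-of-two requests the buddy bitmaps can represent; anything else
 * goes to the complex (bitmap-walking) scan. */
static int use_simple_scan(unsigned long len, unsigned long blocks_per_group)
{
	return is_power_of_two(len) && len <= max_buddy_chunk(blocks_per_group);
}
```

A 50% request, such as 16384 blocks in a 32768-block group, is a power of two but exceeds what the buddy bitmaps track, so under this rule it goes to the complex scan instead of failing in every group.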

03 Feb, 2013 · 2 commits

ext4: check incompatible mount options while mounting ext2/3 · 8dc0aa8c

Committed by Theodore Ts'o on Feb 02, 2013

Check for incompatible mount options when using the ext4 file system
driver to mount ext2 or ext3 file systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: print error when argument of inode_readahead_blk is invalid · e33e60ea

Committed by Jan Kara on Feb 02, 2013

If the argument of inode_readahead_blk is too big, we just bail out
without printing any error.  Fix this, since it could confuse users.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
