Commits · 1f7bebb9e911d870fa8f997ddff838e82b5715ea · gsplhtlxg / clone-Linux

11 Sep, 2009 3 commits

ext4: Always set dx_node's fake_dirent explicitly. · 1f7bebb9

Authored by Andreas Schlick on Sep 10, 2009

When ext4_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and will consider the
htree to be corrupted.
Signed-off-by: Andreas Schlick <schlick@lavabit.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix async commit mode to be safe by using a barrier · 0e3d2a63

Authored by Theodore Ts'o on Sep 11, 2009

Previously the journal_async_commit mount option was equivalent to
using barrier=0 (and just as unsafe).  This patch fixes it so that we
eliminate the barrier before the commit block (by not using ordered
mode), and explicitly issue an empty barrier bio after writing the
commit block.  Because of the journal checksum, it is safe to do this;
if the journal blocks are not all written before a power failure, the
checksum in the commit block will prevent the last transaction from
being replayed.

With the fs_mark benchmark, journal_async_commit shows a 50%
improvement:

FSUse%        Count         Size    Files/sec     App Overhead
     8         1000        10240         30.5            28242

vs.

FSUse%        Count         Size    Files/sec     App Overhead
     8         1000        10240         45.8            28620
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
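
As a rough illustration of why the journal checksum makes this ordering safe, the sketch below guards replay with a checksum stored in the commit record. It is not jbd2 code: `struct commit_sketch`, `may_replay()` and the use of a plain bitwise CRC-32 are assumptions made for the example, and the real on-disk commit block format is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simple bitwise CRC-32 (reflected form, polynomial 0xEDB88320). */
static uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
	crc = ~crc;
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical commit record: holds the checksum of the transaction's blocks. */
struct commit_sketch {
	uint32_t checksum;
};

/* Replay is allowed only if the blocks found on disk match the commit record. */
static int may_replay(const struct commit_sketch *commit,
		      const unsigned char *blocks, size_t len)
{
	assert(commit != NULL && blocks != NULL);
	return crc32_update(0, blocks, len) == commit->checksum;
}
```

If a journal block did not reach the disk before the power failure, the recomputed checksum differs from the stored one and the transaction is skipped.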

ext4: Don't update superblock write time when filesystem is read-only · 71290b36

Authored by Theodore Ts'o on Sep 10, 2009

This avoids updating the superblock write time when we are mounting
the root file system read/only but we need to replay the journal.  For
people who are east of GMT and who make their clock tick in localtime
for Windows bug-for-bug compatibility, updating the write time at that
point will cause e2fsck to complain and force a full file system check.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Sep, 2009 4 commits

ext4: Clarify the locking details in mballoc · 08c3a813

Authored by Aneesh Kumar K.V on Sep 09, 2009

We don't need to take the alloc_sem lock when we are adding new
groups, since mballoc won't see the new group added until we bump
sbi->s_groups_count.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: check for need init flag in ext4_mb_load_buddy · f41c0750

Authored by Aneesh Kumar K.V on Sep 09, 2009

We should check for the need init flag with the group's alloc_sem held,
to make sure that while we are loading the buddy cache and holding a
reference to it, a file system resize can't add new blocks to the same
group.

The patch also drops the need init flag check in
ext4_mb_regular_allocator() because doing the check without holding
alloc_sem is racy.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: move ext4_mb_init_group() function earlier in the mballoc.c · b6a758ec

Authored by Aneesh Kumar K.V on Sep 09, 2009

This moves the function around so that it can be called from
ext4_mb_load_buddy().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Make non-journal fsync work properly · 91ac6f43

Authored by Frank Mayhar on Sep 09, 2009

Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
mode:  If we're not using a journal, ext4_write_inode() now calls
ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
with a new "do_sync" parameter.  If that parameter is nonzero _and_ we're
not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
instead of ext4_handle_dirty_metadata().

This problem was found in power-fail testing, checking the amount of
loss of files and blocks after a power failure when using fsync() and
when not using fsync().  It turned out that using fsync() was actually
worse than not doing so, possibly because it increased the likelihood
that the inodes would remain unflushed and would therefore be lost at
the power failure.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

13 Sep, 2009 1 commit

ext4: Assure that metadata blocks are written during fsync in no journal mode · fe188c0e

Authored by Theodore Ts'o on Sep 12, 2009

When there is no journal present, we must attach buffer heads
associated with extent tree and indirect blocks to the inode's
mapping->private_list via mark_buffer_dirty_inode() so that
ext4_sync_file() --- which is called to service fsync() and
fdatasync() system calls --- can write out the inode's metadata blocks
by calling sync_mapping_buffers().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Sep, 2009 1 commit

ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}() · c7acb4c1

Authored by Theodore Ts'o on Sep 09, 2009

When ext4 is using a journal, a metadata block which is deallocated
must be passed into the journal layer so it can be dropped from the
current transaction and/or revoked. This is done by calling the
functions ext4_journal_forget() and ext4_journal_revoke(), which call
jbd2_journal_forget() and jbd2_journal_revoke(), respectively.

Since jbd2_journal_forget() and jbd2_journal_revoke() call
bforget(), if ext4 is not using a journal, ext4_journal_forget() and
ext4_journal_revoke() must call bforget() to avoid a dirty metadata
block overwriting a block after it has been reallocated and reused for
another inode's data block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

08 Sep, 2009 1 commit

ext4: print more sysadmin-friendly message in check_block_validity() · 80e42468

Authored by Theodore Ts'o on Sep 08, 2009

Drop the WARN_ON(1), as the stack trace is not appropriate, since it is
triggered by file system corruption, and it misleads users into
thinking there is a kernel bug.  In addition, change the message
displayed by ext4_error() to make it clear that this is a file system
corruption problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Sep, 2009 1 commit

ext4: Take page lock before looking at attached buffer_heads flags · a827eaff

Authored by Aneesh Kumar K.V on Sep 09, 2009

In order to check whether the buffer_heads are mapped we need to hold
the page lock. Otherwise reclaim can clean up the attached buffer_heads.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

06 Sep, 2009 3 commits

ext4: Fix small typo for move_extent_per_page() · 44fc48f7

Authored by Akira Fujita on Sep 05, 2009

This function moves extents one page at a time, so rename it from
move_extent_par_page() to move_extent_per_page().
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Return exchanged blocks count to user space in failure · 8d666913

Authored by Akira Fujita on Sep 05, 2009

Return the exchanged block count (moved_len) to user space if
ext4_move_extents() fails partway through.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Remove unneeded BUG_ON() in ext4_move_extents() · daea696d

Authored by Akira Fujita on Sep 05, 2009

The ext4_move_extents() function uses BUG_ON() to check that the
exchanged block count matches the requested block count.  But if the
target range (orig_start + len) includes sparse block(s), 'moved_len'
(the exchanged block count) does not agree with 'len' (the requested
block count), since sparse blocks are not counted in 'moved_len'.  This
causes us to hit the BUG_ON() even though the function succeeded.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
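
The mismatch between 'moved_len' and 'len' can be sketched in user space as follows. This is an illustration only: `count_movable_blocks()` and the boolean allocation map are hypothetical stand-ins, not the ext4 extent code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many blocks in [start, start + len) are actually allocated.
 * allocated[i] is true when block i is mapped and false when it is a hole.
 */
static size_t count_movable_blocks(const bool *allocated, size_t nblocks,
				   size_t start, size_t len)
{
	size_t moved_len = 0;

	/* The requested range must lie inside the map. */
	assert(start <= nblocks && len <= nblocks - start);
	for (size_t i = start; i < start + len; i++)
		if (allocated[i])
			moved_len++;
	return moved_len;
}
```

For a request covering 8 blocks of which 3 are holes, the count is 5, so an assertion that the moved count equals the requested length would fire even though nothing went wrong.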

17 Sep, 2009 1 commit

ext4: Fix wrong comparisons in mext_check_arguments() · 70d5d3dc

Authored by Akira Fujita on Sep 16, 2009

The mext_check_arguments() function in move_extents.c has wrong
comparisons.  orig_start, which is passed from user space, is in units
of blocks, but the inode's i_size is in bytes, so the checks do not
work correctly.  This mis-check leads to an overflow of 'len', which
then hits the BUG_ON() in ext4_move_extents().  The patch fixes this
issue.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
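
A minimal sketch of the unit mismatch, assuming a hypothetical block-size shift `blkbits` (12 for 4k blocks). The function names are invented for the example and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: compares a block number directly against a size in bytes. */
static bool start_in_file_buggy(uint64_t start_blk, uint64_t size_bytes)
{
	return start_blk < size_bytes;
}

/* Fixed form: convert the byte size to a block count (rounding up) first. */
static bool start_in_file(uint64_t start_blk, uint64_t size_bytes,
			  unsigned int blkbits)
{
	uint64_t blksize, size_blks;

	assert(blkbits < 64);
	blksize = (uint64_t)1 << blkbits;
	/* Whole blocks, plus one more if there is a partial tail block. */
	size_blks = (size_bytes >> blkbits) +
		    ((size_bytes & (blksize - 1)) != 0);
	return start_blk < size_blks;
}
```

With an 8192-byte file and 4k blocks, block 5 is past the end of the file; the buggy comparison accepts it because 5 is less than 8192, while the converted comparison rejects it.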

06 Sep, 2009 2 commits

ext4: fix cache flush in ext4_sync_file · 5f3481e9

Authored by Christoph Hellwig on Sep 05, 2009

We need to flush the write cache unconditionally in ->fsync, otherwise
writes into already allocated blocks can get lost.  Writes into fully
allocated files are very common when using disk images for
virtualization, and without this fix data can easily be lost after
an fdatasync, which is the typical implementation for a cache flush on
the virtual drive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Remove journal_checksum mount option and enable it by default · d0646f7b

Authored by Theodore Ts'o on Sep 05, 2009

There's no real cost for the journal checksum feature, and we should
make sure it is enabled all the time.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

17 Sep, 2009 1 commit

ext4: fix tracepoint format string warnings · a3710fd1

Authored by Theodore Ts'o on Sep 17, 2009

Unlike on some other architectures, ino_t is an unsigned int on s390.
So add an explicit cast to avoid lots of compile warnings:

In file included from include/trace/ftrace.h:285,
                 from include/trace/define_trace.h:61,
                 from include/trace/events/ext4.h:711,
                 from fs/ext4/super.c:50:
include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
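
The explicit cast can be illustrated in user space with snprintf(). This is a sketch, not the tracepoint macro itself; `format_ino()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an inode number portably.  ino_t may be unsigned int or
 * unsigned long depending on the architecture, so cast it to the type
 * that "%lu" expects.
 */
static int format_ino(char *buf, size_t len, ino_t ino)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ino %lu", (unsigned long)ino);
}
```

Without the cast, the compiler warns on any platform where ino_t is not unsigned long, which is the situation the commit message describes.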

05 Sep, 2009 1 commit

ext4: Declare seq_operations and file_operations structures as const · 7f1346a9

Authored by Tobias Klauser on Sep 05, 2009

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01 Sep, 2009 2 commits

ext4: Add new tracepoint: trace_ext4_da_write_pages() · b3a3ca8c

Authored by Theodore Ts'o on Aug 31, 2009

Add a new tracepoint which shows the pages that will be written using
write_cache_pages() by ext4_da_writepages().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Restore wbc->range_start in ext4_da_writepages() · de89de6e

Authored by Theodore Ts'o on Aug 31, 2009

To solve a lock inversion problem, we implement part of the
range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
for more details.)

As part of that change wbc->range_start was modified by ext4's
writepages function, which causes its callers to get confused since
they aren't expecting the filesystem to modify it.  The simplest fix
is to save and restore wbc->range_start in ext4_da_writepages.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
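
The save-and-restore pattern described in this commit looks roughly like the sketch below. `struct wbc_sketch`, `write_some_pages()` and `writepages_sketch()` are hypothetical stand-ins for the kernel's writeback_control and writepages code, reduced to the one field that matters here.

```c
#include <assert.h>

/* Hypothetical stand-in for the writeback_control fields used here. */
struct wbc_sketch {
	long long range_start;
	long long range_end;
};

/* Hypothetical inner loop that advances range_start as it makes progress. */
static void write_some_pages(struct wbc_sketch *wbc, long long bytes)
{
	wbc->range_start += bytes;
}

/* Save the caller-visible field, do the work, then put it back. */
static void writepages_sketch(struct wbc_sketch *wbc, long long bytes)
{
	long long saved_start;

	assert(wbc != 0);
	saved_start = wbc->range_start;
	write_some_pages(wbc, bytes);
	wbc->range_start = saved_start;
}
```

The caller sees the same range_start before and after the call, regardless of how the inner loop moved it.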

17 Sep, 2009 1 commit

ext4: Fix spelling typo in the trace format for trace_ext4_da_writepages() · 98a56ab3

Authored by Theodore Ts'o on Sep 17, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

30 Aug, 2009 1 commit

ext4: Limit number of links that can be created by ext4_link() · b05ab1dc

Authored by Theodore Ts'o on Aug 29, 2009

In ext4_link we need to check using EXT4_LINK_MAX, and not
EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of
regular files, and not directories.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

29 Aug, 2009 1 commit

ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories · 2c94eb86

Authored by Aneesh Kumar K.V on Aug 28, 2009

Use EXT4_DIR_LINK_MAX so that rename() can move a directory into a new
parent directory without running into the EXT4_LINK_MAX limit.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

28 Aug, 2009 1 commit

ext4: fix extent sanity checking code with AGGRESSIVE_TEST · 55ad63bf

Authored by Theodore Ts'o on Aug 28, 2009

The extents sanity-checking code depends on the ext4_ext_space_*()
functions returning the maximum allowable size for eh_max; however,
when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the
extent tree handling code, this prevents a normally created ext4
filesystem from being mounted with the errors:

Aug 26 15:43:50 bsd086 kernel: [ 96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0)
Aug 26 15:43:50 bsd086 kernel: [ 96.070526] EXT4-fs (sda8): no journal found

Bug reported by Akira Fujita.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

26 Aug, 2009 3 commits

ext4: use ext4_grpblk_t more extensively · a36b4498

Authored by Eric Sandeen on Aug 25, 2009

unsigned short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.

There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.

I added a few comments to the struct ext4_group_info definition
as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
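
To show why a 16-bit type is risky for block offsets within a group, here is a small sketch. `grpblk_sketch_t` is a hypothetical 32-bit stand-in chosen for the example; it is not presented as the kernel's definition of ext4_grpblk_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit type for a block offset within a group. */
typedef int32_t grpblk_sketch_t;

/* Storing an offset in a 16-bit type silently drops the high bits. */
static uint16_t store_in_short(uint32_t offset)
{
	return (uint16_t)offset;
}

/* A 32-bit type keeps the full offset. */
static grpblk_sketch_t store_in_grpblk(uint32_t offset)
{
	assert(offset <= (uint32_t)INT32_MAX);
	return (grpblk_sketch_t)offset;
}
```

An offset of 70000 wraps to 4464 in the 16-bit version, while the 32-bit version preserves it.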

ext4: use variables not types in sizeofs() for allocations · 1927805e

Authored by Eric Sandeen on Aug 25, 2009

Precursor to changing some types; to keep things in sync, it
seems better to allocate/memset based on the size of the
variables we are using rather than on some disconnected
basic type like "unsigned short".
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
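
The idiom this commit adopts, sizing by the variable rather than by a named type, can be sketched in user space as follows. `alloc_counters()` is a hypothetical helper and uses malloc() rather than the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate and zero an array of counters.  Sizing by *counters rather
 * than by a named type keeps the allocation correct if the element type
 * is later widened.
 */
static unsigned short *alloc_counters(size_t n)
{
	unsigned short *counters;

	assert(n > 0);
	counters = malloc(n * sizeof(*counters));
	if (counters)
		memset(counters, 0, n * sizeof(*counters));
	return counters;
}
```

If the declaration of `counters` changes to a wider type, neither the malloc() nor the memset() call needs to be touched.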

ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84

Authored by Aneesh Kumar K.V on Aug 25, 2009

We need to unlock the new inode before iput.  This patch fixes the
following warning when calling chattr +e to migrate a file to use
extents.  It also fixes problems when e4defrag attempts to
defragment an inode.

[  470.400044] ------------[ cut here ]------------
[  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
[  470.400072] Hardware name: N/A
.....
...
[  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
[  470.400359] Call Trace:
[  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
[  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
[  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
[  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
[  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
[  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
[  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
[  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
[  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
[  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
[  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
[  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
[  470.400557] ---[ end trace ab85723542352dac ]---
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

18 Aug, 2009 7 commits

ext4: Add feature set check helper for mount & remount paths · a13fb1a4

Authored by Eric Sandeen on Aug 18, 2009

A user reported that although his root ext4 filesystem was mounting
fine, other filesystems would not mount, with the:

"Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"

error on his 32-bit box built without CONFIG_LBDAF.  This is because
the test at mount time for this situation was not being re-checked
on remount, and the normal boot process makes an ro->rw transition,
so this was being missed.

Refactor to make a common helper function to test the filesystem
features against the type of mount request (RO vs. RW) so that we 
stay consistent.

Addresses Red-Hat-Bugzilla: #517650
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

simplify some logic in ext4_mb_normalize_request · 38877f4e

Authored by Eric Sandeen on Aug 17, 2009

While reading through some of the mballoc code it seems that a couple
spots in the size normalization function could be streamlined.

The test for non-overlapping PAs can be or'd for the start & end
conditions, and the tests for adjacent PAs can be else-if'd - 
it's essentially independently testing:

	if (A + B <= C)
		...
	if (A > C)
		...

These cannot both be true so it seems like the else-if might
be slightly more efficient and/or informative.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
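
A self-contained sketch of the else-if form, with A, B and C from the message mapped to a preallocation start, its length, and a logical block. The enum and `classify_pa()` are hypothetical names for the example, not the mballoc code.

```c
#include <assert.h>

enum pa_side { PA_LEFT, PA_RIGHT, PA_OVERLAP };

/*
 * Classify a preallocated range [pa_start, pa_start + pa_len) relative
 * to a logical block.  The two tests are mutually exclusive, so the
 * second is chained with else-if instead of being tested independently.
 */
static enum pa_side classify_pa(unsigned long pa_start, unsigned long pa_len,
				unsigned long lblk)
{
	assert(pa_len > 0);
	if (pa_start + pa_len <= lblk)
		return PA_LEFT;		/* PA ends at or before lblk */
	else if (pa_start > lblk)
		return PA_RIGHT;	/* PA begins after lblk */
	return PA_OVERLAP;		/* lblk falls inside the PA */
}
```

Because the length is positive, a range that ends at or before the block cannot also start after it, which is why the chained form loses nothing.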

ext4: open-code ext4_mb_update_group_info · 0373130d

Authored by Eric Sandeen on Aug 17, 2009

ext4_mb_update_group_info is only called in one place, and it's
extremely simple.  There's no reason to have it in a separate function
in a separate file as far as I can tell; it just obfuscates what's
really going on.

Perhaps it was intended to keep the grp->bb_* manipulation local to
mballoc.c but we're already accessing other grp-> fields in balloc.c
directly so this seems ok.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: reject too-large filesystems on 32-bit kernels · bf43d84b

Authored by Eric Sandeen on Aug 17, 2009

ext4 will happily mount a > 16T filesystem on a 32-bit box, but
this is not safe; writes to the block device will wrap past 16T
and the page cache can't index past 16T (2^32 index * 4k pages).

Adding another test to the existing "too many sectors" test
should do the trick.

Add a comment, a relevant return value, and fix the reference
to the CONFIG_LBD(AF) option as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
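
The 16T figure follows from a 32-bit page index multiplied by 4k pages. The sketch below works through that arithmetic; the function names are hypothetical and this is not the check added to the mount path.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes addressable by a page cache whose page index is
 * index_bits wide and whose pages are (1 << page_shift) bytes.
 */
static uint64_t page_cache_limit(unsigned int index_bits,
				 unsigned int page_shift)
{
	assert(index_bits + page_shift < 64);
	return (uint64_t)1 << (index_bits + page_shift);
}

/* True when a filesystem of fs_bytes fits under that limit. */
static int fs_fits_page_cache(uint64_t fs_bytes, unsigned int index_bits,
			      unsigned int page_shift)
{
	return fs_bytes <= page_cache_limit(index_bits, page_shift);
}
```

With index_bits = 32 and page_shift = 12 the limit is 2^44 bytes, which is 16 TiB.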

jbd2: bitfields should be unsigned · 0ccff1a4

Authored by H Hartley Sweeten on Aug 17, 2009

This fixes sparse noise:
  error: dubious one-bit signed bitfield
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@ucw.cz>
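
Why sparse calls a one-bit signed bitfield dubious can be seen in a small sketch. The struct and function names are invented for the example, and the -1..0 range assumes a two's complement representation.

```c
#include <assert.h>

struct flags_signed {
	signed int dirty : 1;	/* value range is -1..0 */
};

struct flags_unsigned {
	unsigned int dirty : 1;	/* value range is 0..1 */
};

/* Set the unsigned flag and read it back. */
static int read_unsigned_flag(void)
{
	struct flags_unsigned f = { 0 };

	assert(f.dirty == 0);	/* starts cleared */
	f.dirty = 1;
	return (int)f.dirty;
}

/* Store the only nonzero value a signed one-bit field can hold. */
static int read_signed_flag(void)
{
	struct flags_signed f = { 0 };

	f.dirty = -1;
	return f.dirty;
}
```

A test such as `flag == 1` can never be true for the signed field, since its only nonzero value is -1; the unsigned field behaves as a flag is expected to.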

ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef

Authored by Jan Kara on Aug 17, 2009

During truncate we are sometimes forced to start a new transaction as
the number of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).

We fix the problem by dropping i_data_sem before restarting the
transaction and reacquiring it afterwards. It's slightly subtle that
this works:

1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate the extent cache after we
reacquire i_data_sem because some extent from the not-truncated part
could also extend into the part we are going to truncate).

2) Writes, migrate or defrag hold i_mutex so they are stopped for the
whole duration of the truncate.

This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: Annotate transaction start also for jbd2_journal_restart() · 9599b0e5

Authored by Jan Kara on Aug 17, 2009

lockdep annotation for a transaction start has been at the end of
jbd2_journal_start(). But a transaction is also started from
jbd2_journal_restart(). Move the lockdep annotation to start_this_handle()
which covers both cases.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

19 Sep, 2009 1 commit

ext4: Show unwritten extent flag in ext4_ext_show_leaf() · 553f9008

Authored by Mingming on Sep 18, 2009

ext4_ext_show_leaf() will display the leaf extents when extent
debugging is enabled.

Printing out the unwritten bit is useful for debugging unwritten
extents: it allows us to tell unwritten extents from written extents
after the unwritten extents are split or converted.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

01 Sep, 2009 1 commit

ext4: Compile warning fix when EXT_DEBUG enabled · 84fe3bef

Authored by Mingming on Sep 01, 2009

When EXT_DEBUG is enabled I received the following compile warning on
PPC64:

  CC [M]  fs/ext4/inode.o
  CC [M]  fs/ext4/extents.o
fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
  CC [M]  fs/ext4/migrate.o

The patch fixes these compile warnings.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

19 Sep, 2009 1 commit

ext4: Avoid group preallocation for closed files · 50797481

Authored by Theodore Ts'o on Sep 18, 2009

Currently the group preallocation code tries to find a large (512-block)
free extent from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented.  In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that we were using a particular (now partially filled) 512-block
extent.  So the next time we try to allocate space for a small file,
we will find *another* completely free 512-block chunk from which to
allocate small files.  Given that there are 32,768 blocks in a block
group, after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any completely free
512-block chunks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
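
The worst-case arithmetic in this message can be checked with a few lines of C. The helpers are hypothetical and only model the counting argument, not the allocator.

```c
#include <assert.h>

/* Number of completely free chunks in an empty block group. */
static unsigned int free_chunks(unsigned int blocks_per_group,
				unsigned int chunk_blocks)
{
	assert(chunk_blocks > 0);
	return blocks_per_group / chunk_blocks;
}

/*
 * Free chunks left after 'cycles' rounds of "mount, write one small
 * file, unmount", assuming each round dirties a fresh chunk.
 */
static unsigned int chunks_left_after(unsigned int blocks_per_group,
				      unsigned int chunk_blocks,
				      unsigned int cycles)
{
	unsigned int total = free_chunks(blocks_per_group, chunk_blocks);

	return cycles >= total ? 0 : total - cycles;
}
```

A 32,768-block group holds 64 chunks of 512 blocks, so 64 such rounds leave no completely free chunk.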

10 Aug, 2009 2 commits

ext4: Fix bugs in mballoc's stream allocation mode · 4ba74d00

Authored by Theodore Ts'o on Aug 09, 2009

The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up.  These fields were getting set unconditionally, even when
stream allocation had not taken place, and they were being used even
when the file was smaller than s_mb_stream_request, which is when the
allocator should _not_ be doing stream allocation.

Fix this by determining once whether or not stream allocation should
take place, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Display the mballoc flags in mb_history in hex instead of decimal · 0ef90db9

Authored by Theodore Ts'o on Aug 09, 2009

Displaying the flags in base 16 makes it easier to see which flags
have been set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
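
A short sketch of why hex is easier to read for flag words. The flag values `FLAG_A` and `FLAG_B` are made up for the example and are not mballoc's flag definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits, for illustration only. */
#define FLAG_A 0x0010u
#define FLAG_B 0x0400u

/* Print a flag word as four hex digits so each set bit is visible. */
static int format_flags_hex(char *buf, size_t len, unsigned int flags)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%04x", flags);
}
```

The combination FLAG_A | FLAG_B prints as 0410, where both bits can be read off directly; in decimal the same value is 1040, which hides them.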