Commits · 6d19c42b7cf81c39632b6d4dbc514e8449bcd346 · OpenHarmony / kernel_linux

06 Mar, 2010 1 commit

pass writeback_control to ->write_inode · a9185b41

Committed by Christoph Hellwig on Mar 05, 2010

This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by being able to
distinguish between the different callers in more detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a9185b41

04 Mar, 2010 1 commit

ext4: consolidate in_range() definitions · 731eb1a0

Committed by Akinobu Mita on Mar 03, 2010

There are duplicate macro definitions of in_range() in mballoc.h and
balloc.c.  This consolidates these two definitions into ext4.h, and
changes extents.c to use in_range() as well.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>

731eb1a0
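
As a rough illustration of what a single shared in_range() definition looks like, the sketch below defines the macro once and uses it from a helper. The helper name `block_in_extent` is hypothetical and not part of the commit; the macro evaluates its arguments more than once, so arguments with side effects should be avoided.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when block b lies in the closed range [first, first + len - 1].
 * Arguments are evaluated more than once, so keep them side-effect free.
 */
#define in_range(b, first, len) \
	((b) >= (first) && (b) <= (first) + (len) - 1)

/* Hypothetical helper: does block blk fall inside the extent? */
static bool block_in_extent(uint64_t blk, uint64_t start, uint64_t len)
{
	return in_range(blk, start, len);
}
```

Keeping one definition in a shared header means every caller agrees on whether the upper bound is inclusive.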

03 Mar, 2010 1 commit

ext4: Convert BUG_ON checks to use ext4_error() instead · 273df556

Committed by Frank Mayhar on Mar 02, 2010

Convert a bunch of BUG_ONs to emit an ext4_error() message and return
EIO.  This is a first pass and most notably does _not_ cover
mballoc.c, which is a morass of void functions.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

273df556
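
The pattern described above can be sketched in user-space C as follows. This is only an illustration: `report_fs_error` is a hypothetical stand-in for ext4_error(), and `check_extent_depth` is an invented check, not one of the call sites touched by the commit.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for ext4_error(): log the problem and carry on. */
static void report_fs_error(const char *func, const char *msg)
{
	fprintf(stderr, "EXT4-fs error (%s): %s\n", func, msg);
}

/*
 * Before: BUG_ON(depth > max_depth) would halt the kernel.
 * After: report the inconsistency and hand -EIO back to the caller.
 */
static int check_extent_depth(int depth, int max_depth)
{
	if (depth > max_depth) {
		report_fs_error(__func__, "extent tree depth out of range");
		return -EIO;
	}
	return 0;
}
```

This only works where the function can return an error, which is why void functions are harder to convert.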

05 Mar, 2010 1 commit

ext4: use ext4_get_block_write in buffer write · 744692dc

Committed by Jiaying Zhang on Mar 04, 2010

Allocate uninitialized extent before ext4 buffer write and
convert the extent to initialized after io completes.
The purpose is to make sure an extent can only be marked
initialized after it has been written with new data so
we can safely drop the i_mutex lock in ext4 DIO read without
exposing stale data. This helps to improve multi-thread DIO
read performance on high-speed disks.

Skip the nobh and data=journal mount cases to make things simple for now.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

744692dc

03 Mar, 2010 1 commit

ext4: mechanical rename some of the direct I/O get_block's identifiers · c7064ef1

Committed by Jiaying Zhang on Mar 02, 2010

This commit renames some of the direct I/O's block allocation flags,
variables, and functions introduced in Mingming's "Direct IO for holes
and fallocate" patches so that they can be used by ext4's buffered
write path as well. Also changed the related function comments
accordingly to cover both direct write and buffered write cases.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c7064ef1

24 Feb, 2010 1 commit

ext4: Add flag to files with blocks intentionally past EOF · c8d46e41

Committed by Jiaying Zhang on Feb 24, 2010

fallocate() may potentially instantiate blocks past EOF, depending
on the flags used when it is called.

e2fsck currently has a test for blocks past i_size, and it
sometimes trips up, notably on xfstests 013 which runs fsstress.

This patch from Jiaying does fix it up - it (along with
e2fsprogs updates and other patches recently from Aneesh) has
survived many fsstress runs in a row.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c8d46e41

17 Feb, 2010 1 commit

percpu: add __percpu sparse annotations to fs · 003cb608

Committed by Tejun Heo on Feb 02, 2010

Add __percpu sparse annotations to fs.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>

003cb608
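
A minimal sketch of how such an annotation can be invisible to normal builds: the macro expands to an attribute only when the sparse checker defines `__CHECKER__`, and to nothing otherwise. The address-space number and the `counter_set` structure are assumptions made for this example, not taken from the commit.

```c
/*
 * Under sparse (__CHECKER__ defined) the annotation puts the pointer in a
 * separate address space; in a normal build it expands to nothing.
 * The address-space number here is an assumption for this sketch.
 */
#ifdef __CHECKER__
# define __percpu __attribute__((noderef, address_space(3)))
#else
# define __percpu
#endif

/* Hypothetical structure with one annotated and one ordinary field. */
struct counter_set {
	long __percpu *counters;	/* should only be reached via percpu accessors */
	long total;			/* ordinary field, no annotation */
};
```

With a regular compiler the struct layout and generated code are identical with or without the annotation; only the checker sees the difference.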

16 Feb, 2010 1 commit

ext4: move __func__ into a macro for ext4_warning, ext4_error · 12062ddd

Committed by Eric Sandeen on Feb 15, 2010

Just a pet peeve of mine; we had a mishmash of calls with either __func__
or "function_name" and the latter tends to get out of sync.

I think it's easier to just hide the __func__ in a macro, and it'll
be consistent from then on.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12062ddd
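
The idea of hiding __func__ in a macro can be sketched like this. The names `report_warning`, `fs_warning` and `check_quota` are hypothetical and exist only for the example; the real ext4 macros have different signatures.

```c
#include <stdio.h>

/* Hypothetical stand-in for the real reporting function. */
static int report_warning(char *buf, size_t len, const char *func,
			  const char *msg)
{
	return snprintf(buf, len, "warning (%s): %s", func, msg);
}

/* The macro supplies __func__, so a caller cannot pass a stale name. */
#define fs_warning(buf, len, msg) \
	report_warning((buf), (len), __func__, (msg))

/* Example caller: its own name is picked up automatically. */
static int check_quota(char *buf, size_t len)
{
	return fs_warning(buf, len, "quota exceeded");
}
```

If `check_quota` is later renamed, the message follows along without anyone editing a string literal.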

25 Jan, 2010 2 commits

ext4: Reserve INCOMPAT_EA_INODE and INCOMPAT_DIRDATA feature codepoints · f710b4b9

Committed by Theodore Ts'o on Jan 25, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f710b4b9

ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a

Committed by Theodore Ts'o on Jan 24, 2010

At several places we modify EXT4_I(inode)->i_state without holding
i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
ext4_do_update_inode, ...). These modifications are racy and we can
lose updates to i_state. So convert handling of i_state to use bitops
which are atomic.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

19f5fb7a
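
The difference between a plain read-modify-write and an atomic bit operation can be approximated in user space with C11 atomics. This is a sketch only: the kernel uses its own set_bit()/clear_bit()/test_bit() helpers, and the flag names and `inode_info` structure below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real state flags differ. */
enum { STATE_NEW = 0, STATE_XATTR = 1, STATE_DA_ALLOC_CLOSE = 2 };

struct inode_info {
	atomic_ulong state;	/* one bit per state flag */
};

/* Atomically set one bit; concurrent updates to other bits are not lost. */
static void state_set(struct inode_info *ei, int bit)
{
	atomic_fetch_or(&ei->state, 1UL << bit);
}

/* Atomically clear one bit. */
static void state_clear(struct inode_info *ei, int bit)
{
	atomic_fetch_and(&ei->state, ~(1UL << bit));
}

/* Read one bit. */
static bool state_test(struct inode_info *ei, int bit)
{
	return (atomic_load(&ei->state) >> bit) & 1UL;
}
```

A non-atomic `state |= flag` loads, modifies and stores in separate steps, so two threads doing it without a lock can overwrite each other's update.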

15 Jan, 2010 1 commit

ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag · 1296cc85

Committed by Aneesh Kumar K.V on Jan 15, 2010

We should update the reserved space if it is a delalloc buffer,
and that is indicated by the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag.
So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of
EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

1296cc85

25 Jan, 2010 1 commit

ext4: Fix quota accounting error with fallocate · 5f634d06

Committed by Aneesh Kumar K.V on Jan 25, 2010

When we fallocate a region of the file which we had recently written,
and which is still in the page cache marked as delayed allocated blocks,
we need to make sure we don't do the quota update on the writepage path.
This is because the needed quota update would have already been done
by fallocate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

5f634d06

01 Jan, 2010 1 commit

ext4: Calculate metadata requirements more accurately · 9d0be502

Committed by Theodore Ts'o on Jan 01, 2010

In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks. This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive. This was fixed in commit 0637c6f4, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks. So if there is 1 megabyte of
blocks which have not yet been allocated, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.

This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required. It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9d0be502

23 Dec, 2009 1 commit

ext4: Convert to generic reserved quota's space management. · a9e7f447

Committed by Dmitry Monakhov on Dec 14, 2009

This patch also fixes write vs chown race condition.
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

a9e7f447

23 Jan, 2010 1 commit

ext4: Add block validity check when truncating indirect block mapped inodes · 1f2acb60

Committed by Theodore Ts'o on Jan 22, 2010

Add checks to ext4_free_branches() to make sure a block number found
in an indirect block is valid before trying to free it.  If a bad
block number is found, stop freeing the indirect block immediately,
since the file system is corrupt and we will need to run fsck anyway.
This also avoids spamming the logs, and specifically avoids
driver-level "attempt to access beyond end of device" errors that
obscure what is really going on.

If you get *really*, *really*, *really* unlucky, without this patch, a
supposed indirect block containing garbage might contain a reference
to a primary block group descriptor, in which case
ext4_free_branches() could end up zeroing out a block group
descriptor block.  If one of the block bitmaps for a block group
described by that bg descriptor block is then not in memory, it is
read in by ext4_read_block_bitmap().  This function calls
ext4_valid_block_bitmap(), which assumes that bg_inode_table() was
validated at mount time and hasn't been modified since.  Since this
assumption is no longer valid, it's possible for the value
(ext4_inode_table(sb, desc) - group_first_block) to go negative, which
will cause ext4_find_next_zero_bit() to trigger a kernel GPF.

Addresses-Google-Bug: #2220436
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1f2acb60

05 Feb, 2010 1 commit

ext4: fix async i/o writes beyond 4GB to a sparse file · a1de02dc

Committed by Eric Sandeen on Feb 04, 2010

The "offset" member in ext4_io_end holds bytes, not blocks, so
ext4_lblk_t is wrong - and too small (u32).

This caused the async i/o writes to sparse files beyond 4GB to fail
when they wrapped around to 0.

Also fix up the type of arguments to ext4_convert_unwritten_extents();
it gets ssize_t from ext4_end_aio_dio_nolock() and
ext4_ext_direct_IO().
Reported-by: Giel de Nijs <giel@vectorwise.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

a1de02dc
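
The wrap-around described above is easy to reproduce in isolation. In this sketch `lblk_t` is a stand-in for the 32-bit ext4_lblk_t, and both helper functions are hypothetical; using a 64-bit signed offset for the fixed version is an assumption about the shape of the fix, not a quote from the patch.

```c
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for a 32-bit logical block number */

/* Buggy: a byte offset squeezed into a 32-bit field wraps at 4 GiB. */
static uint64_t stored_offset_u32(uint64_t byte_offset)
{
	lblk_t stored = (lblk_t)byte_offset;	/* high 32 bits are dropped */
	return stored;
}

/* Fixed: keep the byte offset in a 64-bit signed type. */
static int64_t stored_offset_64(uint64_t byte_offset)
{
	return (int64_t)byte_offset;
}
```

An offset of exactly 4 GiB stored through the 32-bit type reads back as 0, which matches the "wrapped around to 0" symptom in the commit message.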

09 Dec, 2009 1 commit

ext4: Wait for proper transaction commit on fsync · b436b9be

Committed by Jan Kara on Dec 08, 2009

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b436b9be

23 Nov, 2009 1 commit

ext4: call ext4_forget() from ext4_free_blocks() · e6362609

Committed by Theodore Ts'o on Nov 23, 2009

Add the facility for ext4_forget() to be called from
ext4_free_blocks().  This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.

Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion.  As a result, if the system crashed during or shortly after
the extents migration, and the released indirect blocks get reused as
data blocks, the journal replay would corrupt the data blocks.  With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

e6362609

22 Nov, 2009 1 commit

ext4: fold ext4_free_blocks() and ext4_mb_free_blocks() · 44338711

Committed by Theodore Ts'o on Nov 22, 2009

ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much. So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

44338711

23 Nov, 2009 1 commit

ext4: move ext4_forget() to ext4_jbd2.c · d6797d14

Committed by Theodore Ts'o on Nov 22, 2009

The ext4_forget() function better belongs in ext4_jbd2.c.  This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d6797d14

20 Nov, 2009 1 commit

ext4: make trim/discard optional (and off by default) · 5328e635

Committed by Eric Sandeen on Nov 19, 2009

It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues.  Make
this mount-time optional, and default it to off until we know
that things are working out OK.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5328e635

10 Nov, 2009 1 commit

ext4: skip conversion of uninit extents after direct IO if there isn't any · 5f524950

Committed by Mingming on Nov 10, 2009

At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.

This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5f524950

03 Nov, 2009 1 commit

Revert "ext4: Remove journal_checksum mount option and enable it by default" · d4da6c9c

Committed by Linus Torvalds on Nov 02, 2009

This reverts commit d0646f7b, as
requested by Eric Sandeen.

It can basically cause an ext4 filesystem to miss recovery (and thus get
mounted with errors) if the journal checksum does not match.

Quoth Eric:

   "My hand-wavy hunch about what is happening is that we're finding a
    bad checksum on the last partially-written transaction, which is
    not surprising, but if we have a wrapped log and we're doing the
    initial scan for head/tail, and we abort scanning on that bad
    checksum, then we are essentially running an unrecovered filesystem.

    But that's hand-wavy and I need to go look at the code.

    We lived without journal checksums on by default until now, and at
    this point they're doing more harm than good, so we should revert
    the default-changing commit until we can fix it and do some good
    power-fail testing with the fixes in place."

See

	http://bugzilla.kernel.org/show_bug.cgi?id=14354

for all the gory details.
Requested-by: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Alexey Fisher <bug-track@fisher-privat.net>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mathias Burén <mathias.buren@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d4da6c9c

30 Sep, 2009 2 commits

ext4: Fix time encoding with extra epoch bits · c1fccc06

Committed by Theodore Ts'o on Sep 30, 2009

"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c1fccc06
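
The masking problem quoted above can be sketched as follows, assuming a 32-bit "extra" field whose low two bits hold the epoch and whose remaining bits hold nanoseconds. The macro and function names are simplified stand-ins for this example and are not copied from ext4.h.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1U << EPOCH_BITS) - 1)

/* Buggy: any bits of (sec >> 32) above the low two spill into the nsec part. */
static uint32_t encode_extra_unmasked(int64_t sec, uint32_t nsec)
{
	return (uint32_t)(sec >> 32) | (nsec << EPOCH_BITS);
}

/* Fixed: mask the epoch down to EPOCH_BITS before combining. */
static uint32_t encode_extra_masked(int64_t sec, uint32_t nsec)
{
	return ((uint32_t)(sec >> 32) & EPOCH_MASK) | (nsec << EPOCH_BITS);
}

/* Recover the nanosecond part from the packed field. */
static uint32_t decode_nsec(uint32_t extra)
{
	return extra >> EPOCH_BITS;
}
```

With the mask in place the nanosecond value round-trips unchanged regardless of what the upper half of the seconds value contains.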

ext4: Use tracepoints for mb_history trace file · 296c355c

Committed by Theodore Ts'o on Sep 30, 2009

The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

296c355c

29 Sep, 2009 3 commits

ext4: async direct IO for holes and fallocate support · 8d5d02e6

Committed by Mingming Cao on Sep 28, 2009

For async direct I/O that covers holes or fallocated ranges, the end_io
callback function now queues the conversion work on a workqueue but
doesn't flush the work right away, as that might take too long.

But when fsync is called after all the data I/O has completed, the user
expects the metadata to be updated as well before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keeps track of a list of completed async direct I/O requests
that have work queued on the workqueue.  When fsync() is called, it will
go through the list and do the conversion.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

8d5d02e6

ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff

Committed by Mingming Cao on Sep 28, 2009

Currently the DIO VFS code passes create = 0 when writing to the
middle of a file.  It does this to avoid block allocation for holes, so
as not to expose stale data when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
fall back to buffered I/O for this reason.

Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct I/O over fallocate
also falls back to buffered I/O.  Thus ext4 actually silently falls
back to buffered I/O in the above two cases, which is undesirable.

To fix this, this patch creates uninitialized extents when a direct I/O
write goes into holes in sparse files, and registers an end_io callback
which converts the uninitialized extent to an initialized extent after
the I/O is completed.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4c0425ff

ext4: Split uninitialized extents for direct I/O · 0031462b

Committed by Mingming Cao on Sep 28, 2009

When writing into an uninitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the uninitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the I/O is complete, the written extent will be marked as initialized.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0031462b

30 Sep, 2009 1 commit

ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b

Committed by Theodore Ts'o on Sep 29, 2009

Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode.  We add a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

55138e0b

17 Sep, 2009 3 commits

ext4: replace MAX_DEFRAG_SIZE with EXT_MAX_BLOCK · 0a80e986

Committed by Eric Sandeen on Sep 17, 2009

There's no reason to redefine the maximum allowable offset
in an extent-based file just for defrag; 
EXT_MAX_BLOCK already does this.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0a80e986

ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags · 1b9c12f4

Committed by Theodore Ts'o on Sep 17, 2009

EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
and the hex value assigned to it collides with FS_DIRECTIO_FL (which
is also stored in i_flags).  There's no reason for the
EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
i_state instead.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1b9c12f4

ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d

Committed by Eric Sandeen on Sep 16, 2009

Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.

This patch limits such allocations to < 2^32, and adds
BUG_ONs if we do get blocks larger than that.

This should address RH Bug 519471, "ext4 bitmap allocator
must limit blocks to < 2^32".

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
  so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
  is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
  search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
  preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes.  Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fb0a387d
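
To make the truncation and the "% UINT_MAX" goal heuristic concrete, here is a small user-space sketch. Both functions are hypothetical illustrations, and the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/* Indirect-mapped files keep block numbers in 32 bits. */
static uint32_t store_indirect_block(uint64_t pblk)
{
	return (uint32_t)pblk;	/* silently drops the high bits */
}

/* Fold an out-of-range goal back into the addressable range. */
static uint64_t fold_goal(uint64_t goal)
{
	return goal % UINT_MAX;
}
```

A physical block just past 2^32 stored this way comes back as a small, unrelated block number, which is the corruption the allocation limit is meant to prevent.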

06 Sep, 2009 1 commit

ext4: Remove journal_checksum mount option and enable it by default · d0646f7b

Committed by Theodore Ts'o on Sep 05, 2009

There's no real cost for the journal checksum feature, and we should
make sure it is enabled all the time.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d0646f7b

01 Sep, 2009 1 commit

ext4: Add new tracepoint: trace_ext4_da_write_pages() · b3a3ca8c

Committed by Theodore Ts'o on Aug 31, 2009

Add a new tracepoint which shows the pages that will be written using
write_cache_pages() by ext4_da_writepages().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b3a3ca8c

26 Aug, 2009 1 commit

ext4: use ext4_grpblk_t more extensively · a36b4498

Committed by Eric Sandeen on Aug 25, 2009

unsigned short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.

There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.

I added a few comments to the struct ext4_group_info definition
as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a36b4498

18 Aug, 2009 2 commits

ext4: open-code ext4_mb_update_group_info · 0373130d

Committed by Eric Sandeen on Aug 17, 2009

ext4_mb_update_group_info is only called in one place, and it's
extremely simple.  There's no reason to have it in a separate function
in a separate file as far as I can tell, it just obfuscates what's
really going on.

Perhaps it was intended to keep the grp->bb_* manipulation local to
mballoc.c but we're already accessing other grp-> fields in balloc.c
directly so this seems ok.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0373130d

ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef

Committed by Jan Kara on Aug 17, 2009

During truncate we are sometimes forced to start a new transaction as
the amount of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).

We fix the problem by dropping the i_data_sem before restarting the
transaction and acquire it afterwards. It's slightly subtle that this
works:

1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate extent cache after we
reacquire i_data_sem because some extent from not-truncated part could
extend also into the part we are going to truncate).

2) Writes, migrate or defrag hold i_mutex so they are stopped for all
the time of the truncate.

This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

487caeef

19 Sep, 2009 1 commit

ext4: Avoid group preallocation for closed files · 50797481

Committed by Theodore Ts'o on Sep 18, 2009

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that we were using a particular (now-partially filled) 512 block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files. Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any completely free
512 block chunks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

50797481
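
The arithmetic behind the worst case in the message can be checked with a few lines of C. The two constants come from the commit message (and assume one-block 4k files); the function names are hypothetical.

```c
#define BLOCKS_PER_GROUP 32768	/* blocks in a block group, per the message */
#define PREALLOC_CHUNK   512	/* size of a group preallocation chunk */

/* Mount/write/unmount cycles until every 512-block chunk has been touched. */
static int cycles_until_no_free_chunk(void)
{
	return BLOCKS_PER_GROUP / PREALLOC_CHUNK;
}

/* Free blocks stranded between two consecutive one-block files. */
static int gap_between_files(void)
{
	return PREALLOC_CHUNK - 1;
}
```

Each cycle consumes one block but spoils a whole chunk, so the group runs out of clean chunks while still being almost entirely free.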

10 Aug, 2009 2 commits

ext4: Fix bugs in mballoc's stream allocation mode · 4ba74d00

Committed by Theodore Ts'o on Aug 09, 2009

The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up.  These fields were getting set unconditionally all the
time, even when stream allocation had not taken place, and they were
being used even when the file was smaller than s_mb_stream_request,
which is when the allocation should _not_ be doing stream allocation.

Fix this by determining whether or not stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4ba74d00

ext4: Display the mballoc flags in mb_history in hex instead of decimal · 0ef90db9

Committed by Theodore Ts'o on Aug 09, 2009

Displaying the flags in base 16 makes it easier to see which flags
have been set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0ef90db9
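
A quick sketch of why hex is easier to read for bit flags. The flag values below are hypothetical and chosen only so that each one occupies its own hex digit; `format_flags` is an invented helper, not the mb_history code.

```c
#include <stdio.h>

/* Hypothetical flag bits, chosen only for illustration. */
#define HINT_MERGE 0x0001u
#define HINT_DATA  0x0020u
#define HINT_GROUP 0x0400u

/* Format a flag word either in hex or in decimal. */
static int format_flags(char *buf, size_t len, unsigned int flags, int use_hex)
{
	if (use_hex)
		return snprintf(buf, len, "%#06x", flags);
	return snprintf(buf, len, "%u", flags);
}
```

The three flags combined print as 0x0421 in hex, where each set bit is visible in its own digit, but as 1057 in decimal, where none of them is.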

OpenHarmony / kernel_linux · last synced more than 3 years ago
