Commits · bfff68738f1cb5c93dab1114634cea02aae9e7ba · OpenHarmony / kernel_linux

28 Oct, 2010 8 commits

ext4: add support for lazy inode table initialization · bfff6873

Committed by Lukas Czerner on Oct 27, 2010

When the lazy_itable_init extended option is passed to mke2fs, it
considerably speeds up filesystem creation because inode tables are
not zeroed out.  The fact that parts of the inode table are
uninitialized is not a problem so long as the block group descriptors,
which contain information regarding how much of the inode table has
been initialized, have not been corrupted.  However, if the block group
checksums are not valid, e2fsck must scan the entire inode table, and
the old, uninitialized data could potentially cause e2fsck to
report false problems.

Hence, it is important for the inode tables to be initialized as soon
as possible.  This commit adds this feature so that mke2fs can safely
use the lazy inode table initialization feature to speed up formatting
file systems.

This is done via a new kernel thread called ext4lazyinit, which is
created on demand and destroyed when it is no longer needed.  There
is only one thread for all ext4 filesystems in the system.  When the
first filesystem with the init_itable mount option is mounted, the
ext4lazyinit thread is created; the filesystem can then register its
request in the request list.

This thread then walks through the list of requests, picking up
scheduled requests and invoking ext4_init_inode_table().  The next
schedule time for a request is computed by multiplying the time it took
to zero out the last inode table by the wait multiplier, which can be
set with the init_itable=n mount option (default is 10).  We do this so
that we do not take the whole I/O bandwidth.  When the thread is no
longer necessary (the request list is empty), it frees the appropriate
structures and exits (and can be created again later by another
filesystem).

We do not disturb regular inode allocations in any way; they simply do
not care whether the inode table is zeroed or not.  But when zeroing, we
have to skip used inodes, obviously.  We should also prevent new inode
allocations from the group while zeroing is under way.  For that we
take alloc_sem for writing in ext4_init_inode_table() and for reading
in ext4_claim_inode(), so when we are unlucky and the allocator hits the
group which is currently being zeroed, it just has to wait.

This can be suppressed using the mount option no_init_itable.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
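
The back-off rule described in this commit message can be illustrated with a small sketch. It is not the kernel code: the function name `next_schedule_delay` and the sample timing are hypothetical, and only the default multiplier of 10 is taken from the message above.

```python
def next_schedule_delay(zeroing_seconds, multiplier=10):
    """Delay before the next zeroing request, per the rule in the commit message.

    zeroing_seconds: how long the previous inode table took to zero.
    multiplier: the init_itable=n wait multiplier (default 10).
    """
    return zeroing_seconds * multiplier


# Hypothetical sample: the last inode table took half a second to zero.
last_run = 0.5
delay = next_schedule_delay(last_run)

# Fraction of wall-clock time spent zeroing: 1 / (1 + multiplier).
duty_cycle = last_run / (last_run + delay)
print(f"wait {delay} s, zeroing uses about {duty_cycle:.1%} of the time")
```

With the default multiplier the thread is busy for roughly one eleventh of the time, which is why a larger `init_itable=n` value leaves more I/O bandwidth to other work.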

ext4: fix NULL pointer dereference in print_daily_error_info() · a1c6c569

Committed by Sergey Senozhatsky on Oct 27, 2010

Fix a NULL pointer dereference in print_daily_error_info() when it is
called on an unmounted fs (EXT4_SB(sb) returns NULL) by removing the
error reporting timer in ext4_put_super().

Google-Bug-Id: 3017663
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: don't hold spinlock while calling ext4_issue_discard() · 53fdcf99

Committed by Lukas Czerner on Oct 27, 2010

We can't hold the block group spinlock because ext4_issue_discard()
calls wait and hence can get rescheduled.

Google-Bug-Id: 3017678
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: check for negative error code from sb_issue_discard · 58298709

Committed by Lukas Czerner on Oct 27, 2010

sb_issue_discard() returns a negative error code, so check for
-EOPNOTSUPP.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: don't bump up LONG_MAX nr_to_write by a factor of 8 · b443e733

Committed by Eric Sandeen on Oct 27, 2010

I'm uneasy with lots of stuff going on in ext4_da_writepages(),
but bumping nr_to_write from LONG_MAX to -8 clearly isn't
making anything better, so avoid the multiplier in that case.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
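
The "-8" in this message follows from integer wraparound, which the sketch below reproduces. It assumes a 64-bit `long` with two's-complement wrapping (signed overflow is formally undefined in C, so this models what typical hardware does); the helper names `wrap_signed` and `bumped_nr_to_write` are hypothetical and this is not the ext4 code.

```python
LONG_BITS = 64                       # assumption: 64-bit long
LONG_MAX = 2 ** (LONG_BITS - 1) - 1


def wrap_signed(value, bits=LONG_BITS):
    """Reduce an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def bumped_nr_to_write(nr_to_write, factor=8):
    """Apply the multiplier unless the caller asked for 'write everything'."""
    if nr_to_write == LONG_MAX:
        return nr_to_write           # the fix: leave LONG_MAX alone
    return wrap_signed(nr_to_write * factor)


print(wrap_signed(LONG_MAX * 8))     # -8, the value the message complains about
print(bumped_nr_to_write(1024))      # 8192
print(bumped_nr_to_write(LONG_MAX) == LONG_MAX)
```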

ext4: stop looping in ext4_num_dirty_pages when max_pages reached · 659c6009

Committed by Eric Sandeen on Oct 27, 2010

Today we simply break out of the inner loop when we have accumulated
max_pages; this keeps scanning forward and doing pagevec_lookup_tag()
in the while (!done) loop, which potentially does a lot of work
with no net effect.

When we have accumulated max_pages, just clean up and return.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
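
The difference between leaving only the inner loop and leaving the whole scan can be shown with a simplified model. This is a sketch, not the kernel function: each element of `batches` stands in for one pagevec_lookup_tag() call, and the name `count_dirty_pages` and its `stop_early` flag are hypothetical.

```python
def count_dirty_pages(batches, max_pages, stop_early=True):
    """Return (pages_counted, lookups_done) for a simplified page scan."""
    counted = 0
    lookups = 0
    for batch in batches:                # one lookup per outer iteration
        lookups += 1
        for _page in batch:
            if counted >= max_pages:
                break                    # leaves only the inner loop
            counted += 1
        if stop_early and counted >= max_pages:
            break                        # patched behaviour: stop scanning
    return counted, lookups


batches = [[0] * 14 for _ in range(5)]   # five lookups of 14 pages each
print(count_dirty_pages(batches, 20, stop_early=False))  # (20, 5)
print(count_dirty_pages(batches, 20, stop_early=True))   # (20, 2)
```

Both variants report the same page count; the early exit only saves the lookups that could no longer change the result.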

ext4: use dedicated slab caches for group_info structures · fb1813f4

Committed by Curt Wohlgemuth on Oct 27, 2010

ext4_group_info structures are currently allocated with kmalloc().
With a typical 4K block size, these are 136 bytes each -- meaning
they'll each consume a 256-byte slab object.  On a system with many
large ext4 partitions, that's a lot of wasted kernel slab space.
(E.g., a single 1TB partition will have about 8000 block groups, using
about 2MB of slab, of which nearly 1MB is wasted.)

This patch creates an array of slab pointers created as needed --
depending on the superblock block size -- and uses these slabs to
allocate the group info objects.

Google-Bug-Id: 2980809
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
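
The figures quoted in this message can be checked with a few lines of arithmetic. The sketch assumes that a block group holds as many blocks as one bitmap block has bits (8 × block size) and takes the 136-byte and 256-byte sizes directly from the message; the variable names are only illustrative.

```python
BLOCK_SIZE = 4096                        # typical 4K block size
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE        # assumption: one bitmap block per group
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP   # 128 MiB per block group

PARTITION_BYTES = 2 ** 40                # a single 1TB partition
groups = PARTITION_BYTES // GROUP_BYTES  # 8192, "about 8000 block groups"

STRUCT_SIZE = 136                        # ext4_group_info size from the message
SLAB_OBJECT_SIZE = 256                   # kmalloc object size from the message

slab_used = groups * SLAB_OBJECT_SIZE                 # 2097152 bytes, 2 MiB
wasted = groups * (SLAB_OBJECT_SIZE - STRUCT_SIZE)    # 983040 bytes, just under 1 MiB

print(groups, slab_used, wasted)
```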

ext4: fix EOFBLOCKS_FL handling · 58590b06

Committed by Theodore Ts'o on Oct 27, 2010

It turns out we have several problems with how EOFBLOCKS_FL is
handled.  First of all, there was a fencepost error where we were not
clearing the EOFBLOCKS_FL when filling in the last uninitialized block,
but rather when we allocate the next block _after_ the uninitialized
block.  Secondly, we were not testing to see if we needed to clear the
EOFBLOCKS_FL when writing to the file using O_DIRECT or when we were
converting an uninitialized block (which is the most common case).

Google-Bug-Id: 2928259
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Aug, 2010 5 commits

mbcache: Remove unused features · 2aec7c52

Committed by Andreas Gruenbacher on Jul 19, 2010

The mbcache code was written to support a variable number of indexes,
but all the existing users use exactly one index.  Simplify the code to
support only that case.

There are also no users of the cache entry free operation, and none of
the users keep extra data in cache entries.  Remove those features as
well.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

convert ext4 to ->evict_inode() · 0930fcc1

Committed by Al Viro on Jun 07, 2010

pretty much brute-force...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

remove inode_setattr · 1025774c

Committed by Christoph Hellwig on Jun 04, 2010

Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

introduce __block_write_begin · 6e1db88d

Committed by Christoph Hellwig on Jun 04, 2010

Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated.  Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sort out blockdev_direct_IO variants · eafdc7d1

Committed by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate calling sequence.  This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it, as just opencoding the two additional
parameters is shorter than the name suffix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06 Aug, 2010 2 commits

ext4: Adding error check after calling ext4_mb_regular_allocator() · 6c7a120a

Committed by Aditya Kali on Aug 05, 2010

If the bitmap block on disk is bad, ext4_mb_load_buddy() returns an
error. This error is returned to the caller,
ext4_mb_regular_allocator() and then to ext4_mb_new_blocks().  But
ext4_mb_new_blocks() did not check for the return value of
ext4_mb_regular_allocator() and would repeatedly try to load the
bitmap block. The fix simply catches the return value and exits out of
the 'repeat' loop after cleanup.

We also take the opportunity to clean up the error handling in
ext4_mb_new_blocks().

Google-Bug-Id: 2853530
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix dirtying of journalled buffers in data=journal mode · 56d35a4c

Committed by Jan Kara on Aug 05, 2010

In data=journal mode, we still use block_write_begin() to prepare
a page for writing.  This function can occasionally mark a buffer dirty,
which violates journalling assumptions: when a buffer is part of
a transaction, it should not be dirty, and a buffer can already be part
of a forget list of some transaction when block_write_begin()
gets called.  This violation of journalling assumptions then results
in "JBD: Spotted dirty metadata buffer..." warnings.

In fact, temporarily dirtying the buffer while the page is still locked
does not really cause problems for the journalling because we won't write
the buffer until the page gets unlocked.  So we just have to make sure
to clear dirty bits before unlocking the page.

Signed-off-by: Jan Kara <jack@suse.cz>

05 Aug, 2010 1 commit

ext4: re-inline ext4_rec_len_(to|from)_disk functions · 0cfc9255

Committed by Eric Sandeen on Aug 05, 2010

commit 3d0518f4, "ext4: New rec_len encoding for very
large blocksizes" made several changes to this path, but from
a perf perspective, un-inlining ext4_rec_len_from_disk() seems
most significant.  This function is called from ext4_check_dir_entry(),
which on a file-creation workload is called extremely often.

I tested this with bonnie:

# bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32

(this does 200 iterations) and got this for the file creations:

ext4 stock:   Average =  21206.8 files/s
ext4 inlined: Average =  22346.7 files/s  (+5%)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

04 Aug, 2010 2 commits

fix comment typo "choosed" -> "chosen" · 73b2c716

Committed by Uwe Kleine-König on Jul 30, 2010

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

jbd2: Change j_state_lock to be a rwlock_t · a931da6a

Committed by Theodore Ts'o on Aug 03, 2010

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

02 Aug, 2010 3 commits

ext4: Add mount options in superblock · 8b67f04a

Committed by Theodore Ts'o on Aug 01, 2010

Allow mount options to be stored in the superblock.  Also add default
mount option bits for nobarrier, block_validity, discard, and nodelalloc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: force block allocation on quota_off · ca0e05e4

Committed by Dmitry Monakhov on Aug 01, 2010

Perform the full sync procedure so that any delayed allocation blocks are
allocated and quota will be consistent.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix freeze deadlock under IO · 437f88cc

Committed by Eric Sandeen on Aug 01, 2010

Commit 6b0310fb caused a regression resulting in deadlocks
when freezing a filesystem which had active IO; the vfs_check_frozen
level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
through.  Duh.

Changing the test to FREEZE_TRANS should let the normal freeze
syncing get through the fs, but still block any transactions from
starting once the fs is completely frozen.

I tested this by running fsstress in the background while periodically
snapshotting the fs and running fsck on the result.  I ran into
occasional deadlocks, but different ones.  I think this is a
fine fix for the problem at hand, and the other deadlocky things
will need more investigation.

Reported-by: Phillip Susi <psusi@cfl.rr.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

30 Jul, 2010 1 commit

ext4: drop inode from orphan list if ext4_delete_inode() fails · 45388219

Committed by Theodore Ts'o on Jul 29, 2010

There were some error paths in ext4_delete_inode() which were not
dropping the inode from the orphan list.  This could lead to a BUG_ON
on umount when the orphan list is discovered to be non-empty.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

27 Jul, 2010 18 commits

ext4: check to make sure bd_dev is set before dereferencing it · f613dfcb

Committed by Theodore Ts'o on Jul 27, 2010

There are some drivers which may not set bdev->bd_dev.  So make sure
it is non-NULL before dereferencing it.

Google-Bug-Id: 1773557
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: don't print scary messages for allocation failures post-abort · e3570639

Committed by Eric Sandeen on Jul 27, 2010

I often get emails containing the "This should not happen!!" message,
conveniently trimmed to remove things like:

sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00
end_request: I/O error, dev sda, sector 51628400
Aborting journal on device dm-0-8.
EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (dm-0): Remounting filesystem read-only

I don't think there is any value to the verbosity if the reason is
due to a filesystem abort; it just obfuscates the root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix EFBIG edge case when writing to large non-extent file · d889dc83

Committed by Toshiyuki Okajima on Jul 27, 2010

By running the following reproducer, we can confirm that the write
system call returns 0 when it should return the error EFBIG.

#!/bin/sh

/bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1
/sbin/mkfs.ext3 -Fq ./img
/bin/mount -o loop -t ext4 ./img /mnt
/bin/touch /mnt/file
strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = "
/bin/umount /mnt
exit

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>

ext4: fix ext4_get_blocks references · 79e83036

Committed by Eric Sandeen on Jul 27, 2010

ext4_get_blocks got renamed to ext4_map_blocks, but left stale
comments and a prototype littered around.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Always journal quota file modifications · 62d2b5f2

Committed by Jan Kara on Jul 27, 2010

When journaled quota options are not specified, we do writes
to quota files just in data=ordered mode.  This actually causes
warnings from JBD2 about a dirty journaled buffer because ext4_getblk
unconditionally treats a block allocated by it as metadata.  Since
quota actually is filesystem metadata, the easiest way to get rid
of the warning is to always treat quota writes as metadata...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix potential memory leak in ext4_fill_super · dcc7dae3

Committed by Cyrill Gorcunov on Jul 27, 2010

Under heavy memory pressure we may hit an out-of-memory
situation and as a result the kstrdup'ed options will not be
freed.  Fix it.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Don't error out the fs if the user tries to make a file too big · 0c095c7f

Committed by Theodore Ts'o on Jul 27, 2010

If the user attempts to make a non-extent-mapped file too large,
return EFBIG, but don't call ext4_std_err(), which will end up marking
the file system as containing an error.

Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: allocate stripe-multiple IOs on stripe boundaries · 506bf2d8

Committed by Eric Sandeen on Jul 27, 2010

For some reason, today mballoc only allocates IOs which are exactly
stripe-sized on a stripe boundary.  If you have a multiple (say, a
128k IO on a 64k stripe) you may end up unaligned.

It seems to me that a simple change to align stripe-multiple IOs
on stripe boundaries would be a very good idea, unless this breaks
some other mballoc heuristic for some reason...

Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
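
The change from "exactly stripe-sized" to "a multiple of the stripe" can be sketched as a pair of predicates. This is an illustration of the idea in the message, not the mballoc code: the function names, the 4K block size, and the `align_up` helper are all assumptions made for the example.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def to_blocks(nbytes):
    """Convert a byte count to filesystem blocks."""
    return nbytes // BLOCK_SIZE


def wants_aligned_scan_old(request_len, stripe):
    """Old rule: only requests of exactly one stripe get aligned."""
    return stripe > 0 and request_len == stripe


def wants_aligned_scan_new(request_len, stripe):
    """New rule: any whole multiple of the stripe gets aligned."""
    return stripe > 0 and request_len % stripe == 0


def align_up(block, stripe):
    """Round a block number up to the next stripe boundary."""
    return -(-block // stripe) * stripe


stripe = to_blocks(64 * 1024)     # 64k stripe  -> 16 blocks
request = to_blocks(128 * 1024)   # 128k IO     -> 32 blocks

print(wants_aligned_scan_old(request, stripe))   # False
print(wants_aligned_scan_new(request, stripe))   # True
print(align_up(37, stripe))                      # 48
```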

ext4: move aio completion after unwritten extent conversion · 5b3ff237

Committed by jiayingz@google.com (Jiaying Zhang) on Jul 27, 2010

This patch is to be applied upon Christoph's "direct-io: move aio_complete
into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t,
so that we can call aio_complete from ext4_end_io_nolock() after the extent
conversion has finished.

I have verified with Christoph's aio-dio test that used to fail after a few
runs on an original kernel but now succeeds on the patched kernel.

See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

direct-io: move aio_complete into ->end_io · 552ef802

Committed by Christoph Hellwig on Jul 27, 2010

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed.  That means
the aio_complete call needs to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.

This makes a bit of a mess out of dio_complete and makes the ->end_io
callback prototype even more complicated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Support discard requests when running in no-journal mode · 5c521830

Committed by Jiaying Zhang on Jul 27, 2010

Issue a discard request in ext4_free_blocks() when ext4 has no journal and
is mounted with the discard option.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix block bitmap inconsistencies after a crash when deleting files · 40389687

Committed by Amir G on Jul 27, 2010

We have experienced bitmap inconsistencies after a crash during file
deletion under heavy load.  The crash is not file system related, and
the following patch in ext4_free_branches() fixes the recovery
problem.

If the transaction is restarted and there is a crash before the new
transaction is committed, then after recovery, the blocks that this
indirect block points to have been freed, but the indirect block
itself has not been freed and may still point to some of the free
blocks (because of the ext4_forget()).

So ext4_forget() should be called inside ext4_free_blocks() to avoid
this problem.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Remove unnecessary casts of private_data · a271fe85

Committed by Joe Perches on Jul 27, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix potential NULL dereference while tracing · e5880d76

Committed by Theodore Ts'o on Jul 27, 2010

The allocation_context pointer can be NULL.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Define s_jnl_backup_type in superblock · 89eeddf0

Committed by Theodore Ts'o on Jul 27, 2010

This has been in use by e2fsprogs for a while; define it to keep the
super block fields in sync.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Once a day, printk file system error information to dmesg · 66e61a9e

Committed by Theodore Ts'o on Jul 27, 2010

This allows us to grab any file system error messages by scraping
/var/log/messages.  This will make it easy for us to do error analysis
across the very large number of machines as we deploy ext4 across the
fleet.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Save error information to the superblock for analysis · 1c13d5c0

Committed by Theodore Ts'o on Jul 27, 2010

Save the number of file system errors, and the time, function name, line
number, block number, and inode number of the first and most recent
errors reported on the file system, in the superblock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
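
The bookkeeping described in this message (an error count plus details of the first and the most recent error) can be modelled as below. This is a sketch of the idea only: the class and field names are hypothetical and do not reflect the actual superblock layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorRecord:
    """Details kept for a single reported error (field names are illustrative)."""
    time: int
    func: str
    line: int
    block: int
    inode: int


@dataclass
class ErrorInfo:
    """Error count plus the first and most recent error records."""
    error_count: int = 0
    first_error: Optional[ErrorRecord] = None
    last_error: Optional[ErrorRecord] = None

    def record(self, err: ErrorRecord) -> None:
        self.error_count += 1
        if self.first_error is None:
            self.first_error = err      # set once, never overwritten
        self.last_error = err           # always tracks the newest error


info = ErrorInfo()
info.record(ErrorRecord(time=100, func="func_a", line=10, block=7, inode=12))
info.record(ErrorRecord(time=250, func="func_b", line=42, block=0, inode=99))
print(info.error_count, info.first_error.func, info.last_error.func)
```

Keeping only the first and last records bounds the space used no matter how many errors occur, while still showing when the trouble started and what it looks like now.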

ext4: Pass line numbers to ext4_error() and friends · c398eda0

Committed by Theodore Ts'o on Jul 27, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

OpenHarmony / kernel_linux · Last synced about 4 years ago
