Commits · 07a038245b28df9196ffb2e8cc626e9b956a4e23 · openanolis / cloud-kernel

14 Jun, 2010 1 commit

ext4: Convert more i_flags references to use accessor functions · 07a03824

Committed by Theodore Ts'o on Jun 14, 2010

These changes are not ones which are likely to result in races, but
they should be fixed.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07a03824

12 Jun, 2010 1 commit

ext4: Clean up s_dirt handling · a0375156

Committed by Theodore Ts'o on Jun 11, 2010

We don't need to set s_dirt in most of the ext4 code when journaling
is enabled.  In ext3/4 some of the summary statistics for # of free
inodes, blocks, and directories are calculated from the per-block
group statistics when the file system is mounted or unmounted.  As a
result the superblock doesn't have to be updated, either via the
journal or by setting s_dirt.  There are a few exceptions, most
notably when resizing the file system, where the superblock needs to
be modified --- and in that case it should be done as a journalled
operation if possible, and s_dirt set only in no-journal mode.

This patch will optimize out some unneeded disk writes when using ext4
with a journal.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a0375156

05 Jun, 2010 1 commit

ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags · 84a8dce2

Committed by Dmitry Monakhov on Jun 05, 2010

A few functions were still modifying i_flags in a racy manner.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

84a8dce2

03 Jun, 2010 1 commit

ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files · 1f5a81e4

Committed by Theodore Ts'o on Jun 02, 2010

Dan Rosenberg has reported a problem with the MOVE_EXT ioctl.  If the
donor file is an append-only file, we should not allow the operation
to proceed, lest we end up overwriting the contents of an append-only
file.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>

1f5a81e4

28 May, 2010 2 commits

rename the generic fsync implementations · 1b061d92

Committed by Christoph Hellwig on May 26, 2010

We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file, which doesn't make too much sense to start with, and
the generic one for simple filesystems is called simple_fsync,
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1b061d92

drop unused dentry argument to ->fsync · 7ea80859

Committed by Christoph Hellwig on May 26, 2010

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ea80859

24 May, 2010 5 commits

quota: rename default quotactl methods to dquot_ · 287a8095

Committed by Christoph Hellwig on May 19, 2010

Follow the dquot_* style used elsewhere in dquot.c.

[Jan Kara: Fixed up missing conversion of ext2]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

287a8095

quota: drop remount argument to ->quota_on and ->quota_off · 307ae18a

Committed by Christoph Hellwig on May 19, 2010

Remount handling has fully moved into the filesystem, so all this is
superfluous now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

307ae18a

quota: move unmount handling into the filesystem · e0ccfd95

Committed by Christoph Hellwig on May 19, 2010

Currently the VFS calls into the quotactl interface for unmounting
filesystems.  This means filesystems with their own quota handling
can't easily distinguish between a quotaoff originating from user space
and an unmount.  Instead move the responsibility for the unmount handling
into the filesystem to be consistent with all other dquot handling.

Note that we do call dquot_disable a lot later now, e.g. after
a sync_filesystem.  But this is fine as the quota code does all its
writes via blockdev's mapping and that is synced even later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

e0ccfd95

quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers · 0f0dd62f

Committed by Christoph Hellwig on May 19, 2010

Instead of having wrappers in the VFS namespace export the dquot_suspend
and dquot_resume helpers directly.  Also rename vfs_quota_disable to
dquot_disable while we're at it.

[Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

0f0dd62f

quota: move remount handling into the filesystem · c79d967d

Committed by Christoph Hellwig on May 19, 2010

Currently do_remount_sb calls into the dquot code to tell it about going
from rw to ro and ro to rw.  Move this code into the filesystem to
not depend on the dquot code in the VFS - note ocfs2 already ignores
these calls and handles remount by itself.  This gets rid of overloading
the quotactl calls and allows us to unify the VFS and XFS codepaths in
that area later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

c79d967d

22 May, 2010 3 commits

ext4: replace inode uid,gid,mode init with helper · b10b8520

Committed by Dmitry Monakhov on Mar 04, 2010

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b10b8520

ext4: constify xattr_handler · 11e27528

Committed by Stephen Hemminger on May 13, 2010

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11e27528

quota: unify quota init condition in setattr · 12755627

Committed by Dmitry Monakhov on Apr 08, 2010

Quota must be initialized if a size or uid/gid change is requested.
But the initialization is performed in two different places:
in the case of i_size the file system is responsible for dquot init,
but in the case of uid/gid the init is called internally in
dquot_transfer().
This ambiguity makes the code harder to understand.
Let's move this logic to one common helper function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

12755627

17 May, 2010 20 commits

ext4: Make fsync sync new parent directories in no-journal mode · 14ece102

Committed by Frank Mayhar on May 17, 2010

Add a new ext4 state to tell us when a file has been newly created; use
that state in ext4_sync_file in no-journal mode to tell us when we need
to sync the parent directory as well as the inode and data itself.  This
fixes a problem in which a panic or power failure may lose the entire
file even when using fsync, since the parent directory entry is lost.

Addresses-Google-Bug: #2480057
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

14ece102

ext4: Drop whitespace at end of lines · 60e6679e

Committed by Theodore Ts'o on May 17, 2010

This patch was generated using:

#!/usr/bin/perl -i
while (<>) {
    s/[ 	]+$//;
    print;
}
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

60e6679e

ext4: Fix compat EXT4_IOC_ADD_GROUP · 4d92dc0f

Committed by Ben Hutchings on May 17, 2010

struct ext4_new_group_input needs to be converted because u64 has
only 32-bit alignment on some 32-bit architectures, notably i386.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4d92dc0f
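
The alignment issue described in this commit can be illustrated with a minimal userspace sketch. It is not the kernel's code: the `demo_` names are hypothetical, the field list is only an assumed layout for illustration, and the reduced-alignment typedef relies on a GCC/Clang extension. It shows how a 64-bit field that is only 4-byte aligned changes the struct layout, and why a field-by-field conversion is then needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with only 4-byte alignment, as on i386 (GCC/Clang extension). */
typedef uint64_t __attribute__((aligned(4))) demo_compat_u64;

/* Hypothetical layout as a 32-bit i386 caller would see it. */
struct demo_group_input_compat {
    uint32_t group;
    demo_compat_u64 block_bitmap;   /* lands at offset 4, no padding */
    demo_compat_u64 inode_bitmap;
    demo_compat_u64 inode_table;
    uint32_t blocks_count;
    uint16_t reserved_blocks;
    uint16_t unused;
};

/* Same fields with the native alignment of the build target. */
struct demo_group_input {
    uint32_t group;
    uint64_t block_bitmap;
    uint64_t inode_bitmap;
    uint64_t inode_table;
    uint32_t blocks_count;
    uint16_t reserved_blocks;
    uint16_t unused;
};

/* Copy field by field, because the two layouts may differ in padding. */
static void demo_convert(const struct demo_group_input_compat *in,
                         struct demo_group_input *out)
{
    assert(in != NULL && out != NULL);
    out->group = in->group;
    out->block_bitmap = in->block_bitmap;
    out->inode_bitmap = in->inode_bitmap;
    out->inode_table = in->inode_table;
    out->blocks_count = in->blocks_count;
    out->reserved_blocks = in->reserved_blocks;
    out->unused = in->unused;
}
```

On a typical 64-bit target the native struct pads `block_bitmap` out to offset 8, so a plain memory copy between the two layouts would misplace every field after `group`.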

ext4: Conditionally define compat ioctl numbers · 899ad0ce

Committed by Ben Hutchings on May 17, 2010

It is unnecessary, and in general impossible, to define the compat
ioctl numbers except when building the filesystem with CONFIG_COMPAT
defined.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

899ad0ce

ext4: Add new tracepoints to track mballoc's buddy bitmap loads · f307333e

Committed by Theodore Ts'o on May 17, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f307333e

ext4: Add a missing trace hook · 5a58ec87

Committed by Li Zefan on May 17, 2010

Commit f8ec9d68 added a trace event ext4_da_release_space, but didn't
add the corresponding trace hook.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a58ec87

ext4: restart ext4_ext_remove_space() after transaction restart · 0617b83f

Committed by Dmitry Monakhov on May 17, 2010

If i_data_sem was internally dropped due to a transaction restart, it is
necessary to restart the path look-up because the extent tree was possibly
modified by ext4_get_block().

https://bugzilla.kernel.org/show_bug.cgi?id=15827
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>

0617b83f

ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted · 786ec791

Committed by Theodore Ts'o on May 17, 2010

Dmitry Monakhov discovered an edge case where the
EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily.  This is true;
I have a test case that can be exercised via downloading and
decompressing the file:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2
bunzip2 eofblocks-fl-test-case.img.bz2
dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 count=1 conv=notrunc

However, triggering it in real life is highly unlikely since it
requires an extremely fragmented sparse file with a hole in exactly
the right place in the extent tree.  (It actually took quite a bit of
work to generate this test case.)  Still, it's nice to get even
extreme corner cases to be correct, so this patch makes sure that we
don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner
case.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

786ec791

ext4: Avoid crashing on NULL ptr dereference on a filesystem error · f70f362b

Committed by Theodore Ts'o on May 16, 2010

If the EXT4_EOFBLOCKS_FL flag is set when it should not be and the inode is
zero length, then eh_entries is zero, and ex is NULL, so dereferencing
ex to print ex->ee_block causes a kernel OOPS in
ext4_ext_map_blocks().

On top of that, the error message which is printed isn't very helpful.
So we fix this by printing something more explanatory which doesn't
involve trying to print ex->ee_block.

Addresses-Google-Bug: #2655740
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f70f362b

ext4: Use bitops to read/modify i_flags in struct ext4_inode_info · 12e9b892

Committed by Dmitry Monakhov on May 16, 2010

At several places we modify EXT4_I(inode)->i_flags without holding
i_mutex (ext4_do_update_inode, ...). These modifications are racy and
we can lose updates to i_flags. So convert handling of i_flags to use
bitops which are atomic.

https://bugzilla.kernel.org/show_bug.cgi?id=15792
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12e9b892
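
As a rough userspace analogy for the conversion described in this commit, the sketch below uses C11 atomics in place of the kernel's bitops. All `demo_` names and the bit numbers are hypothetical and chosen only for illustration. The point is that each update is a single atomic read-modify-write, so two concurrent updaters cannot overwrite each other's bits the way a plain `flags |= X` can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode flags word. */
struct demo_inode {
    atomic_ulong flags;
};

/* Hypothetical bit numbers, chosen only for this sketch. */
enum { DEMO_FLAG_APPEND = 5, DEMO_FLAG_EXTENTS = 19 };

/* Atomically set one bit; other bits are left untouched. */
static void demo_set_flag(struct demo_inode *ino, unsigned bit)
{
    assert(bit < 32);
    atomic_fetch_or(&ino->flags, 1UL << bit);
}

/* Atomically clear one bit; other bits are left untouched. */
static void demo_clear_flag(struct demo_inode *ino, unsigned bit)
{
    assert(bit < 32);
    atomic_fetch_and(&ino->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool demo_test_flag(struct demo_inode *ino, unsigned bit)
{
    assert(bit < 32);
    return (atomic_load(&ino->flags) >> bit) & 1UL;
}
```

A non-atomic version would load the word, modify it, and store it back as three separate steps, and an update made by another thread between the load and the store would be lost.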

ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() · 24676da4

Committed by Theodore Ts'o on May 16, 2010

EXT4_ERROR_INODE() tends to provide better error information and in a
more consistent format.  Some errors were not even identifying the inode
or directory which was corrupted, which made them not very useful.

Addresses-Google-Bug: #2507977
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

24676da4

ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() · 2ed88685

Committed by Theodore Ts'o on May 16, 2010

This saves a huge amount of stack space by avoiding the allocation of
unnecessary struct buffer_heads on the stack.

In addition, to make the code easier to understand, collapse and
refactor ext4_get_block(), ext4_get_block_write(),
noalloc_get_block_write(), into a single function.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2ed88685

ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() · e35fd660

Committed by Theodore Ts'o on May 16, 2010

Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(),
which uses a much smaller structure, struct ext4_map_blocks, which is
20 bytes, as opposed to a struct buffer_head, which is nearly 5 times
bigger on an x86_64 machine.  By switching callers to
ext4_map_blocks(), we can save stack space, since we can avoid
allocating a struct buffer_head on the stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e35fd660

ext4: Use our own write_cache_pages() · 8e48dcfb

Committed by Theodore Ts'o on May 16, 2010

Make a copy of write_cache_pages() for the benefit of
ext4_da_writepages().  This allows us to simplify the code some, and
will allow us to further customize the code in future patches.

There are some nasty hacks in write_cache_pages(), which Linus has
(correctly) characterized as vile.  I've just copied it into
write_cache_pages_da(), without trying to clean those bits up lest I
break something in ext4's delalloc implementation, which is a bit
fragile right now.  This will allow Dave Chinner to clean up
write_cache_pages() in mm/page-writeback.c, without worrying about
breaking ext4.  Eventually write_cache_pages_da() will go away when I
rewrite ext4's delayed allocation and create a general
ext4_writepages() which is used for all of ext4's writeback.  Until
then this is the lowest risk way to clean up the core
write_cache_pages() function.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>

8e48dcfb

ext4: Show journal_checksum option · 39a4bade

Committed by Jan Kara on May 16, 2010

We failed to show journal_checksum option in /proc/mounts. Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

39a4bade

ext4: Fix for ext4_mb_collect_stats() · 291dae47

Committed by Curt Wohlgemuth on May 16, 2010

Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it
should be testing "best-extent.fe_len >= orig-extent.fe_len" , not
"orig-extent.fe_len >= goal-extent.fe_len" .
Signed-off-by: Curt Wohlgemuth <curtw@google.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

291dae47

ext4: check for a good block group before loading buddy pages · 8a57d9d6

Committed by Curt Wohlgemuth on May 16, 2010

This adds a new field in ext4_group_info to cache the largest available
block range in a block group, and doesn't load the buddy pages until *after*
we've done a sanity check on the block group.

With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
partitions, it's easy to have no block groups with a block extent large
enough to satisfy the input request length.  This currently causes the loop
during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
for EVERY block group.  That can be a lot of pages.  The patch below allows
us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
have to check again after we lock the block group).

Addresses-Google-Bug: #2578108
Addresses-Google-Bug: #2704453
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a57d9d6
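
The idea in this commit, a cheap check against a cached value before an expensive load, can be sketched in plain C as follows. This is only a simplified model under stated assumptions: the `demo_` structures and functions are hypothetical, the "buddy load" is reduced to a counter, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group summary kept in memory. */
struct demo_group {
    uint32_t largest_free;  /* cached length of the largest free extent, in blocks */
    unsigned buddy_loads;   /* counts how often the expensive load was performed */
};

/* Cheap test that only looks at the cached value. */
static int demo_good_group(const struct demo_group *g, uint32_t want)
{
    return g->largest_free >= want;
}

/* Stand-in for the expensive step of reading the buddy bitmap pages. */
static void demo_load_buddy(struct demo_group *g)
{
    g->buddy_loads++;
}

/* Return the index of the first group that can satisfy the request, or -1. */
static long demo_find_group(struct demo_group *groups, size_t n, uint32_t want)
{
    assert(groups != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!demo_good_group(&groups[i], want))
            continue;               /* rejected without loading anything */
        demo_load_buddy(&groups[i]);
        /* Check again after the load; in the kernel this happens under the group lock. */
        if (demo_good_group(&groups[i], want))
            return (long)i;
    }
    return -1;
}
```

With a request larger than every cached extent, the loop visits all groups but performs no loads at all, which is the saving the commit message describes.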

ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate · 6d19c42b

Committed by Nikanth Karthikesan on May 16, 2010

Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit
and create a file larger than the limit. Add a check for that.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d19c42b
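
A minimal sketch of the kind of check this commit describes, written as standalone C rather than kernel code. The function name `demo_check_fsize` is hypothetical, and returning `-EFBIG` is an assumption made for this illustration. The comparison is arranged so that `offset + len` is never computed directly, which avoids integer overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if the range [offset, offset + len) ends at or below the
 * file size limit, and -EFBIG if it would extend past the limit.
 */
static int demo_check_fsize(uint64_t offset, uint64_t len, uint64_t limit)
{
    assert(len > 0);
    if (offset >= limit || len > limit - offset)
        return -EFBIG;
    return 0;
}
```

For example, with a limit of 4096 bytes a request for offset 0 and length 4096 passes, while the same length starting at offset 1 is rejected.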

ext4: Remove extraneous newlines in ext4_msg() calls · fbe845dd

Committed by Curt Wohlgemuth on May 16, 2010

Addresses-Google-Bug: #2562325
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fbe845dd

ext4: Print mount options when mounting and add a remount message · d4c402d9

Committed by Curt Wohlgemuth on May 16, 2010

This adds a "re-mounted" message to ext4_remount(), and both it and
the mount message in ext4_fill_super() now have the original mount
options data string.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d4c402d9

16 May, 2010 6 commits

ext4: don't use quota reservation for speculative metadata · 72b8ab9d

Committed by Eric Sandeen on May 16, 2010

Because we can badly over-reserve metadata when we
calculate worst-case, it complicates things for quota, since
we must reserve and then claim later, retry on EDQUOT, etc.
Quota is also a generally smaller pool than fs free blocks,
so this over-reservation hurts more, and more often.

I'm of the opinion that it's not the worst thing to allow
metadata to push a user slightly over quota.  This simplifies
the code and avoids the false quota rejections that result
from worst-case speculation.

This patch stops the speculative quota-charging for
worst-case metadata requirements, and just charges quota
when the blocks are allocated at writeout.  It also is
able to remove the try-again loop on EDQUOT.

This patch has been tested indirectly by running the xfstests
suite with a hack to mount & enable quota prior to the test.

I also did a more specific test of fragmenting freespace
and then doing a large delalloc write under quota; quota
stopped me at the right amount of file IO, and then the
writeout generated enough metadata (due to the fragmentation)
that it put me slightly over quota, as expected.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

72b8ab9d

ext4: init statistics after journal recovery · 84061e07

Committed by Dmitry Monakhov on May 16, 2010

Currently the block/inode/dir counters are initialized before the journal
is recovered.  After journal recovery this information will probably
change, and the free block count is critical for correct delalloc mode
accounting.

https://bugzilla.kernel.org/show_bug.cgi?id=15768
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

84061e07

ext4: clean up inode bitmaps manipulation in ext4_free_inode · d17413c0

Committed by Dmitry Monakhov on May 16, 2010

- Reorganize the locking scheme to batch two atomic operations into one.
  This also allows us to state that a healthy group must obey the following rule:
  ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
- Fix a possible undefined pointer dereference.
- Even if the group descriptor stats aren't accessible we have to update
  the inode bitmaps.
- Move the update of non-group members out of group_lock.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d17413c0

ext4: Do not zero out uninitialized extents beyond i_size · 21ca087a

Committed by Dmitry Monakhov on May 16, 2010

The extents code will sometimes zero out blocks and mark them as
initialized instead of splitting an extent into several smaller ones.
This optimization, however, causes problems if the extent is beyond
i_size because fsck will complain if there are uninitialized blocks
after i_size as this can not be distinguished from an inode that has
an incorrect i_size field.

https://bugzilla.kernel.org/show_bug.cgi?id=15742
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

21ca087a

ext4: don't scan/accumulate more pages than mballoc will allocate · c445e3e0

Committed by Eric Sandeen on May 16, 2010

There was a bug reported on RHEL5 that a 10G dd on a 12G box
had a very, very slow sync after that.

At issue was the loop in write_cache_pages scanning all the way
to the end of the 10G file, even though the subsequent call
to mpage_da_submit_io would only actually write a smallish amount; then
we went back to the write_cache_pages loop ... wasting tons of time
in calling __mpage_da_writepage for thousands of pages we would
just revisit (many times) later.

Upstream it's not such a big issue for sys_sync because we get
to the loop with a much smaller nr_to_write, which limits the loop.

However, talking with Aneesh he realized that fsync upstream still
gets here with a very large nr_to_write and we face the same problem.

This patch makes mpage_add_bh_to_extent stop the loop after we've
accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
causes the write_cache_pages loop to break.

Repeating the test with a dirty_ratio of 80 (to leave something for
fsync to do), I don't see huge IO performance gains, but the reduction
in cpu usage is striking: 80% usage with stock, and 2% with the
below patch.  Instrumenting the loop in write_cache_pages clearly
shows that we are wasting time here.

Eventually we need to change mpage_da_map_pages() to also submit its I/O
to the block layer, subsuming mpage_da_submit_io(), and then change it
to call ext4_get_blocks() multiple times.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c445e3e0

ext4: stop issuing discards if not supported by device · a30eec2a

Committed by Eric Sandeen on May 16, 2010

Turn off issuance of discard requests if the device does
not support it - similar to the action we take for barriers.
This will save a little computation time if a non-discardable
device is mounted with -o discard, and also makes it obvious
that it's not doing what was asked at mount time ...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a30eec2a

openanolis / cloud-kernel synced successfully more than 1 year ago