提交 · d1f5273e9adb40724a85272f248f210dc4ce919a · OpenHarmony / kernel_linux

19 3月, 2012 1 次提交

ext4: return 32/64-bit dir name hash according to usage type · d1f5273e

由 Fan Yong 提交于 3月 18, 2012

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.  This still needs
integration on the NFS side.
Patch-updated-by: NBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
(blame me if something is not correct)
Signed-off-by: NFan Yong <yong.fan@whamcloud.com>
Signed-off-by: NAndreas Dilger <adilger@whamcloud.com>
Signed-off-by: NBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d1f5273e

05 1月, 2012 3 次提交

ext4: make more symbols static · 5f163cc7

由 Eric Sandeen 提交于 1月 04, 2012

A couple more functions can reasonably be made static if desired.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5f163cc7

ext4: reserve new feature flag codepoints · 9b90e5e0

由 Theodore Ts'o 提交于 1月 04, 2012

Reserve the ext4 features flags EXT4_FEATURE_RO_COMPAT_METADATA_CSUM,
EXT4_FEATURE_INCOMPAT_INLINEDATA, and EXT4_FEATURE_INCOMPAT_LARGEDIR.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9b90e5e0

ext4: add new online resize interface · 19c5246d

由 Yongqiang Yang 提交于 1月 04, 2012

This patch adds new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.

In new resize impelmentation, all work like allocating group tables
are done by kernel side, so the new resize interface can support
flex_bg feature and prepares ground for suppoting resize with features
like bigalloc and exclude bitmap. Besides these, user-space tools just
passes in the new number of blocks.

We delay initializing the bitmaps and inode tables of added groups if
possible and add multi groups (a flex groups) each time, so new resize
is very fast like mkfs.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

19c5246d

04 1月, 2012 2 次提交

ext4: add a function which sets up group blocks of a flex bg · 33afdcc5

由 Yongqiang Yang 提交于 1月 03, 2012

This patch adds a function named setup_new_flex_group_blocks() which
sets up group blocks of a flex bg.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

33afdcc5

A
ext4: propagate umode_t · dcca3fec
由 Al Viro 提交于 7月 26, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
dcca3fec

29 12月, 2011 2 次提交

ext4: use proper little-endian bitops · 597d508c

由 Akinobu Mita 提交于 12月 28, 2011

ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4.  Only two ext4_{set,clear}_bit() calls check the return value.  The
rest of calls ignore the return value and they can be replaced with
__{set,clear}_bit_le().

This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where old bit needs to be returned.

This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, new ext4_{set,clear}_bit
don't have return value and causes compiler errors where the return value
is used.

This also removes unused ext4_find_first_zero_bit().
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

597d508c

ext4: remove no longer used functions in inode.c · ccb4d7af

由 Zheng Liu 提交于 12月 28, 2011

The functions ext4_block_truncate_page() and ext4_block_zero_page_range()
are no longer used, so remove them.
Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

ccb4d7af

01 11月, 2011 2 次提交

treewide: use __printf not __attribute__((format(printf,...))) · b9075fa9

由 Joe Perches 提交于 10月 31, 2011

Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9075fa9

ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten · 0edeb71d

由 Tao Ma 提交于 10月 31, 2011

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We have found some bugs that the flag is set while leaving
i_aiodio_unwritten unchanged(commit 32c80b32). So this patch just tries
to create a helper function to wrap them to avoid any future bug.
The idea is inspired by Eric.

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0edeb71d

29 10月, 2011 1 次提交

ext4: fix quota accounting during migration · 5cb81dab

由 Dmitry Monakhov 提交于 10月 29, 2011

The tmp_inode should have same uid/gid as the original inode.
Otherwise new metadata blocks will be accounted to wrong quota-id,
which will result in a quota leak after the inode migration is
completed.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5cb81dab

25 10月, 2011 1 次提交

ext4: update EOFBLOCKS flag on fallocate properly · a4e5d88b

由 Dmitry Monakhov 提交于 10月 25, 2011

EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE
Currently it happens only if new extent was allocated.

TESTCASE:
fallocate test_file -n -l4096
fallocate test_file -l4096
Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And
fsck will complain about that.

Also remove ping pong in ext4_fallocate() in case of new extents,
where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later
ext4_falloc_update_inode() restore it again.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a4e5d88b

09 10月, 2011 2 次提交

ext4: remove the obsolete/broken EXT4_IOC_WAIT_FOR_READONLY ioctl · 7fd59c83

由 Tao Ma 提交于 10月 08, 2011

There are no users of the EXT4_IOC_WAIT_FOR_READONLY ioctl, and it is
also broken.  No one sets the set_ro_timer, no one wakes up us and our
state is set to TASK_INTERRUPTIBLE not RUNNING.  So remove it.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7fd59c83

ext4: remove deprecated oldalloc · 4113c4ca

由 Lukas Czerner 提交于 10月 08, 2011

For a long time now orlov is the default block allocator in the
ext4. It performs better than the old one and no one seems to claim
otherwise so we can safely drop it and make oldalloc and orlov mount
option deprecated.

This is a part of the effort to reduce number of ext4 options hence the
test matrix.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4113c4ca

10 9月, 2011 15 次提交

ext4: attempt to fix race in bigalloc code path · 5356f261

由 Aditya Kali 提交于 9月 09, 2011

Currently, there exists a race between delayed allocated writes and
the writeback when bigalloc feature is in use. The race was because we
wanted to determine what blocks in a cluster are under delayed
allocation and we were using buffer_delayed(bh) check for it. But, the
writeback codepath clears this bit without any synchronization which
resulted in a race and an ext4 warning similar to:

EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
		reserved data blocks

The race existed in two places.
(1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
    writeback code path.
(2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
    buffer_delayed(bh) is set.

To fix (1), this patch introduces a new buffer_head state bit -
BH_Da_Mapped.  This bit is set under the protection of
EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
allocated blocks during the writeout time. We can now reliably check
for this bit inside ext4_find_delalloc_range() to determine whether
the reservation for the blocks have already been claimed or not.

To fix (2), it was necessary to set buffer_delay(bh) under the
protection of i_data_sem.  So, I extracted the very beginning of
ext4_map_blocks into a new function - ext4_da_map_blocks() - and
performed the required setting of bh_delay bit and the quota
reservation under the protection of i_data_sem.  These two fixes makes
the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
thus removing the race.

Tested: I was able to reproduce the problem by running 'dd' and
'fsync' in parallel. Also, xfstests sometimes used to reproduce this
race. After the fix both my test and xfstests were successful and no
race (warning message) was observed.

Google-Bug-Id: 4997027
Signed-off-by: NAditya Kali <adityakali@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5356f261

ext4: add some tracepoints in ext4/extents.c · d8990240

由 Aditya Kali 提交于 9月 09, 2011

This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
ext4/inode.c.

Tested: Built and ran the kernel and verified that these tracepoints work.
Also ran xfstests.
Signed-off-by: NAditya Kali <adityakali@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d8990240

ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() · df55c99d

由 Theodore Ts'o 提交于 9月 09, 2011

Rename the function so it is more clear what is going on.  Also rename
the various variables so it's clearer what's happening.

Also fix a missing blocks to cluster conversion when reading the
number of reserved blocks for root.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

df55c99d

ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters() · e7d5f315

由 Theodore Ts'o 提交于 9月 09, 2011

This function really claims a number of free clusters, not blocks, so
rename it so it's clearer what's going on.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e7d5f315

ext4: rename ext4_free_blocks_after_init() to ext4_free_clusters_after_init() · cff1dfd7

由 Theodore Ts'o 提交于 9月 09, 2011

This function really returns the number of clusters after initializing
an uninitalized block bitmap has been initialized.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

cff1dfd7

ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters() · 5dee5437

由 Theodore Ts'o 提交于 9月 09, 2011

This function really counts the free clusters reported in the block
group descriptors, so rename it to reduce confusion.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5dee5437

ext4: Rename ext4_free_blks_{count,set}() to refer to clusters · 021b65bb

由 Theodore Ts'o 提交于 9月 09, 2011

The field bg_free_blocks_count_{lo,high} in the block group
descriptor has been repurposed to hold the number of free clusters for
bigalloc functions.  So rename the functions so it makes it easier to
read and audit the block allocation and block freeing code.

Note: at this point in bigalloc development we doesn't support
online resize, so this also makes it really obvious all of the places
we need to fix up to add support for online resize.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

021b65bb

ext4: enable mounting bigalloc as read/write · 6f16b606

由 Theodore Ts'o 提交于 9月 09, 2011

Now that we have implemented all of the changes needed for bigalloc,
we can finally enable it!
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

6f16b606

ext4: Fix bigalloc quota accounting and i_blocks value · 7b415bf6

由 Aditya Kali 提交于 9月 09, 2011

With bigalloc changes, the i_blocks value was not correctly set (it was still
set to number of blocks being used, but in case of bigalloc, we want i_blocks
to represent the number of clusters being used). Since the quota subsystem sets
the i_blocks value, this patch fixes the quota accounting and makes sure that
the i_blocks value is set correctly.
Signed-off-by: NAditya Kali <adityakali@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7b415bf6

ext4: convert the free_blocks field in s_flex_groups to be free_clusters · 24aaa8ef

由 Theodore Ts'o 提交于 9月 09, 2011

Convert the free_blocks to be free_clusters to make the final revised
bigalloc changes easier to read/understand.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

24aaa8ef

ext4: convert s_{dirty,free}blocks_counter to s_{dirty,free}clusters_counter · 57042651

由 Theodore Ts'o 提交于 9月 09, 2011

Convert the percpu counters s_dirtyblocks_counter and
s_freeblocks_counter in struct ext4_super_info to be
s_dirtyclusters_counter and s_freeclusters_counter.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

57042651

ext4: teach ext4_free_blocks() about bigalloc and clusters · 84130193

由 Theodore Ts'o 提交于 9月 09, 2011

The ext4_free_blocks() function now has two new flags that indicate
whether a partial cluster at the beginning or the end of the block
extents should be freed or not.  That will be up the caller (i.e.,
truncate), who can figure out whether partial clusters at the
beginning or the end of a block range can be freed.

We also have to update the ext4_mb_free_metadata() and
release_blocks_on_commit() machinery to be cluster-based, since it is
used by ext4_free_blocks().
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

84130193

ext4: bigalloc changes to block bitmap initialization functions · d5b8f310

由 Theodore Ts'o 提交于 9月 09, 2011

Add bigalloc support to ext4_init_block_bitmap() and
ext4_free_blocks_after_init().
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d5b8f310

ext4: split out ext4_free_blocks_after_init() · fd034a84

由 Theodore Ts'o 提交于 9月 09, 2011

The function ext4_free_blocks_after_init() used to be a #define of
ext4_init_block_bitmap().  This actually made it difficult to
understand how the function worked, and made it hard make changes to
support clusters.  So as an initial cleanup, I've separated out the
functionality of initializing block bitmap from calculating the number
of free blocks in the new block group.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fd034a84

ext4: read-only support for bigalloc file systems · 281b5995

由 Theodore Ts'o 提交于 9月 09, 2011

This adds supports for bigalloc file systems.  It teaches the mount
code just enough about bigalloc superblock fields that it will mount
the file system without freaking out that the number of blocks per
group is too big.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

281b5995

04 9月, 2011 1 次提交

ext4: improve handling of conflicting mount options · 56889787

由 Theodore Ts'o 提交于 9月 03, 2011

If the user explicitly specifies conflicting mount options for
delalloc or dioread_nolock and data=journal, fail the mount, instead
of printing a warning and continuing (since many user's won't look at
dmesg and notice the warning).

Also, print a single warning that data=journal implies that delayed
allocation is not on by default (since it's not supported), and
furthermore that O_DIRECT is not supported.  Improve the text in
Documentation/filesystems/ext4.txt so this is clear there as well.

Similarly, if the dioread_nolock mount option is specified when the
file system block size != PAGE_SIZE, fail the mount instead of
printing a warning message and ignoring the mount option.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

56889787

03 9月, 2011 1 次提交

ext4: Add new ext4_discard_partial_page_buffers routines · 4e96b2db

由 Allison Henderson 提交于 9月 03, 2011

This patch adds two new routines: ext4_discard_partial_page_buffers
and ext4_discard_partial_page_buffers_no_lock.

The ext4_discard_partial_page_buffers routine is a wrapper
function to ext4_discard_partial_page_buffers_no_lock.
The wrapper function locks the page and passes it to
ext4_discard_partial_page_buffers_no_lock.
Calling functions that already have the page locked can call
ext4_discard_partial_page_buffers_no_lock directly.

The ext4_discard_partial_page_buffers_no_lock function
zeros a specified range in a page, and unmaps the
corresponding buffer heads.  Only block aligned regions of the
page will have their buffer heads unmapped.  Unblock aligned regions
will be mapped if needed so that they can be updated with the
partial zero out.  This function is meant to
be used to update a page and its buffer heads to be zeroed
and unmapped when the corresponding blocks have been released
or will be released.

This routine is used in the following scenarios:
* A hole is punched and the non page aligned regions
  of the head and tail of the hole need to be discarded

* The file is truncated and the partial page beyond EOF needs
  to be discarded

* The end of a hole is in the same page as EOF.  After the
  page is flushed, the partial page beyond EOF needs to be
  discarded.

* A write operation begins or ends inside a hole and the partial
  page appearing before or after the write needs to be discarded

* A write operation extends EOF and the partial page beyond EOF
  needs to be discarded

This function takes a flag EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
which is used when a write operation begins or ends in a hole.
When the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED flag is used, only
buffer heads that are already unmapped will have the corresponding
regions of the page zeroed.
Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4e96b2db

31 8月, 2011 2 次提交

ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodes · 1cd9f097

由 Theodore Ts'o 提交于 8月 31, 2011

This doesn't make much sense, and it exposes a bug in the kernel where
attempts to create a new file in an append-only directory using
O_CREAT will fail (but still leave a zero-length file).  This was
discovered when xfstests #79 was generalized so it could run on all
file systems.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc:stable@kernel.org

1cd9f097

ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining · 8c0bec21

由 Jiaying Zhang 提交于 8月 31, 2011

The i_mutex lock and flush_completed_IO() added by commit 2581fdc8
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places. In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode(). Instead, we take a different approach to resolve
the software lockup that commit 2581fdc8 intends to fix. Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.
Signed-off-by: NJiaying Zhang <jiayingz@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8c0bec21

01 8月, 2011 1 次提交

ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() · 9933fc0a

由 Theodore Ts'o 提交于 8月 01, 2011

Introduce new helper functions which try kmalloc, and then fall back
to vmalloc if necessary, and use them for allocating and deallocating
s_flex_groups.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9933fc0a

27 7月, 2011 4 次提交

ext4: let setup_new_group_blocks() set multiple bits at a time · c3e94d1d

由 Yongqiang Yang 提交于 7月 26, 2011

Rename mb_set_bits() to ext4_set_bits() and make it a global function
so that setup_new_group_blocks() can use it.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c3e94d1d

ext4: let ext4_group_add_blocks() return an error code · cc7365df

由 Yongqiang Yang 提交于 7月 26, 2011

This patch lets ext4_group_add_blocks() return an error code if it
fails, so that upper functions can handle error correctly.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

cc7365df

Y
ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() · 0529155e
由 Yongqiang Yang 提交于 7月 26, 2011
```
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
0529155e

ext4: prevent parallel resizers by atomic bit ops · 8f82f840

由 Yongqiang Yang 提交于 7月 26, 2011

Before this patch, parallel resizers are allowed and protected by a
mutex lock, actually, there is no need to support parallel resizer, so
this patch prevents parallel resizers by atmoic bit ops, like
lock_page() and unlock_page() do.

To do this, the patch removed the mutex lock s_resize_lock from struct
ext4_sb_info and added a unsigned long field named s_resize_flags
which inidicates if there is a resizer.
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8f82f840

21 7月, 2011 1 次提交

fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82

由 Josef Bacik 提交于 7月 16, 2011

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

02c24a82

11 7月, 2011 1 次提交

ext4: Speed up FITRIM by recording flags in ext4_group_info · 3d56b8d2

由 Tao Ma 提交于 7月 11, 2011

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: NAndreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

3d56b8d2

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多