Commits · d44651d0f922b7aaeddd9fc04f2f5700a65983dd · openeuler / Kernel

09 Oct, 2011 · 6 commits

ext4: fix ext4 so it works without CONFIG_PROC_FS · d44651d0

Committed by Fabrice Jouhaud on Oct 08, 2011

This fixes a bug which was introduced in dd68314c.  The problem
came from the test of the return value of proc_mkdir, which is always
NULL (false) without procfs, and this would cause the initialization
of ext4 to fail.
Signed-off-by: Fabrice Jouhaud <yargil@free.fr>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
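
A minimal C sketch of the failure mode, assuming only that proc_mkdir() returns NULL when CONFIG_PROC_FS is unset (as the kernel's stub does). fake_proc_mkdir(), init_fs_buggy() and init_fs_fixed() are hypothetical stand-ins, not the ext4 code: the point is that a NULL /proc directory should disable the /proc files, not abort initialization.

```c
#include <stddef.h>

/* Hypothetical stand-in for proc_mkdir(): with procfs compiled out the
 * kernel's stub returns NULL. */
static void *fake_proc_mkdir(const char *name, int procfs_enabled)
{
    static int dummy_dir;
    (void)name;
    return procfs_enabled ? (void *)&dummy_dir : NULL;
}

/* Buggy shape: treats NULL from proc_mkdir() as a fatal error. */
static int init_fs_buggy(int procfs_enabled)
{
    void *proc_root = fake_proc_mkdir("fs/ext4", procfs_enabled);
    if (!proc_root)
        return -1;          /* fails on every !CONFIG_PROC_FS kernel */
    return 0;
}

/* Fixed shape: a missing /proc directory only means "no /proc entries". */
static int init_fs_fixed(int procfs_enabled)
{
    void *proc_root = fake_proc_mkdir("fs/ext4", procfs_enabled);
    (void)proc_root;        /* NULL is tolerated, init continues */
    return 0;
}
```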

ext4: use le32_to_cpu for ext4_extent_idx.ei_block in ext4_ext_search_left() · 6ee3b212

Committed by Tao Ma on Oct 08, 2011

ext4_extent_idx.ei_block is __le32, so use le32_to_cpu() in
ext4_ext_search_left().
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
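
Why the conversion matters can be sketched in portable C, assuming a simplified on-disk index whose ei_block field is stored as four raw little-endian bytes. le32_to_host() is a hypothetical helper that plays the role of le32_to_cpu(): it assembles the value by byte order, so the result is the same on little- and big-endian hosts, whereas using the raw field directly would only be correct on little-endian CPUs.

```c
#include <stdint.h>

/* Simplified on-disk extent index modeled after ext4_extent_idx: the
 * field is little-endian on disk regardless of the CPU. */
struct disk_extent_idx {
    uint8_t ei_block[4];    /* __le32 in the kernel */
};

/* Portable equivalent of le32_to_cpu(): build the value from its
 * little-endian byte order instead of reinterpreting host memory. */
static uint32_t le32_to_host(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```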

ext4: remove the obsolete/broken EXT4_IOC_WAIT_FOR_READONLY ioctl · 7fd59c83

Committed by Tao Ma on Oct 08, 2011

There are no users of the EXT4_IOC_WAIT_FOR_READONLY ioctl, and it is
also broken.  No one sets the set_ro_timer, no one wakes us up, and our
state is set to TASK_INTERRUPTIBLE, not TASK_RUNNING.  So remove it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix the comment describing ext4_ext_search_right() · df3ab170

Committed by Tao Ma on Oct 08, 2011

The comment describing what ext4_ext_search_right() does is incorrect.
We return 0 in *phys when *logical is the 'largest' allocated block,
not the smallest.

Fix a few other typos while we're at it.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>

ext4: remove deprecated oldalloc · 4113c4ca

Committed by Lukas Czerner on Oct 08, 2011

For a long time now, orlov has been the default inode allocator in
ext4. It performs better than the old one, and no one seems to claim
otherwise, so we can safely drop the old allocator and deprecate the
oldalloc and orlov mount options.

This is part of the effort to reduce the number of ext4 mount options,
and hence the test matrix.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: documentation: remove acl and user_xattr mount options · af909a57

Committed by Theodore Ts'o on Oct 08, 2011

The acl and user_xattr mount options are no longer needed, since those
features are enabled by default if configured in (see commit
ea663336). We cannot easily deprecate the mount options themselves
(it is probably too early for that), but we can remove them from the
documentation first.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07 Oct, 2011 · 1 commit

ext4: Free resources in some error path in ext4_fill_super · dcf2d804

Committed by Tao Ma on Oct 06, 2011

Some of the error paths in ext4_fill_super don't release the
resources properly, so this patch releases them in the right way.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

06 Oct, 2011 · 1 commit

ext4: Free resources in ext4_mb_init()'s error paths · 7aa0baea

Committed by Tao Ma on Oct 06, 2011

In commit 79a77c5a, we moved ext4_mb_init_backend after the allocation
of s_locality_group to avoid a memory leak in the error path, but there
are still some other error paths in ext4_mb_init that need to do the
same work. So this patch fixes all the error paths in ext4_mb_init. All
the pointers are also reset to NULL so that the caller cannot
double-free them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
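
The cleanup pattern described above (unwind in reverse order, then reset each pointer to NULL) can be sketched as follows. struct mb_state, alloc_or_fail() and mb_init() are hypothetical simplifications, not the real ext4_mb_init(); fail_at only exists so that every error path can be exercised.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified per-superblock allocator state. */
struct mb_state {
    int *offsets;
    int *maxes;
    int *locality_groups;
};

/* Forces the allocation numbered fail_at to fail (0 = never fail). */
static void *alloc_or_fail(size_t n, int step, int fail_at)
{
    return step == fail_at ? NULL : calloc(n, sizeof(int));
}

static int mb_init(struct mb_state *s, int fail_at)
{
    s->offsets = s->maxes = s->locality_groups = NULL;

    s->offsets = alloc_or_fail(16, 1, fail_at);
    if (!s->offsets)
        goto out;
    s->maxes = alloc_or_fail(16, 2, fail_at);
    if (!s->maxes)
        goto out_free_offsets;
    s->locality_groups = alloc_or_fail(4, 3, fail_at);
    if (!s->locality_groups)
        goto out_free_maxes;
    return 0;

    /* Unwind in reverse order; resetting to NULL means a caller that
     * runs its own cleanup later cannot double-free. */
out_free_maxes:
    free(s->maxes);
    s->maxes = NULL;
out_free_offsets:
    free(s->offsets);
    s->offsets = NULL;
out:
    return -1;
}
```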

10 Sep, 2011 · 25 commits

ext4: attempt to fix race in bigalloc code path · 5356f261

Committed by Aditya Kali on Sep 09, 2011

Currently, there exists a race between delayed allocated writes and
the writeback when the bigalloc feature is in use. The race arose because
we wanted to determine which blocks in a cluster are under delayed
allocation, and we were using the buffer_delay(bh) check for it. But the
writeback code path clears this bit without any synchronization, which
resulted in a race and an ext4 warning similar to:

EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
		reserved data blocks

The race existed in two places:
(1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
    the writeback code path.
(2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
    buffer_delay(bh) is set).

To fix (1), this patch introduces a new buffer_head state bit,
BH_Da_Mapped.  This bit is set under the protection of
EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
allocated blocks during writeout. We can now reliably check
for this bit inside ext4_find_delalloc_range() to determine whether
the reservation for the blocks has already been claimed or not.

To fix (2), it was necessary to set buffer_delay(bh) under the
protection of i_data_sem.  So I extracted the very beginning of
ext4_map_blocks into a new function, ext4_da_map_blocks(), and
performed the required setting of the BH_Delay bit and the quota
reservation under the protection of i_data_sem.  These two fixes make
the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
thus removing the race.

Tested: I was able to reproduce the problem by running 'dd' and
'fsync' in parallel. Also, xfstests sometimes used to reproduce this
race. After the fix, both my test and xfstests were successful and no
race (warning message) was observed.

Google-Bug-Id: 4997027
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: add some tracepoints in ext4/extents.c · d8990240

Committed by Aditya Kali on Sep 09, 2011

This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
ext4/inode.c.

Tested: Built and ran the kernel and verified that these tracepoints work.
Also ran xfstests.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() · df55c99d

Committed by Theodore Ts'o on Sep 09, 2011

Rename the function so it is clearer what is going on.  Also rename
the various variables so it's clearer what's happening.

Also fix a missing blocks-to-clusters conversion when reading the
number of reserved blocks for root.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
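
A simplified sketch of why the missing conversion mattered, assuming (as bigalloc does) that a cluster is 2^cluster_bits blocks. blocks_to_clusters() mirrors the idea of the kernel's EXT4_B2C(); has_free_clusters() is a hypothetical, pared-down version of the check, not the real function. Every quantity is in clusters except the root reservation, which the superblock stores in blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blocks -> clusters (cluster_bits == 0 without bigalloc). */
static uint64_t blocks_to_clusters(uint64_t blocks, unsigned cluster_bits)
{
    return blocks >> cluster_bits;
}

/* Hypothetical simplified free-space check, all in clusters. */
static bool has_free_clusters(uint64_t free_clusters, uint64_t dirty_clusters,
                              uint64_t root_reserved_blocks,
                              unsigned cluster_bits, uint64_t nclusters,
                              bool privileged)
{
    /* The reservation is stored in blocks: convert before comparing. */
    uint64_t root_clusters =
        blocks_to_clusters(root_reserved_blocks, cluster_bits);

    if (free_clusters >= nclusters + dirty_clusters + root_clusters)
        return true;
    /* Only privileged callers may dip into the root reservation. */
    return privileged && free_clusters >= nclusters + dirty_clusters;
}
```

With 16 blocks per cluster, a 1600-block root reservation is 100 clusters. Comparing the raw 1600 against a cluster count would wrongly refuse allocations.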

ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters() · e7d5f315

Committed by Theodore Ts'o on Sep 09, 2011

This function really claims a number of free clusters, not blocks, so
rename it so it's clearer what's going on.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: rename ext4_free_blocks_after_init() to ext4_free_clusters_after_init() · cff1dfd7

Committed by Theodore Ts'o on Sep 09, 2011

This function really returns the number of free clusters in a block
group after its uninitialized block bitmap has been initialized.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters() · 5dee5437

Committed by Theodore Ts'o on Sep 09, 2011

This function really counts the free clusters reported in the block
group descriptors, so rename it to reduce confusion.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Rename ext4_free_blks_{count,set}() to refer to clusters · 021b65bb

Committed by Theodore Ts'o on Sep 09, 2011

The field bg_free_blocks_count_{lo,hi} in the block group
descriptor has been repurposed to hold the number of free clusters for
bigalloc file systems.  So rename the functions to make it easier to
read and audit the block allocation and block freeing code.

Note: at this point in bigalloc development we don't support
online resize, so this also makes it really obvious which places
we need to fix up to add support for online resize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
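
How the two halves of the descriptor field combine into one count can be sketched as below. The struct is a hypothetical CPU-order view of the fields (the real ones are __le16 on disk), and the high half is only assumed meaningful when the descriptor is the larger 64-bit variant. Under bigalloc the combined value counts free clusters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU-order view of bg_free_blocks_count_lo/_hi. */
struct group_desc_counts {
    uint16_t free_count_lo;
    uint16_t free_count_hi;   /* only valid in 64-bit descriptors */
};

static uint32_t free_group_clusters(const struct group_desc_counts *bg,
                                    bool has_64bit_desc)
{
    return (uint32_t)bg->free_count_lo |
           (has_64bit_desc ? (uint32_t)bg->free_count_hi << 16 : 0);
}
```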

ext4: enable mounting bigalloc as read/write · 6f16b606

Committed by Theodore Ts'o on Sep 09, 2011

Now that we have implemented all of the changes needed for bigalloc,
we can finally enable it!
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix bigalloc quota accounting and i_blocks value · 7b415bf6

Committed by Aditya Kali on Sep 09, 2011

With the bigalloc changes, the i_blocks value was not correctly set (it was
still set to the number of blocks being used, but in the case of bigalloc, we
want i_blocks to represent the number of clusters being used). Since the quota
subsystem sets the i_blocks value, this patch fixes the quota accounting and
makes sure that the i_blocks value is set correctly.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
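
A rough arithmetic sketch of the intended accounting, assuming i_blocks is kept in 512-byte units and that, under bigalloc, each allocated cluster is charged in full regardless of how many of its blocks hold data. i_blocks_for() is a hypothetical helper for illustration only.

```c
#include <stdint.h>

/* Hypothetical: i_blocks (512-byte units) charged for whole clusters. */
static uint64_t i_blocks_for(uint64_t clusters_used, unsigned block_size,
                             unsigned cluster_bits)
{
    uint64_t cluster_bytes = (uint64_t)block_size << cluster_bits;
    return clusters_used * (cluster_bytes / 512);
}
```

With 4 KiB blocks and 16 blocks per cluster, a file that has written a single block still owns a 64 KiB cluster, so it is charged 128 units rather than 8.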

ext4: tune mballoc's default group prealloc size for bigalloc file systems · 27baebb8

Committed by Theodore Ts'o on Sep 09, 2011

The default group preallocation size had previously been set to 512
blocks/clusters, regardless of the block/cluster size.  This is
probably too big for large cluster sizes.  So adjust the default so
that it is 2 megabytes or 32 clusters, whichever is larger.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
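
The sizing rule stated above, written as a small function. default_group_prealloc() is a hypothetical helper that takes the cluster size in bytes and returns the default in clusters. It encodes only the "2 MiB or 32 clusters, whichever is larger" rule and is not the kernel's exact expression.

```c
#include <stdint.h>

/* Default group preallocation in clusters: max(2 MiB, 32 clusters). */
static uint32_t default_group_prealloc(uint32_t cluster_size)
{
    uint32_t two_mib_in_clusters = (2u << 20) / cluster_size;
    return two_mib_in_clusters > 32 ? two_mib_in_clusters : 32;
}
```

With 4 KiB clusters this keeps the old 512-cluster (2 MiB) default. With 1 MiB clusters, the old fixed 512 clusters would have meant 512 MiB; the rule instead gives 32 clusters (32 MiB).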

ext4: teach ext4_statfs() to deal with clusters if bigalloc is enabled · f975d6bc

Committed by Theodore Ts'o on Sep 09, 2011

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: convert the free_blocks field in s_flex_groups to be free_clusters · 24aaa8ef

Committed by Theodore Ts'o on Sep 09, 2011

Convert the free_blocks field to be free_clusters to make the final
revised bigalloc changes easier to read/understand.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: convert s_{dirty,free}blocks_counter to s_{dirty,free}clusters_counter · 57042651

Committed by Theodore Ts'o on Sep 09, 2011

Convert the percpu counters s_dirtyblocks_counter and
s_freeblocks_counter in struct ext4_sb_info to be
s_dirtyclusters_counter and s_freeclusters_counter.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: teach ext4_ext_truncate() about the bigalloc feature · 0aa06000

Committed by Theodore Ts'o on Sep 09, 2011

When we are truncating (as opposed to unlinking) a file, we need to worry
about partial truncates of a file, especially in light of sparse
files.  The changes here make sure that arbitrary truncates of sparse
files work correctly.  Yeah, it's messy.

Note that these functions will need to be revisited when the punch
ioctl is integrated; in fact, this commit will probably have merge
conflicts with the punch changes which Allison Henderson and the IBM LTC
have been working on.  I will need to fix this up when either patch
hits mainline.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: teach ext4_ext_map_blocks() about the bigalloc feature · 4d33b1ef

Committed by Theodore Ts'o on Sep 09, 2011

If we need to allocate a new block in ext4_ext_map_blocks(), the
function needs to see if the cluster has already been allocated.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
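
One way to picture the "already allocated cluster" check: if another logical block in the same logical cluster is already mapped, the physical cluster exists, and the new block's physical address follows from its offset within the cluster. This sketch assumes logical and physical offsets within a cluster are equal and that ratio (blocks per cluster) is a power of two. implied_pblk() is hypothetical, and its 0 return is a sketch-only sentinel for "different cluster, allocate a new one".

```c
#include <stdint.h>

/* Hypothetical: derive the physical block for lblk from a known mapping
 * lblk_known -> pblk_known in the same logical cluster. */
static uint64_t implied_pblk(uint64_t lblk, uint64_t lblk_known,
                             uint64_t pblk_known, uint64_t ratio)
{
    uint64_t mask = ratio - 1;

    if ((lblk & ~mask) != (lblk_known & ~mask))
        return 0;                        /* different logical cluster */
    return (pblk_known & ~mask) + (lblk & mask);
}
```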

ext4: teach ext4_free_blocks() about bigalloc and clusters · 84130193

Committed by Theodore Ts'o on Sep 09, 2011

The ext4_free_blocks() function now has two new flags that indicate
whether a partial cluster at the beginning or the end of the block
extents should be freed or not.  That will be up to the caller (i.e.,
truncate), who can figure out whether partial clusters at the
beginning or the end of a block range can be freed.

We also have to update the ext4_mb_free_metadata() and
release_blocks_on_commit() machinery to be cluster-based, since it is
used by ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
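
The caller-decides rule for partial clusters can be sketched as a counting function. clusters_to_free() and its two boolean parameters are hypothetical stand-ins for the flags described above (the real flag names are not shown here). It assumes count > 0 and reports how many clusters would be released for blocks [start, start + count).

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t clusters_to_free(uint64_t start, uint64_t count,
                                 unsigned cluster_bits,
                                 bool free_partial_first,
                                 bool free_partial_last)
{
    uint64_t mask = (1ULL << cluster_bits) - 1;
    uint64_t end = start + count;                /* exclusive */
    uint64_t first = start >> cluster_bits;      /* first cluster touched */
    uint64_t last = (end - 1) >> cluster_bits;   /* last cluster touched */

    /* A partially covered cluster may still hold live blocks, so it is
     * kept unless the caller says it can go. */
    bool keep_first = (start & mask) != 0 && !free_partial_first;
    bool keep_last = (end & mask) != 0 && !free_partial_last;

    if (first == last)               /* whole range inside one cluster */
        return (keep_first || keep_last) ? 0 : 1;
    return (last - first + 1) - keep_first - keep_last;
}
```

For example, with 4 blocks per cluster, freeing blocks 2..13 touches clusters 0 to 3. Only clusters 1 and 2 are released unless the caller allows both partial ends.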

ext4: teach mballoc preallocation code about bigalloc clusters · 53accfa9

Committed by Theodore Ts'o on Sep 09, 2011

In most of mballoc.c, we do everything in units of clusters, since the
block allocation bitmaps and buddy bitmaps are all denominated in
clusters.  The one place where we do deal with absolute block numbers
is in the code that handles the preallocation regions, since in the
case of inode-based preallocation regions, the start of the
preallocation region can't be relative to the beginning of the group.

So this adds a bit of complexity, where pa_pstart and pa_lstart are
block numbers, while pa_free, pa_len, and fe_len are denominated in
units of clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
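
The mixed units are easy to get wrong, so here is a sketch of the arithmetic, assuming a pared-down descriptor that keeps only the four fields named above. pa_pend() and pa_covers() are hypothetical helpers. The key step is shifting the cluster-denominated length left by cluster_bits (the idea behind the kernel's EXT4_C2B()) before adding it to a block number.

```c
#include <stdint.h>

/* Hypothetical, pared-down preallocation descriptor. */
struct prealloc_space {
    uint64_t pa_pstart;   /* first physical block (blocks) */
    uint32_t pa_lstart;   /* first logical block (blocks) */
    uint32_t pa_len;      /* length (clusters) */
    uint32_t pa_free;     /* unused part (clusters) */
};

/* Physical block just past the region: clusters converted to blocks. */
static uint64_t pa_pend(const struct prealloc_space *pa, unsigned cluster_bits)
{
    return pa->pa_pstart + ((uint64_t)pa->pa_len << cluster_bits);
}

/* Does logical block lblk fall inside this inode preallocation? */
static int pa_covers(const struct prealloc_space *pa, uint32_t lblk,
                     unsigned cluster_bits)
{
    uint64_t lend = (uint64_t)pa->pa_lstart +
                    ((uint64_t)pa->pa_len << cluster_bits);
    return lblk >= pa->pa_lstart && lblk < lend;
}
```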

ext4: convert block group-relative offsets to use clusters · 3212a80a

Committed by Theodore Ts'o on Sep 09, 2011

Certain parts of the ext4 code base, primarily in mballoc.c, use a
block group number and offset from the beginning of the block group.
This offset is invariably used to index into the allocation bitmap, so
change the offset to be denominated in units of clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
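
A minimal sketch of the (group, offset) split after this change, assuming a hypothetical helper block_to_group_and_cluster(). The group is still found in blocks, but the offset is shifted down by cluster_bits because it indexes an allocation bitmap that has one bit per cluster.

```c
#include <stdint.h>

/* Hypothetical: absolute block -> (group, cluster offset within group). */
static void block_to_group_and_cluster(uint64_t block,
                                       uint64_t first_data_block,
                                       uint64_t blocks_per_group,
                                       unsigned cluster_bits,
                                       uint64_t *group, uint32_t *cluster_off)
{
    uint64_t rel = block - first_data_block;

    *group = rel / blocks_per_group;
    *cluster_off = (uint32_t)((rel % blocks_per_group) >> cluster_bits);
}
```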

ext4: bigalloc changes to block bitmap initialization functions · d5b8f310

Committed by Theodore Ts'o on Sep 09, 2011

Add bigalloc support to ext4_init_block_bitmap() and
ext4_free_blocks_after_init().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: split out ext4_free_blocks_after_init() · fd034a84

Committed by Theodore Ts'o on Sep 09, 2011

The function ext4_free_blocks_after_init() used to be a #define of
ext4_init_block_bitmap().  This actually made it difficult to
understand how the function worked, and made it hard to make changes to
support clusters.  So as an initial cleanup, I've separated out the
functionality of initializing the block bitmap from calculating the number
of free blocks in the new block group.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: factor out block group accounting into functions · 49f7f9af

Committed by Theodore Ts'o on Sep 09, 2011

This makes it easier to understand how ext4_init_block_bitmap() works,
and it will assist when we split out ext4_free_blocks_after_init() in
the next commit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: convert instances of EXT4_BLOCKS_PER_GROUP to EXT4_CLUSTERS_PER_GROUP · 7137d7a4

Committed by Theodore Ts'o on Sep 09, 2011

Change the places in fs/ext4/mballoc.c where EXT4_BLOCKS_PER_GROUP is
used to indicate the number of bits in a block bitmap (which is really
a cluster allocation bitmap in bigalloc file systems). There are
still some places in the ext4 codebase where usage of
EXT4_BLOCKS_PER_GROUP needs to be audited/fixed, in code paths that
aren't used given the initial restricted assumptions for bigalloc.
These will need to be fixed before we can relax those restrictions.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: enforce bigalloc restrictions (e.g., no online resizing, etc.) · bab08ab9

Committed by Theodore Ts'o on Sep 09, 2011

At least initially, if the bigalloc feature is enabled, we will not
support non-extent mapped inodes, online resizing, online defrag, or
the FITRIM ioctl.  This simplifies the initial implementation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: read-only support for bigalloc file systems · 281b5995

Committed by Theodore Ts'o on Sep 09, 2011

This adds read-only support for bigalloc file systems.  It teaches the
mount code just enough about bigalloc superblock fields that it will mount
the file system without freaking out that the number of blocks per
group is too big.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: add ext4-specific kludge to avoid an oops after the disk disappears · 7c2e7087

Committed by Theodore Ts'o on Sep 09, 2011

The del_gendisk() function uninitializes the disk-specific data
structures, including the bdi structure, without telling anyone
else.  Once this happens, any attempt to call mark_buffer_dirty()
(for example, by ext4_commit_super) will cause a kernel OOPS.

Fix this for now until we can fix things in an architecturally correct
way.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07 Sep, 2011 · 2 commits

ext4: fix partial page writes · 02fac129

Committed by Allison Henderson on Sep 06, 2011

While running extended fsx tests to verify the preceding patches,
a similar bug was also found in the write operation.

Whenever a write operation begins or ends in a hole,
or extends EOF, the partial page contained in the hole
or beyond EOF needs to be zeroed out.

To correct this, the new ext4_discard_partial_page_buffers_no_lock
routine is used to zero out the partial page, but only for buffer
heads that are already unmapped.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix fsx truncate failure · 189e868f

Committed by Allison Henderson on Sep 06, 2011

While running extended fsx tests to verify the first
two patches, a similar bug was also found in the
truncate operation.

This bug happens because the truncate routine only zeros
the non-block-aligned portion of the last page.  This means
that the block-aligned portions of the page appearing after
i_size are left unzeroed, and their buffer heads are still mapped.

This bug is corrected by using ext4_discard_partial_page_buffers
in the truncate routine to zero the partial page and unmap
the buffer heads.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
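
The page arithmetic behind this fix can be sketched as follows, assuming a hypothetical plan_eof_page() helper. For the page containing EOF, it reports how many bytes of the block straddling i_size need zeroing, and which whole blocks lie beyond i_size and so should be zeroed and unmapped. Zeroing only the unaligned tail (the old behaviour) covers the first number but leaves the whole blocks alone.

```c
#include <stdint.h>

struct eof_page_plan {
    unsigned partial_zero_bytes;    /* bytes zeroed in the straddling block */
    unsigned first_unmapped_block;  /* first block index wholly past EOF */
    unsigned nr_unmapped_blocks;    /* blocks to zero and unmap */
};

static struct eof_page_plan plan_eof_page(uint64_t i_size, unsigned page_size,
                                          unsigned block_size)
{
    struct eof_page_plan p;
    unsigned off = (unsigned)(i_size % page_size);  /* EOF offset in page */
    unsigned tail = off % block_size;

    p.partial_zero_bytes = tail ? block_size - tail : 0;
    p.first_unmapped_block = (off + block_size - 1) / block_size;
    p.nr_unmapped_blocks = page_size / block_size - p.first_unmapped_block;
    return p;
}
```

With 4 KiB pages and 1 KiB blocks, an i_size of 5000 puts EOF 904 bytes into the second page. So 120 bytes of block 0 are zeroed and blocks 1 to 3 of that page are zeroed and unmapped.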

06 Sep, 2011 · 1 commit

ext4: only call ext4_jbd2_file_inode when an inode has been extended · decbd919

Committed by Theodore Ts'o on Sep 06, 2011

In delayed allocation mode, it's important to only call
ext4_jbd2_file_inode when the file has been extended.  This is
necessary to avoid a race which was first introduced in commit
678aaf48, but which was made much more common with the introduction
of the "punch hole" functionality.  (Especially when dioread_nolock
was enabled; in that case I could reliably reproduce this problem with
xfstests #74.)

The race is this: if, while trying to write back a delayed allocation
inode, there is a need to map delalloc blocks, and we run out of space
in the journal, *and* at the same time the inode is already on the
committing transaction's t_inode_list (because, for example, while doing
the punch hole operation, ext4_jbd2_file_inode() is called), then the
commit operation will wait for the inode to finish all of its pending
writebacks by calling filemap_fdatawait(). But since that inode has
one or more pages with the PageWriteback flag set, the commit
operation will wait forever, and so the writeback of the inode can
never take place; the kjournald thread and the writeback thread
end up waiting for each other forever.

It's important at this point to recall why an inode is placed on the
t_inode_list: it is to provide the data=ordered guarantee that we
don't end up exposing stale data.  In the case where we are truncating
or punching a hole in the inode, there is no possibility that stale
data could be exposed in the first place, so we don't need to put the
inode on the t_inode_list!

The right long-term fix is to get rid of data=ordered mode altogether,
and only update the extent tree or indirect blocks after the data has
been written.  Until then, this change will also avoid some
unnecessary waiting in the commit operation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Allison Henderson <achender@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>

04 Sep, 2011 · 3 commits

jbd2: use gfp_t instead of int · d2159fb7

Committed by Dan Carpenter on Sep 04, 2011

This silences some Sparse warnings:
fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
fs/jbd2/transaction.c:135:69:    expected restricted gfp_t [usertype] flags
fs/jbd2/transaction.c:135:69:    got int [signed] gfp_mask
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: add debugging information to jbd2_journal_dirty_metadata() · 9ea7a0df

Committed by Theodore Ts'o on Sep 04, 2011

Add debugging information in case jbd2_journal_dirty_metadata() is
called with a buffer_head which didn't have
jbd2_journal_get_write_access() called on it, or if the journal_head
has the wrong transaction in it.  In addition, return an error code.
This won't change anything for ocfs2, which will BUG_ON() a non-zero
return code.

For ext4, the caller of this function is ext4_handle_dirty_metadata(),
which, on seeing a non-zero return code, will call __ext4_journal_stop(),
which will print the function and line number of the (buggy) calling
function and abort the journal.  This will allow us to recover instead
of halting with a BUG, which is better from a robustness and reliability
point of view.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: improve handling of conflicting mount options · 56889787

Committed by Theodore Ts'o on Sep 03, 2011

If the user explicitly specifies conflicting mount options for
delalloc or dioread_nolock and data=journal, fail the mount instead
of printing a warning and continuing (since many users won't look at
dmesg and notice the warning).

Also, print a single warning that data=journal implies that delayed
allocation is not on by default (since it's not supported), and
furthermore that O_DIRECT is not supported.  Improve the text in
Documentation/filesystems/ext4.txt so this is clear there as well.

Similarly, if the dioread_nolock mount option is specified when the
file system block size != PAGE_SIZE, fail the mount instead of
printing a warning message and ignoring the mount option.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
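
The new policy amounts to a validation step that rejects the mount. Below is a sketch under the assumption of a hypothetical struct mount_opts and check_mount_opts(); the real option parsing in ext4 is structured differently. It returns -EINVAL for the two conflicts described above instead of logging and carrying on.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the parsed options relevant here. */
struct mount_opts {
    bool data_journal;
    bool delalloc_explicit;   /* user explicitly passed "delalloc" */
    bool dioread_nolock;
    unsigned block_size;
    unsigned page_size;
};

static int check_mount_opts(const struct mount_opts *o)
{
    /* data=journal conflicts with explicit delalloc or dioread_nolock. */
    if (o->data_journal && (o->delalloc_explicit || o->dioread_nolock))
        return -EINVAL;
    /* dioread_nolock needs block size == page size. */
    if (o->dioread_nolock && o->block_size != o->page_size)
        return -EINVAL;
    return 0;
}
```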

03 Sep, 2011 · 1 commit

ext4: fix 2nd xfstests 127 punch hole failure · 2be4751b

Committed by Allison Henderson on Sep 03, 2011

This patch fixes a second punch hole bug found by xfstests 127.

This bug happens because punch hole needs to flush the pages
of the hole to avoid race conditions.  But if the end of the
hole is in the same page as i_size, the buffer heads beyond
i_size need to be unmapped and the page needs to be zeroed
after it is flushed.

To correct this, the new ext4_discard_partial_page_buffers
routine is used to zero and unmap the partial page
beyond i_size if the end of the hole appears in the same
page as i_size.

The code has also been optimized to set the end of the hole
to the page after i_size if the specified hole exceeds i_size,
and the code that flushes the pages has been simplified.
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
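
The end-of-hole adjustment can be sketched in a few lines, assuming a hypothetical punch_hole_end() helper and a power-of-two page size. When the requested hole extends past i_size, its end is moved to the start of the page following the one that contains i_size; otherwise the requested end is kept.

```c
#include <stdint.h>

static uint64_t punch_hole_end(uint64_t offset, uint64_t length,
                               uint64_t i_size, uint64_t page_size)
{
    uint64_t end = offset + length;
    /* Start of the page after the one containing i_size. */
    uint64_t next_page = (i_size & ~(page_size - 1)) + page_size;

    return end > i_size ? next_page : end;
}
```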
