Commits · b886ee3e778ec2ad43e276fd378ab492cf6819b7 · openeuler / Kernel

26 Apr, 2019 · 4 commits

ext4: Support case-insensitive file name lookups · b886ee3e

Committed by Gabriel Krisman Bertazi on Apr 25, 2019

This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.

A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e., to
match a directory entry even if the name used by userspace is not a
byte-per-byte match with the disk name, but is an equivalent
case-insensitive version of the Unicode string.  This operation is
called a case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children.  This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache regardless of which equivalent string was
used in a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
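
As a rough user-space illustration of the d_hash()/d_compare() idea above,
the sketch below hashes and compares a folded form of the name. The helper
names and the CRC32 hash are hypothetical, and Python's unicodedata tables
stand in for the kernel's utf8-12.1.0 data; ignorable code points are not
removed here, so this is an approximation, not the fs/unicode code.

```python
import unicodedata
import zlib


def casefold_key(name: str) -> str:
    """Approximate folded form: NFD, full case folding, then NFD again."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def d_hash(name: str, buckets: int = 64) -> int:
    """Hash the folded name so every equivalent spelling shares a bucket."""
    # CRC32 is only a stand-in; the real dcache hash is different.
    return zlib.crc32(casefold_key(name).encode("utf-8")) % buckets


def d_compare(a: str, b: str) -> bool:
    """Return True when two names are equivalent after folding."""
    return casefold_key(a) == casefold_key(b)
```

Because the hash is taken over the folded string, "README.txt" and
"readme.TXT" always land in the same bucket, and the comparison then
confirms the match.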

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries.  This is bad for performance but requires some leveraging of
the vfs layer to fix.  We can live without that for now, and so does
everyone else.

* on-disk data:

Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.

DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware.  The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
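
The name-preserving behaviour can be sketched as an index keyed by the
folded name whose value is the exact name given at creation time. The
CasefoldDir class and fold() helper are hypothetical, and a Python dict
stands in for the htree; this only illustrates the idea, not the on-disk
format.

```python
import unicodedata


def fold(name: str) -> str:
    """Approximate folded form used as the lookup key."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


class CasefoldDir:
    """Toy +F directory: indexed by folded name, stores the original name."""

    def __init__(self):
        self._entries = {}  # folded key -> name exactly as created

    def create(self, name: str) -> None:
        key = fold(name)
        if key in self._entries:
            # Two names that differ only by case cannot coexist.
            raise FileExistsError(self._entries[key])
        self._entries[key] = name

    def lookup(self, name: str):
        """Return the stored name for any equivalent spelling, else None."""
        return self._entries.get(fold(name))
```

A lookup of "MAKEFILE" in a directory where "Makefile" was created returns
"Makefile", i.e. the stored spelling is never rewritten.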

* Dealing with invalid sequences:

By default, when an invalid UTF-8 sequence is identified, ext4 will
treat it as an opaque byte sequence, ignoring the encoding and reverting
to the old behavior for that one file.  This means that case-insensitive
file name lookup will not work for that file only.  An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding.  When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
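
The two policies described above can be sketched as follows. The function
name lookup_key() and its strict parameter are hypothetical, and Python's
UTF-8 decoder stands in for the kernel's validation, so the exact set of
rejected sequences may differ.

```python
import unicodedata


def lookup_key(raw: bytes, strict: bool = False) -> bytes:
    """Return the key used to match a file name given as raw bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Strict mode: invalid sequences are rejected outright.
            raise ValueError("invalid UTF-8 sequence in file name")
        # Default mode: fall back to an opaque, byte-exact match.
        return raw
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")
```

With the default policy, b"\xffA" and b"\xffa" are different names because
neither can be folded, while b"ABC" and b"abc" still match.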

* Normalization algorithm:

The UTF-8 algorithm used to compare strings in ext4 lives in
fs/unicode, and is based on a previous version developed by
SGI.  It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for EXT4 because:

  - It has a lower cost than NFC/NFKC (which requires
    decomposing to NFD as an intermediary step)
  - It doesn't eliminate important semantic meaning like
    compatibility decompositions.

Although:

  - This implementation is not completely linguistically accurate,
    because different languages have conflicting rules, which would
    require the specialization of the filesystem to a given locale,
    which brings all sorts of problems for removable media and for
    users who use more than one language.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: include charset encoding information in the superblock · c83ad55e

Committed by Gabriel Krisman Bertazi on Apr 25, 2019

Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesystems.
If the feature flag is not enabled, the entire filesystem will operate
on opaque byte sequences, respecting the original behavior.

The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.

The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: actually request zeroing of inode table after grow · 310a997f

Committed by Kirill Tkhai on Apr 25, 2019

The number of block groups can never decrease, since only online grow
is supported.

But after a grow has occurred, we have to zero the inode tables of the
newly created block groups.

Fixes: 19c5246d ("ext4: add new online resize interface")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org

ext4: cond_resched in work-heavy group loops · 4b99faa2

Committed by Khazhismel Kumykov on Apr 25, 2019

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

25 Apr, 2019 · 2 commits

ext4: fix use-after-free race with debug_want_extra_isize · 7bc04c5c

Committed by Barret Rhoden on Apr 25, 2019

When remounting with debug_want_extra_isize, we were not performing the
same checks that we do during a normal mount.  That allowed us to set a
value for s_want_extra_isize that reached outside the s_inode_size.
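
The kind of bound that must hold can be sketched as below. The function
name is hypothetical; 128 is the classic fixed inode size
(EXT4_GOOD_OLD_INODE_SIZE), and the real mount-time checks cover more than
this single comparison.

```python
GOOD_OLD_INODE_SIZE = 128  # bytes in the fixed part of an on-disk inode


def extra_isize_fits(want_extra_isize: int, inode_size: int) -> bool:
    """True when the requested extra space stays inside the on-disk inode."""
    return GOOD_OLD_INODE_SIZE + want_extra_isize <= inode_size
```

For a 256-byte inode, a request of 32 extra bytes fits, while a request of
200 would reach past the end of the inode and has to be rejected on
remount just as it is on mount.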

Fixes: e2b911c5 ("ext4: clean up feature test macros with predicate functions")
Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

ext4: avoid drop reference to iloc.bh twice · 8c380ab4

Committed by Pan Bian on Apr 25, 2019

The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty.
However, the reference is dropped again if an error occurs during
ext4_handle_dirty_metadata, which may result in use-after-free bugs.

Fixes: fb265c9c ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org

10 Apr, 2019 · 2 commits

ext4: ignore e_value_offs for xattrs with value-in-ea-inode · e5d01196

Committed by Theodore Ts'o on Apr 10, 2019

In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the
code ignores the value in e_value_offs.  The e_value_offs *should* be
zero, but we shouldn't depend upon it, since it might not be true in a
corrupted/fuzzed file system.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

ext4: protect journal inode's blocks using block_validity · 345c0dbf

Committed by Theodore Ts'o on Apr 09, 2019

Add the blocks which belong to the journal inode to block_validity's
system zone so that attempts to deallocate or overwrite the journal,
due to a corrupted file system where the journal blocks are also
claimed by another inode, are caught.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

08 Apr, 2019 · 1 commit

ext4: use BUG() instead of BUG_ON(1) · 1e83bc81

Committed by Arnd Bergmann on Apr 07, 2019

BUG_ON(1) leads to bogus warnings from clang when
CONFIG_PROFILE_ANNOTATED_BRANCHES is set:

 fs/ext4/inode.c:544:4: error: variable 'retval' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
                        BUG_ON(1);
                        ^~~~~~~~~
 include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
                                   ^~~~~~~~~~~~~~~~~~~
 include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 fs/ext4/inode.c:591:6: note: uninitialized use occurs here
        if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
            ^~~~~~
 fs/ext4/inode.c:544:4: note: remove the 'if' if its condition is always true
                        BUG_ON(1);
                        ^
 include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
                               ^
 fs/ext4/inode.c:502:12: note: initialize the variable 'retval' to silence this warning

Change it to BUG() so clang can see that this code path can never
continue.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>

07 Apr, 2019 · 3 commits

ext4: fix prefetchw of NULL page · d454a273

Committed by Liu Xiang on Apr 07, 2019

In ext4_mpage_readpages(), if the parameter pages is not NULL, the
other parameter, page, is NULL. The first time, prefetchw(&page->flags)
operates on NULL. From the second time on, prefetchw(&page->flags)
always operates on the last consumed page, which does little to help
with handling the current page. So prefetchw() should be called right
after the page pointer has been updated.
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: remove incorrect comment for NEXT_ORPHAN() · fe53cbc5

Committed by Eric Biggers on Apr 06, 2019

The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(),
which was moved by commit a7550b30 ("ext4 crypto: migrate into vfs's
crypto engine") but the comment was accidentally left in place.  Since
ext4_encrypted_inode() has now been removed, just remove the comment.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: make sanity check in mballoc more strict · 31562b95

Committed by Jan Kara on Apr 06, 2019

The sanity check in mb_find_extent() only checked that returned extent
does not extend past blocksize * 8, however it should not extend past
EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group <
blocksize * 8 and the tail of the bitmap is not properly filled by 1s
which happened e.g. when ancient kernels have grown the filesystem.
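
The difference between the two checks can be sketched like this. Both
function names are hypothetical and the numbers in the note below are
made up for illustration; only the bounds themselves come from the
description above.

```python
def old_check(start: int, length: int, blocksize: int) -> bool:
    """Looser bound: the extent fits in the bitmap block (blocksize * 8 bits)."""
    return start + length <= blocksize * 8


def strict_check(start: int, length: int, clusters_per_group: int) -> bool:
    """Stricter bound: the extent fits in the clusters the group really has."""
    return start + length <= clusters_per_group
```

With a 4096-byte block the bitmap holds 32768 bits. If a group had only
30000 clusters and the bitmap tail was not padded with 1s, an extent
starting at 29990 with length 20 would pass the old check but fail the
strict one.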
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

24 Mar, 2019 · 1 commit

ext4: prohibit fstrim in norecovery mode · 18915b58

Committed by Darrick J. Wong on Mar 23, 2019

The ext4 fstrim implementation uses the block bitmaps to find free space
that can be discarded.  If we haven't replayed the journal, the bitmaps
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

23 Mar, 2019 · 2 commits

ext4: cleanup bh release code in ext4_ind_remove_space() · 5e86bdda

Committed by zhangyi (F) on Mar 23, 2019

Currently, we release the indirect buffers wherever we are done with
them in ext4_ind_remove_space(), so brelse() and BUFFER_TRACE() calls
are scattered everywhere.  This is fragile and hard to read, and we
may well forget to release a buffer some day.  This patch cleans up
the code by moving the code which releases the buffers to the end of
the function.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: brelse all indirect buffer in ext4_ind_remove_space() · 674a2b27

Committed by zhangyi (F) on Mar 23, 2019

All indirect buffers obtained by ext4_find_shared() should be released
no matter whether the branch should be freed or not. But now, we forget
to release the lower depth indirect buffers when removing space from
the same higher depth indirect block. This leads to a buffer leak and,
furthermore, it may lead to quota information corruption when using old
quota; consider the following case.

 - Create and mount an empty ext4 filesystem without extent and quota
   features,
 - quotacheck and enable the user & group quota,
 - Create some files and write some data to them, and then punch hole
   to some files of them, it may trigger the buffer leak problem
   mentioned above.
 - Disable quota and run quotacheck again, it will create two new
   aquota files and write the checked quota information to them, which
   probably may reuse the freed indirect block(the buffer and page
   cache was not freed) as data block.
 - Enable quota again, it will invoke
   vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
   buffers and pagecache. Unfortunately, because of the buffer of quota
   data block is still referenced, quota code cannot read the up to date
   quota info from the device and lead to quota information corruption.

This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.

This patch fixes this problem by releasing the missing indirect buffers
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org

15 Mar, 2019 · 6 commits

ext4: report real fs size after failed resize · 6c732840

Committed by Lukas Czerner on Mar 15, 2019

Currently when the file system resize using ext4_resize_fs() fails it
will report into log that "resized filesystem to <requested block
count>".  However this may not be true in the case of failure.  Use the
current block count as returned by ext4_blocks_count() to report the
block count.

Additionally, report a warning that "error occurred during file system
resize".
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: add missing brelse() in add_new_gdb_meta_bg() · d64264d6

Committed by Lukas Czerner on Mar 15, 2019

Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh
in case ext4_journal_get_write_access() fails.
Additionally kvfree() is missing in the same error path. Fix it by
moving the ext4_journal_get_write_access() before the ext4 sb update as
Ted suggested and release n_group_desc and gdb_bh in case it fails.

Fixes: 61a9c11e ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: remove useless ext4_pin_inode() · 7cf77140

Committed by Jason Yan on Mar 14, 2019

This function has never been used since it was added (and is commented
out); let's remove it.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: avoid panic during forced reboot · 1dc1097f

Committed by Jan Kara on Mar 14, 2019

When admin calls "reboot -f" - i.e., does a hard system reboot by
directly calling reboot(2) - ext4 filesystem mounted with errors=panic
can panic the system. This happens because the underlying device gets
disabled without unmounting the filesystem and thus some syscall running
in parallel to reboot(2) can result in the filesystem getting IO errors.

This is somewhat surprising to users, so try to improve the behavior by
switching to errors=remount-ro behavior when the system is running
reboot(2).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix data corruption caused by unaligned direct AIO · 372a03e0

Committed by Lukas Czerner on Mar 14, 2019

Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.

This is a (very simplified) reproducer from Frank:

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[0]);
    io_submit(io_ctx, 1, &iocbs[1]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200
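
The arithmetic behind the reproducer can be checked with a small sketch.
The function names are hypothetical, and the "aligned end of file" rule
is one way to express the condition described in this message, stated
here as an assumption rather than a copy of the kernel code.

```python
BLOCK_SIZE = 4096


def align_up(value: int, block_size: int = BLOCK_SIZE) -> int:
    """Round value up to the next multiple of block_size."""
    return (value + block_size - 1) // block_size * block_size


def old_rule(pos: int, length: int, i_size: int) -> bool:
    """Old behaviour: never serialize a write that starts at or past i_size."""
    if pos >= i_size:
        return False
    return pos % BLOCK_SIZE != 0 or (pos + length) % BLOCK_SIZE != 0


def needs_serialization(pos: int, length: int, i_size: int) -> bool:
    """Skip serialization only past the block that contains end of file."""
    if pos >= align_up(i_size):
        return False
    return pos % BLOCK_SIZE != 0 or (pos + length) % BLOCK_SIZE != 0
```

With i_size = 41472, the second write starts at 41472, which is still
inside the block covering 40960..45055. The old rule skips serialization
for it, while the block-aligned rule does not.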
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org

ext4: fix NULL pointer dereference while journal is aborted · fa30dde3

Committed by Jiufei Xue on Mar 14, 2019

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that h_transaction was set to NULL in jbd2__journal_restart()
but failed to attach to a new transaction while the journal was aborted.

Fix this by checking the handle before updating the inode.

Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: stable@kernel.org

01 Mar, 2019 · 1 commit

ext4: fix bigalloc cluster freeing when hole punching under load · 7bd75230

Committed by Eric Whitney on Feb 28, 2019

Ext4 may not free clusters correctly when punching holes in bigalloc
file systems under high load conditions. If it's not possible to
extend and restart the journal in ext4_ext_rm_leaf() when preparing to
remove blocks from a punched region, a retry of the entire punch
operation is triggered in ext4_ext_remove_space(). This causes a
partial cluster to be set to the first cluster in the extent found to
the right of the punched region. However, if the punch operation
prior to the retry had made enough progress to delete one or more
extents and a partial cluster candidate for freeing had already been
recorded, the retry would overwrite the partial cluster. The loss of
this information makes it impossible to correctly free the original
partial cluster in all cases.

This bug can cause generic/476 to fail when run as part of
xfstests-bld's bigalloc and bigalloc_1k test cases. The failure is
reported when e2fsck detects bad iblocks counts greater than expected
in units of whole clusters and also detects a number of negative block
bitmap differences equal to the iblocks discrepancy in cluster units.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

22 Feb, 2019 · 4 commits

ext4: add sysfs attr /sys/fs/ext4/<disk>/journal_task · bc1d69d6

Committed by Konstantin Khlebnikov on Feb 21, 2019

This is useful for moving journal thread into cgroup or
for tracing it with ftrace/perf/blktrace.

For now the only way is `pgrep jbd2/$DISK` but this is not reliable:
name may be longer than "comm" limit and any task could mock it.

Attribute shows pid in current pid-namespace or 0 if task is unreachable.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: Change debugging support help prefix from EXT4 to Ext4 · 231fe82b

Committed by Geert Uytterhoeven on Feb 21, 2019

All other configuration options for the ext* family of file systems use
"Ext%u" instead of "EXT%u".

Fixes: 6ba495e9 ("ext4: Add configurable run-time mballoc debugging")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix compile error when using BUFFER_TRACE · ddccb6db

Committed by zhangyi (F) on Feb 21, 2019

Fix compile error below when using BUFFER_TRACE.

fs/ext4/inode.c: In function ‘ext4_expand_extra_isize’:
fs/ext4/inode.c:5979:19: error: request for member ‘bh’ in something not a structure or union
  BUFFER_TRACE(iloc.bh, "get_write_access");

Fixes: c03b45b8 ("ext4, project: expand inode extra size if possible")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: fix some error pointer dereferences · 7159a986

Committed by Dan Carpenter on Feb 21, 2019

We can't pass error pointers to brelse().

Fixes: fb265c9c ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

21 Feb, 2019 · 2 commits

ext4: annotate more implicit fall throughs · 793bc518

Committed by Mathieu Malaterre on Feb 21, 2019

There is a plan to build the kernel with -Wimplicit-fallthrough and
these places in the code produced warnings (W=1). Fix them up.

This commit removes the following warnings:

fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

ext4: annotate implicit fall throughs · 034f891a

Committed by Mathieu Malaterre on Feb 21, 2019

There is a plan to build the kernel with -Wimplicit-fallthrough and
these places in the code produced warnings (W=1). Fix them up.

This commit removes the following warnings:

fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

15 Feb, 2019 · 2 commits

block: allow bio_for_each_segment_all() to iterate over multi-page bvec · 6dc4f100

Committed by Ming Lei on Feb 15, 2019

This patch introduces one extra iterator variable to
bio_for_each_segment_all(), so that bio_for_each_segment_all() can
iterate over multi-page bvecs.

Given that it is just one mechanical and simple change to all
bio_for_each_segment_all() users, this patch makes the tree-wide change
in one single patch, so that we can avoid using a temporary helper for
this conversion.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ext4: don't update s_rev_level if not required · c9e716eb

Committed by Andreas Dilger on Feb 14, 2019

Don't update the superblock s_rev_level during mount if it isn't
actually necessary, only if superblock features are being set by
the kernel.  This was originally added for ext3 since it always
set the INCOMPAT_RECOVER and HAS_JOURNAL features during mount,
but this is not needed since no journal mode was added to ext4.

That will allow Geert to mount his 20-year-old ext2 rev 0.0 m68k
filesystem, as a testament of the backward compatibility of ext4.

Fixes: 0390131b ("ext4: Allow ext4 to run without a journal")
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

12 Feb, 2019 · 1 commit

ext4: fix crash during online resizing · f96c3ac8

Committed by Jan Kara on Feb 11, 2019

When computing maximum size of filesystem possible with given number of
group descriptor blocks, we forget to include s_first_data_block into
the number of blocks. Thus for filesystems with non-zero
s_first_data_block it can happen that computed maximum filesystem size
is actually lower than current filesystem size which confuses the code
and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
flex_gd->count == 0. The problem can be reproduced like:

truncate -s 100g /tmp/image
mkfs.ext4 -b 1024 -E resize=262144 /tmp/image 32768
mount -t ext4 -o loop /tmp/image /mnt
resize2fs /dev/loop0 262145
resize2fs /dev/loop0 300000

Fix the problem by properly including s_first_data_block into the
computed number of filesystem blocks.
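
The off-by-s_first_data_block effect can be seen with a small worked
example. The geometry below is an assumption chosen to resemble the
reproducer (1 KiB blocks, 8192 blocks per group, 32 group descriptors per
descriptor block, s_first_data_block = 1), and max_fs_blocks() is a
hypothetical helper, not a kernel function.

```python
def max_fs_blocks(desc_blocks: int, desc_per_block: int,
                  blocks_per_group: int, first_data_block: int,
                  include_first_data_block: bool = True) -> int:
    """Largest block count addressable with the given descriptor blocks."""
    groups = desc_blocks * desc_per_block
    total = groups * blocks_per_group
    if include_first_data_block:
        # Groups start at s_first_data_block, so the count shifts by it.
        total += first_data_block
    return total


current_size = 262145  # blocks, after the first resize2fs in the example
without_fix = max_fs_blocks(1, 32, 8192, 1, include_first_data_block=False)
with_fix = max_fs_blocks(1, 32, 8192, 1)
```

Under these assumptions the uncorrected maximum is 262144, one block less
than the current size of 262145, which is the "maximum lower than
current" situation described above; adding s_first_data_block yields
262145.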

Fixes: 1c6bd717 ("ext4: convert file system to meta_bg if needed...")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

11 Feb, 2019 · 8 commits

ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOT · 6e589291

Committed by Theodore Ts'o on Feb 11, 2019

A malicious/clueless root user can use EXT4_IOC_SWAP_BOOT to force a
corner case which can lead to the file system getting corrupted.
There's no usefulness to allowing this, so just prohibit this case.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: add mask of ext4 flags to swap · abdc644e

Committed by yangerkun on Feb 11, 2019

The reason is that while swapping two inodes, we swap the flags too.
Some flags such as EXT4_JOURNAL_DATA_FL can really confuse things
since we're not resetting the address operations structure.  The
simplest way to keep things sane is to restrict the flags that can be
swapped.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

ext4: update quota information while swapping boot loader inode · aa507b5f

Committed by yangerkun on Feb 11, 2019

While swapping two inodes, we swap i_data without updating the quota
information. Also, swap_inode_boot_loader can sometimes "revert" the
swap, so update the quota after all operations have finished.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

ext4: cleanup pagecache before swap i_data · a46c68a3

Committed by yangerkun on Feb 11, 2019

While swapping, we should make sure there are no new dirty pages, since
we swap i_data between the two inodes:
1. We should take i_mmap_sem for write to avoid new pagecache from mmap
read/write;
2. Change filemap_flush to filemap_write_and_wait and move them to the
region protected by the inode lock to avoid new pagecache from buffered
read/write.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

ext4: fix check of inode in swap_inode_boot_loader · 67a11611

Committed by yangerkun on Feb 11, 2019

Before actually swapping the inode and the boot inode, several
conditions need to be checked to avoid invalid or unpermitted
operations, such as whether this inode has inline data. But these
checks should be protected by the inode lock so that the conditions
cannot change while swapping. Some other conditions will not change
during the swap, but there is no harm in checking them under the inode
lock as well.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

ext4: unlock unused_pages timely when doing writeback · a297b2fc

Committed by Xiaoguang Wang on Feb 10, 2019

In mpage_add_bh_to_extent(), when the accumulated extent length is
greater than MAX_WRITEPAGES_EXTENT_LEN or the buffer head's b_state is
not equal, we will not continue to search for unmapped areas in this
page. But note that this page is locked, and will only be unlocked in
mpage_release_unused_pages() after ext4_io_submit(). If the I/O is also
throttled by blk-throttle or a similar I/O QoS mechanism, we will hold
this page locked for a while, which is unnecessary.

I think the best fix is to refactor mpage_add_bh_to_extent() to let it
return some hints whether to unlock this page, but given that we will
improve dioread_nolock later, we can let it done later, so currently
the simple fix would just call mpage_release_unused_pages() before
ext4_io_submit().
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: cleanup clean_bdev_aliases() calls · 16e08b14

Committed by zhangyi (F) on Feb 10, 2019

Now that we already handle all cases of forgetting buffers in
jbd2_journal_forget(), a buffer should not be mapped to the block
device when it is reallocated. So this patch removes all
clean_bdev_aliases() and clean_bdev_bh_alias() calls which were invoked
by ext4 explicitly.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: replace opencoded i_writecount usage with inode_is_open_for_write() · 82dd124c

Committed by Nikolay Borisov on Feb 10, 2019

There is a function which clearly conveys the objective of checking
i_writecount. Additionally, the usage in ext4_mb_initialize_context was
wrong, since an inode would have wrongfully been reported as writable if
i_writecount had a negative value (MMAP_DENY_WRITE).
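
The difference between the two tests can be sketched as below. The first
function is a hypothetical stand-in for an open-coded "non-zero" test;
the second mirrors the "greater than zero" meaning of
inode_is_open_for_write().

```python
def opencoded_is_writable(i_writecount: int) -> bool:
    """Open-coded test: treats any non-zero count as 'open for write'."""
    return i_writecount != 0


def inode_is_open_for_write(i_writecount: int) -> bool:
    """Helper semantics: only a positive count means 'open for write'."""
    return i_writecount > 0
```

For a negative count (writes denied), the open-coded test wrongly reports
the inode as writable while the helper does not.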
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

01 Feb, 2019 · 1 commit

Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" · 8fdd60f2

Committed by Theodore Ts'o on Jan 31, 2019

This reverts commit ad211f3e.

As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case.  The
original point of the commit was to avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().

The real fix to this problem was discussed here:

https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org

The proposed patch was to fix a syzbot complaint, but the problem can
also be demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discussed in the e-mail thread, but none have
landed in the kernel as of this writing.  Anyway, commit
ad211f3e is absolutely the wrong way to suppress the lockdep warning, so
revert it.

Fixes: ad211f3e ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported: Jan Kara <jack@suse.cz>

openeuler / Kernel synced successfully about 1 year ago
