提交 · 5b552ad70c6197e764ffe6070089c5b355fe2d26 · openeuler / Kernel

07 11月, 2020 2 次提交

ext4: drop redundant calls ext4_fc_track_range · 5b552ad7

由 Harshad Shirwadkar 提交于 11月 05, 2020

ext4_fc_track_range() should only be called when blocks are added or
removed from an inode. So, the only places from where we need to call
this function are ext4_map_blocks(), punch hole, collapse / zero
range, truncate. Remove all the other redundant calls to ths function.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-4-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5b552ad7

ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA · 174fe5ba

由 Kaixu Xia 提交于 10月 29, 2020

The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled.  Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument.  It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.

[ Fixed up the commit description to better explain why the fix
  works. --TYT ]

Fixes: 26092bf5 ("ext4: use a table-driven handler for mount options")
Signed-off-by: NKaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

174fe5ba

29 10月, 2020 2 次提交

ext4: use generic casefolding support · f8f4acb6

由 Daniel Rosenberg 提交于 10月 28, 2020

This switches ext4 over to the generic support provided in libfs.

Since casefolded dentries behave the same in ext4 and f2fs, we decrease
the maintenance burden by unifying them, and any optimizations will
immediately apply to both.
Signed-off-by: NDaniel Rosenberg <drosen@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20201028050820.1636571-1-drosen@google.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

f8f4acb6

ext4: use s_mount_flags instead of s_mount_state for fast commit state · ababea77

由 Harshad Shirwadkar 提交于 10月 26, 2020

Ext4's fast commit related transient states should use
sb->s_mount_flags instead of persistent sb->s_mount_state.

Fixes: 8016e29f ("ext4: fast commit recovery path")
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-3-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ababea77

22 10月, 2020 5 次提交

ext4: add a mount opt to forcefully turn fast commits on · 0f0672ff

由 Harshad Shirwadkar 提交于 10月 15, 2020

This is a debug only mount option that forcefully turns fast commits
on at mount time.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-9-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

0f0672ff

ext4: fast commit recovery path · 8016e29f

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds fast commit recovery path support for Ext4 file
system. We add several helper functions that are similar in spirit to
e2fsprogs journal recovery path handlers. Example of such functions
include - a simple block allocator, idempotent block bitmap update
function etc. Using these routines and the fast commit log in the fast
commit area, the recovery path (ext4_fc_replay()) performs fast commit
log recovery.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8016e29f

ext4: main fast-commit commit path · aa75f4d3

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds main fast commit commit path handlers. The overall
patch can be divided into two inter-related parts:

(A) Metadata updates tracking

    This part consists of helper functions to track changes that need
    to be committed during a commit operation. These updates are
    maintained by Ext4 in different in-memory queues. Following are
    the APIs and their short description that are implemented in this
    patch:

    - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat
      operations
    - ext4_fc_track_range() - Track changed logical block offsets
      inodes
    - ext4_fc_track_inode() - Track inodes
    - ext4_fc_mark_ineligible() - Mark file system fast commit
      ineligible()
    - ext4_fc_start_update() / ext4_fc_stop_update() /
      ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These
      functions are useful for co-ordinating inode updates with
      commits.

(B) Main commit Path

    This part consists of functions to convert updates tracked in
    in-memory data structures into on-disk commits. Function
    ext4_fc_commit() is the main entry point to commit path.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

aa75f4d3

ext4 / jbd2: add fast commit initialization · 6866d7b3

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds fast commit area trackers in the journal_t
structure. These are initialized via the jbd2_fc_init() routine that
this patch adds. This patch also adds ext4/fast_commit.c and
ext4/fast_commit.h files for fast commit code that will be added in
subsequent patches in this series.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6866d7b3

ext4: add fast_commit feature and handling for extended mount options · 995a3ed6

由 Harshad Shirwadkar 提交于 10月 15, 2020

We are running out of mount option bits. Add handling for using
s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add
ability to turn off the fast commit feature in Ext4.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

995a3ed6

18 10月, 2020 13 次提交

ext4: Detect already used quota file early · e0770e91

由 Jan Kara 提交于 10月 15, 2020

When we try to use file already used as a quota file again (for the same
or different quota type), strange things can happen. At the very least
lockdep annotations may be wrong but also inode flags may be wrongly set
/ reset. When the file is used for two quota types at once we can even
corrupt the file and likely crash the kernel. Catch all these cases by
checking whether passed file is already used as quota file and bail
early in that case.

This fixes occasional generic/219 failure due to lockdep complaint.
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reported-by: NRitesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201015110330.28716-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e0770e91

ext4: data=journal: write-protect pages on j_submit_inode_data_buffers() · afb585a9

由 Mauricio Faria de Oliveira 提交于 10月 05, 2020

This implements journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: to write-protect pages under
commit, preventing changes to buffers writeably mapped to userspace.

If a buffer's content changes between commit's checksum calculation
and write-out to disk, it can cause journal recovery/mount failures
upon a kernel crash or power loss.

    [   27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [   27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [   27.342716] JBD2: recovery failed
    [   27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.

In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty w/ writepage callback if needed.

In j_finish_inode_data_buffers() there is nothing do to.

And in order to use the callbacks, inodes are added to the inode list
in transaction in __ext4_journalled_writepage() and ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that the buffers are attached
to the transaction as jbddirty with write_end_fn(), as already done in
__ext4_journalled_writepage().
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Reported-by: NDann Frazier <dann.frazier@canonical.com>
Reported-by: kernel test robot <lkp@intel.com> # wbc.nr_to_write
Suggested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201006004841.600488-5-mfo@canonical.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

afb585a9

jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers() · 342af94e

由 Mauricio Faria de Oliveira 提交于 10月 05, 2020

Introduce journal callbacks to allow different behaviors
for an inode in journal_submit|finish_inode_data_buffers().

The existing users of the current behavior (ext4, ocfs2)
are adapted to use the previously exported functions
that implement the current behavior.

Users are callers of jbd2_journal_inode_ranged_write|wait(),
which adds the inode to the transaction's inode list with
the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree.

Both CONFIG_EXT4_FS and CONFIG_OCSFS2_FS select CONFIG_JBD2,
which builds fs/jbd2/commit.c and journal.c that define and
export the functions, so we can call directly in ext4/ocfs2.
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Suggested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

342af94e

ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable() · 8394a6ab

由 zhangyi (F) 提交于 9月 24, 2020

Now we only use sb_bread_unmovable() to read superblock and descriptor
block at mount time, so there is no opportunity that we need to clear
buffer verified bit and also handle buffer write_io error bit. But for
the sake of unification, let's introduce ext4_sb_bread_unmovable() to
replace all sb_bread_unmovable(). After this patch, we stop using read
helpers in fs/buffer.c.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200924073337.861472-8-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8394a6ab

ext4: introduce ext4_sb_breadahead_unmovable() to replace sb_breadahead_unmovable() · 5df1d412

由 zhangyi (F) 提交于 9月 24, 2020

If we readahead inode tables in __ext4_get_inode_loc(), it may bypass
buffer_write_io_error() check, so introduce ext4_sb_breadahead_unmovable()
to handle this special case.

This patch also replace sb_breadahead_unmovable() in ext4_fill_super()
for the sake of unification.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200924073337.861472-6-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5df1d412

ext4: use common helpers in all places reading metadata buffers · 2d069c08

由 zhangyi (F) 提交于 9月 24, 2020

Revome all open codes that read metadata buffers, switch to use
ext4_read_bh_*() common helpers.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924073337.861472-4-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2d069c08

ext4: introduce new metadata buffer read helpers · fa491b14

由 zhangyi (F) 提交于 9月 24, 2020

The previous patch add clear_buffer_verified() before we read metadata
block from disk again, but it's rather easy to miss clearing of this bit
because currently we read metadata buffer through different open codes
(e.g. ll_rw_block(), bh_submit_read() and invoke submit_bh() directly).
So, it's time to add common helpers to unify in all the places reading
metadata buffers instead. This patch add 3 helpers:

 - ext4_read_bh_nowait(): async read metadata buffer if it's actually
   not uptodate, clear buffer_verified bit before read from disk.
 - ext4_read_bh(): sync version of read metadata buffer, it will wait
   until the read operation return and check the return status.
 - ext4_read_bh_lock(): try to lock the buffer before read buffer, it
   will skip reading if the buffer is already locked.

After this patch, we need to use these helpers in all the places reading
metadata buffer instead of different open codes.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924073337.861472-3-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

fa491b14

ext4: clear buffer verified flag if read meta block from disk · d9befeda

由 zhangyi (F) 提交于 9月 24, 2020

The metadata buffer is no longer trusted after we read it from disk
again because it is not uptodate for some reasons (e.g. failed to write
back). Otherwise we may get below memory corruption problem in
ext4_ext_split()->memset() if we read stale data from the newly
allocated extent block on disk which has been failed to async write
out but miss verify again since the verified bit has already been set
on the buffer.

[   29.774674] BUG: unable to handle kernel paging request at ffff88841949d000
...
[   29.783317] Oops: 0002 [#2] SMP
[   29.784219] R10: 00000000000f4240 R11: 0000000000002e28 R12: ffff88842fa1c800
[   29.784627] CPU: 1 PID: 126 Comm: kworker/u4:3 Tainted: G      D W
[   29.785546] R13: ffffffff9cddcc20 R14: ffffffff9cddd420 R15: ffff88842fa1c2f8
[   29.786679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS ?-20190727_0738364
[   29.787588] FS:  0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[   29.789288] Workqueue: writeback wb_workfn
[   29.790319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.790321]  (flush-8:0)
[   29.790844] CR2: 0000000000000008 CR3: 00000004234f2000 CR4: 00000000000006f0
[   29.791924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   29.792839] RIP: 0010:__memset+0x24/0x30
[   29.793739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   29.794256] Code: 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 033
[   29.795161] Kernel panic - not syncing: Fatal exception in interrupt
...
[   29.808149] Call Trace:
[   29.808475]  ext4_ext_insert_extent+0x102e/0x1be0
[   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
[   29.809652]  ext4_map_blocks+0x290/0x8a0
[   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
[   29.809652]  ext4_map_blocks+0x290/0x8a0
[   29.810161]  ext4_writepages+0xc85/0x17c0
...

Fix this by clearing buffer's verified bit if we read meta block from
disk again.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200924073337.861472-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

d9befeda

ext4: fix bdev write error check failed when mount fs with ro · 9704a322

由 Zhang Xiaoxu 提交于 9月 27, 2020

Consider a situation when a filesystem was uncleanly shutdown and the
orphan list is not empty and a read-only mount is attempted. The orphan
list cleanup during mount will fail with:

ext4_check_bdev_write_error:193: comm mount: Error while async write back metadata

This happens because sbi->s_bdev_wb_err is not initialized when mounting
the filesystem in read only mode and so ext4_check_bdev_write_error()
falsely triggers.

Initialize sbi->s_bdev_wb_err unconditionally to avoid this problem.

Fixes: bc71726c ("ext4: abort the filesystem if failed to async write metadata buffer")
Cc: stable@kernel.org
Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200928020556.710971-1-zhangxiaoxu5@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9704a322

ext4: rename system_blks to s_system_blks inside ext4_sb_info · dd0db94f

由 Chunguang Xu 提交于 9月 24, 2020

Rename system_blks to s_system_blks inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/1600916623-544-2-git-send-email-brookxu@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

dd0db94f

ext4: rename journal_dev to s_journal_dev inside ext4_sb_info · ee7ed3aa

由 Chunguang Xu 提交于 9月 24, 2020

Rename journal_dev to s_journal_dev inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/1600916623-544-1-git-send-email-brookxu@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ee7ed3aa

ext4: fix superblock checksum calculation race · acaa5326

由 Constantine Sapuntzakis 提交于 9月 14, 2020

The race condition could cause the persisted superblock checksum
to not match the contents of the superblock, causing the
superblock to be considered corrupt.

An example of the race follows.  A first thread is interrupted in the
middle of a checksum calculation. Then, another thread changes the
superblock, calculates a new checksum, and sets it. Then, the first
thread resumes and sets the checksum based on the older superblock.

To fix, serialize the superblock checksum calculation using the buffer
header lock. While a spinlock is sufficient, the buffer header is
already there and there is precedent for locking it (e.g. in
ext4_commit_super).

Tested the patch by booting up a kernel with the patch, creating
a filesystem and some files (including some orphans), and then
unmounting and remounting the file system.

Cc: stable@kernel.org
Signed-off-by: NConstantine Sapuntzakis <costa@purestorage.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200914161014.22275-1-costa@purestorage.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

acaa5326

ext4: fix leaking sysfs kobject after failed mount · cb8d53d2

由 Eric Biggers 提交于 9月 22, 2020

ext4_unregister_sysfs() only deletes the kobject.  The reference to it
needs to be put separately, like ext4_put_super() does.

This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).

Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com
Fixes: 72ba7450 ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: stable@vger.kernel.org
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.orgReviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cb8d53d2

22 9月, 2020 2 次提交

fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *' · c8c868ab

由 Eric Biggers 提交于 9月 16, 2020

fscrypt_set_test_dummy_encryption() requires that the optional argument
to the test_dummy_encryption mount option be specified as a substring_t.
That doesn't work well with filesystems that use the new mount API,
since the new way of parsing mount options doesn't use substring_t.

Make it take the argument as a 'const char *' instead.

Instead of moving the match_strdup() into the callers in ext4 and f2fs,
make them just use arg->from directly. Since the pattern is
"test_dummy_encryption=%s", the argument will be null-terminated.
Acked-by: NJeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>

c8c868ab

fscrypt: handle test_dummy_encryption in more logical way · ac4acb1f

由 Eric Biggers 提交于 9月 16, 2020

The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.

Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing.  When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory.  This isn't actually used to do any
encryption, however, since the directory is still unencrypted!  Instead,
->i_crypt_info is only used for inheriting the encryption policy.

One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy".  In
commit ed318a6c ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required.  However, actually the nonce only
ends up being used to derive a key that is never used.

Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about.  For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption.  That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.

Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories.  This involves:

- Adding a function fscrypt_policy_to_inherit() which returns the
  encryption policy to inherit from a directory.  This can be a real
  policy, a dummy policy, or no policy.

- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
  with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.

- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
  of an inode.
Acked-by: NJaegeuk Kim <jaegeuk@kernel.org>
Acked-by: NJeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>

ac4acb1f

19 9月, 2020 1 次提交

[PATCH] reduce boilerplate in fsid handling · 6d1349c7

由 Al Viro 提交于 9月 18, 2020

Get rid of boilerplate in most of ->statfs()
instances...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6d1349c7

20 8月, 2020 1 次提交

ext4: limit the length of per-inode prealloc list · 27bc446e

由 brookxu 提交于 8月 17, 2020

In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

27bc446e

08 8月, 2020 8 次提交

fs: prevent BUG_ON in submit_bh_wbc() · 377254b2

由 Xianting Tian 提交于 7月 31, 2020

If a device is hot-removed --- for example, when a physical device is
unplugged from pcie slot or a nbd device's network is shutdown ---
this can result in a BUG_ON() crash in submit_bh_wbc().  This is
because the when the block device dies, the buffer heads will have
their Buffer_Mapped flag get cleared, leading to the crash in
submit_bh_wbc.

We had attempted to work around this problem in commit a17712c8
("ext4: check superblock mapped prior to committing").  Unfortunately,
it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
device dies between when the work-around check in ext4_commit_super()
and when submit_bh_wbh() is finally called:

Code path:
ext4_commit_super
    judge if 'buffer_mapped(sbh)' is false, return <== commit a17712c8
          lock_buffer(sbh)
          ...
          unlock_buffer(sbh)
               __sync_dirty_buffer(sbh,...
                    lock_buffer(sbh)
                        judge if 'buffer_mapped(sbh))' is false, return <== added by this patch
                            submit_bh(...,sbh)
                                submit_bh_wbc(...,sbh,...)

[100722.966497] kernel BUG at fs/buffer.c:3095! <== BUG_ON(!buffer_mapped(bh))' in submit_bh_wbc()
[100722.966503] invalid opcode: 0000 [#1] SMP
[100722.966566] task: ffff8817e15a9e40 task.stack: ffffc90024744000
[100722.966574] RIP: 0010:submit_bh_wbc+0x180/0x190
[100722.966575] RSP: 0018:ffffc90024747a90 EFLAGS: 00010246
[100722.966576] RAX: 0000000000620005 RBX: ffff8818a80603a8 RCX: 0000000000000000
[100722.966576] RDX: ffff8818a80603a8 RSI: 0000000000020800 RDI: 0000000000000001
[100722.966577] RBP: ffffc90024747ac0 R08: 0000000000000000 R09: ffff88207f94170d
[100722.966578] R10: 00000000000437c8 R11: 0000000000000001 R12: 0000000000020800
[100722.966578] R13: 0000000000000001 R14: 000000000bf9a438 R15: ffff88195f333000
[100722.966580] FS:  00007fa2eee27700(0000) GS:ffff88203d840000(0000) knlGS:0000000000000000
[100722.966580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100722.966581] CR2: 0000000000f0b008 CR3: 000000201a622003 CR4: 00000000007606e0
[100722.966582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[100722.966583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[100722.966583] PKRU: 55555554
[100722.966583] Call Trace:
[100722.966588]  __sync_dirty_buffer+0x6e/0xd0
[100722.966614]  ext4_commit_super+0x1d8/0x290 [ext4]
[100722.966626]  __ext4_std_error+0x78/0x100 [ext4]
[100722.966635]  ? __ext4_journal_get_write_access+0xca/0x120 [ext4]
[100722.966646]  ext4_reserve_inode_write+0x58/0xb0 [ext4]
[100722.966655]  ? ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966663]  ext4_mark_inode_dirty+0x53/0x1e0 [ext4]
[100722.966671]  ? __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[100722.966679]  ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966682]  __mark_inode_dirty+0x17f/0x350
[100722.966686]  generic_update_time+0x87/0xd0
[100722.966687]  touch_atime+0xa9/0xd0
[100722.966690]  generic_file_read_iter+0xa09/0xcd0
[100722.966694]  ? page_cache_tree_insert+0xb0/0xb0
[100722.966704]  ext4_file_read_iter+0x4a/0x100 [ext4]
[100722.966707]  ? __inode_security_revalidate+0x4f/0x60
[100722.966709]  __vfs_read+0xec/0x160
[100722.966711]  vfs_read+0x8c/0x130
[100722.966712]  SyS_pread64+0x87/0xb0
[100722.966716]  do_syscall_64+0x67/0x1b0
[100722.966719]  entry_SYSCALL64_slow_path+0x25/0x25

To address this, add the check of 'buffer_mapped(bh)' to
__sync_dirty_buffer().  This also has the benefit of fixing this for
other file systems.

With this addition, we can drop the workaround in ext4_commit_supper().

[ Commit description rewritten by tytso. ]
Signed-off-by: NXianting Tian <xianting_tian@126.com>
Link: https://lore.kernel.org/r/1596211825-8750-1-git-send-email-xianting_tian@126.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

377254b2

ext4: correctly restore system zone info when remount fails · 0f5bde1d

由 Jan Kara 提交于 7月 28, 2020

When remounting filesystem fails late during remount handling and
block_validity mount option is also changed during the remount, we fail
to restore system zone information to a state matching the mount option.
This is mostly harmless, just the block validity checking will not match
the situation described by the mount option. Make sure these two are always
consistent.
Reported-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200728130437.7804-7-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

0f5bde1d

ext4: handle error of ext4_setup_system_zone() on remount · d176b1f6

由 Jan Kara 提交于 7月 28, 2020

ext4_setup_system_zone() can fail. Handle the failure in ext4_remount().
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

d176b1f6

ext4: export msg_count and warning_count via sysfs · 1cf006ed

由 Dmitry Monakhov 提交于 7月 25, 2020

This numbers can be analized by system automation similar to errors_count.
In ideal world it would be nice to have separate counters for different
log-levels, but this makes this patch too intrusive.
Signed-off-by: NDmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: https://lore.kernel.org/r/20200725123313.4467-1-dmtrmonakhov@yandex-team.ruSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

1cf006ed

ext4: handle option set by mount flags correctly · f25391eb

由 Lukas Czerner 提交于 7月 23, 2020

Currently there is a problem with mount options that can be both set by
vfs using mount flags or by a string parsing in ext4.

i_version/iversion options gets lost after remount, for example

$ mount -o i_version /dev/pmem0 /mnt
$ grep pmem0 /proc/self/mountinfo | grep i_version
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,seclabel,i_version
$ mount -o remount,ro /mnt
$ grep pmem0 /proc/self/mountinfo | grep i_version

nolazytime gets ignored by ext4 on remount, for example

$ mount -o lazytime /dev/pmem0 /mnt
$ grep pmem0 /proc/self/mountinfo | grep lazytime
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,lazytime,seclabel
$ mount -o remount,nolazytime /mnt
$ grep pmem0 /proc/self/mountinfo | grep lazytime
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,lazytime,seclabel

Fix it by applying the SB_LAZYTIME and SB_I_VERSION flags from *flags to
s_flags before we parse the option and use the resulting state of the
same flags in *flags at the end of successful remount.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200723150526.19931-1-lczerner@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

f25391eb

ext4: add prefetch_block_bitmaps mount option · 3d392b26

由 Theodore Ts'o 提交于 7月 17, 2020

For file systems where we can afford to keep the buddy bitmaps cached,
we can speed up initial writes to large file systems by starting to
load the block allocation bitmaps as soon as the file system is
mounted.  This won't work well for _super_ large file systems, or
memory constrained systems, so we only enable this when it is
requested via a mount option.

Addresses-Google-Bug: 159488342
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>

3d392b26

jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() · 529a781e

由 zhangyi (F) 提交于 6月 20, 2020

Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer
used after commit <536fc240> ("jbd2: clean up
jbd2_journal_try_to_free_buffers()"), so just remove it.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

529a781e

ext4: abort the filesystem if failed to async write metadata buffer · bc71726c

由 zhangyi (F) 提交于 6月 20, 2020

There is a risk of filesystem inconsistency if we failed to async write
back metadata buffer in the background. Because of current buffer's end
io procedure is handled by end_buffer_async_write() in the block layer,
and it only clear the buffer's uptodate flag and mark the write_io_error
flag, so ext4 cannot detect such failure immediately. In most cases of
getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
buffer's data is actually uptodate, it may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, it may
lead to on-disk filesystem inconsistency if reading old data from the
disk successfully and write them out again.

This patch detect bdev mapping->wb_err when getting journal's write
access and mark the filesystem error if bdev's mapping->wb_err was
increased, this could prevent further writing and potential
inconsistency.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

bc71726c

06 8月, 2020 2 次提交

ext4: handle read only external journal device · 273108fa

由 Lukas Czerner 提交于 7月 17, 2020

Ext4 uses blkdev_get_by_dev() to get the block_device for journal device
which does check to see if the read-only block device was opened
read-only.

As a result ext4 will hapily proceed mounting the file system with
external journal on read-only device. This is bad as we would not be
able to use the journal leading to errors later on.

Instead of simply failing to mount file system in this case, treat it in
a similar way we treat internal journal on read-only device. Allow to
mount with -o noload in read-only mode.

This can be reproduced easily like this:

mke2fs -F -O journal_dev $JOURNAL_DEV 100M
mkfs.$FSTYPE -F -J device=$JOURNAL_DEV $FS_DEV
blockdev --setro $JOURNAL_DEV
mount $FS_DEV $MNT
touch $MNT/file
umount $MNT

leading to error like this

[ 1307.318713] ------------[ cut here ]------------
[ 1307.323362] generic_make_request: Trying to write to read-only block-device dm-2 (partno 0)
[ 1307.331741] WARNING: CPU: 36 PID: 3224 at block/blk-core.c:855 generic_make_request_checks+0x2c3/0x580
[ 1307.341041] Modules linked in: ext4 mbcache jbd2 rfkill intel_rapl_msr intel_rapl_common isst_if_commd
[ 1307.419445] CPU: 36 PID: 3224 Comm: jbd2/dm-2 Tainted: G        W I       5.8.0-rc5 #2
[ 1307.427359] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 2.3.10 08/15/2019
[ 1307.434932] RIP: 0010:generic_make_request_checks+0x2c3/0x580
[ 1307.440676] Code: 94 03 00 00 48 89 df 48 8d 74 24 08 c6 05 cf 2b 18 01 01 e8 7f a4 ff ff 48 c7 c7 50e
[ 1307.459420] RSP: 0018:ffffc0d70eb5fb48 EFLAGS: 00010286
[ 1307.464646] RAX: 0000000000000000 RBX: ffff9b33b2978300 RCX: 0000000000000000
[ 1307.471780] RDX: ffff9b33e12a81e0 RSI: ffff9b33e1298000 RDI: ffff9b33e1298000
[ 1307.478913] RBP: ffff9b7b9679e0c0 R08: 0000000000000837 R09: 0000000000000024
[ 1307.486044] R10: 0000000000000000 R11: ffffc0d70eb5f9f0 R12: 0000000000000400
[ 1307.493177] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 1307.500308] FS:  0000000000000000(0000) GS:ffff9b33e1280000(0000) knlGS:0000000000000000
[ 1307.508396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1307.514142] CR2: 000055eaf4109000 CR3: 0000003dee40a006 CR4: 00000000007606e0
[ 1307.521273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1307.528407] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1307.535538] PKRU: 55555554
[ 1307.538250] Call Trace:
[ 1307.540708]  generic_make_request+0x30/0x340
[ 1307.544985]  submit_bio+0x43/0x190
[ 1307.548393]  ? bio_add_page+0x62/0x90
[ 1307.552068]  submit_bh_wbc+0x16a/0x190
[ 1307.555833]  jbd2_write_superblock+0xec/0x200 [jbd2]
[ 1307.560803]  jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
[ 1307.566557]  jbd2_journal_commit_transaction+0x2ae/0x1860 [jbd2]
[ 1307.572566]  ? check_preempt_curr+0x7a/0x90
[ 1307.576756]  ? update_curr+0xe1/0x1d0
[ 1307.580421]  ? account_entity_dequeue+0x7b/0xb0
[ 1307.584955]  ? newidle_balance+0x231/0x3d0
[ 1307.589056]  ? __switch_to_asm+0x42/0x70
[ 1307.592986]  ? __switch_to_asm+0x36/0x70
[ 1307.596918]  ? lock_timer_base+0x67/0x80
[ 1307.600851]  kjournald2+0xbd/0x270 [jbd2]
[ 1307.604873]  ? finish_wait+0x80/0x80
[ 1307.608460]  ? commit_timeout+0x10/0x10 [jbd2]
[ 1307.612915]  kthread+0x114/0x130
[ 1307.616152]  ? kthread_park+0x80/0x80
[ 1307.619816]  ret_from_fork+0x22/0x30
[ 1307.623400] ---[ end trace 27490236265b1630 ]---
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200717090605.2612-1-lczerner@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

273108fa

ext4: don't BUG on inconsistent journal feature · 11215630

由 Jan Kara 提交于 7月 10, 2020

A customer has reported a BUG_ON in ext4_clear_journal_err() hitting
during an LTP testing. Either this has been caused by a test setup
issue where the filesystem was being overwritten while LTP was mounting
it or the journal replay has overwritten the superblock with invalid
data. In either case it is preferable we don't take the machine down
with a BUG_ON. So handle the situation of unexpectedly missing
has_journal feature more gracefully. We issue warning and fail the mount
in the cases where the race window is narrow and the failed check is
most likely a programming error. In cases where fs corruption is more
likely, we do full ext4_error() handling before failing mount / remount.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200710140759.18031-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

11215630

09 7月, 2020 1 次提交

ext4: add inline encryption support · 4f74d15f

由 Eric Biggers 提交于 7月 02, 2020

Wire up ext4 to support inline encryption via the helper functions which
fs/crypto/ now provides.  This includes:

- Adding a mount option 'inlinecrypt' which enables inline encryption
  on encrypted files where it can be used.

- Setting the bio_crypt_ctx on bios that will be submitted to an
  inline-encrypted file.

  Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for
  this part, since ext4 sometimes uses ll_rw_block() on file data.

- Not adding logically discontiguous data to bios that will be submitted
  to an inline-encrypted file.

- Not doing filesystem-layer crypto on inline-encrypted files.
Co-developed-by: NSatya Tangirala <satyat@google.com>
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200702015607.1215430-5-satyat@google.comSigned-off-by: NEric Biggers <ebiggers@google.com>

4f74d15f

13 6月, 2020 1 次提交

ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers · 7b97d868

由 zhangyi (F) 提交于 6月 09, 2020

In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
 jbd2_journal_abort()
  journal->j_flags |= JBD2_ABORT;
  jbd2_journal_update_sb_errno()
                                    | ext4_journal_check_start()
                                    |  __ext4_abort()
                                    |   sb->s_flags |= SB_RDONLY;
                                    |   if (!JBD2_REC_ERR)
                                    |        return;
  journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introduce j_abort_mutex to make
sure journal abort is completed before panic, and remove JBD2_REC_ERR
flag.

Fixes: 4327ba52 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

7b97d868

11 6月, 2020 2 次提交

ext4: stop overwrite the errcode in ext4_setup_super · 5adaccac

由 yangerkun 提交于 6月 01, 2020

Now the errcode from ext4_commit_super will overwrite EROFS exists in
ext4_setup_super. Actually, no need to call ext4_commit_super since we
will return EROFS. Fix it by goto done directly.

Fixes: c89128a0 ("ext4: handle errors on ext4_commit_super")
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200601073404.3712492-1-yangerkun@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5adaccac

ext4: avoid race conditions when remounting with options that change dax · 829b37b8

由 Theodore Ts'o 提交于 6月 10, 2020

Trying to change dax mount options when remounting could allow mount
options to be enabled for a small amount of time, and then the mount
option change would be reverted.

In the case of "mount -o remount,dax", this can cause a race where
files would temporarily treated as DAX --- and then not.

Cc: stable@kernel.org
Reported-by: syzbot+bca9799bf129256190da@syzkaller.appspotmail.com
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

829b37b8

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功