提交 · 704c2317cab5571da0e5763cd47ad07f8900aa76 · gsplhtlxg / clone-Linux

20 11月, 2020 1 次提交

ext4: drop fast_commit from /proc/mounts · 704c2317

由 Theodore Ts'o 提交于 11月 19, 2020

The options in /proc/mounts must be valid mount options --- and
fast_commit is not a mount option.  Otherwise, command sequences like
this will fail:

    # mount /dev/vdc /vdc
    # mkdir -p /vdc/phoronix_test_suite /pts
    # mount --bind /vdc/phoronix_test_suite /pts
    # mount -o remount,nodioread_nolock /pts
    mount: /pts: mount point not mounted or bad option.

And in the system logs, you'll find:

    EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value

Fixes: 995a3ed6 ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

704c2317

12 11月, 2020 1 次提交

Revert "ext4: fix superblock checksum calculation race" · d196e229

由 Theodore Ts'o 提交于 11月 11, 2020

This reverts commit acaa5326 which can
result in a ext4_superblock_csum_set() trying to sleep while a
spinlock is being held.

For more discussion of this issue, please see:

https://lore.kernel.org/r/000000000000f50cb705b313ed70@google.com

Reported-by: syzbot+7a4ba6a239b91a126c28@syzkaller.appspotmail.com
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

d196e229

07 11月, 2020 7 次提交

ext4: cleanup fast commit mount options · 99c880de

由 Harshad Shirwadkar 提交于 11月 05, 2020

Drop no_fc mount option that disable fast commit even if it was
enabled at mkfs time. Move fc_debug_force mount option under ifdef
EXT4_DEBUG to annotate that this is strictly for debugging and testing
purposes and should not be used in production.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-23-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

99c880de

ext4: make s_mount_flags modifications atomic · 9b5f6c9b

由 Harshad Shirwadkar 提交于 11月 05, 2020

Fast commit file system states are recorded in
sbi->s_mount_flags. Fast commit expects these bit manipulations to be
atomic. This patch adds helpers to make those modifications atomic.
Suggested-by: NJan Kara <jack@suse.cz>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-21-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9b5f6c9b

ext4: disable fast commit with data journalling · 556e0319

由 Harshad Shirwadkar 提交于 11月 05, 2020

Fast commits don't work with data journalling. This patch disables the
fast commit support when data journalling is turned on.
Suggested-by: NJan Kara <jack@suse.cz>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201106035911.1942128-19-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

556e0319

ext4: clean up the JBD2 API that initializes fast commits · a1e5e465

由 Harshad Shirwadkar 提交于 11月 05, 2020

This patch removes jbd2_fc_init() API and its related functions to
simplify enabling fast commits. With this change, the number of fast
commit blocks to use is solely determined by the JBD2 layer. So, we
move the default value for minimum number of fast commit blocks from
ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to
use fast commits is determined by the file system. The file system
just sets the fast commit feature using
jbd2_journal_set_features(). JBD2 layer then determines how many
blocks to use for fast commits (based on the value found in the JBD2
superblock).

Note that the JBD2 feature flag of fast commits is just an indication
that there are fast commit blocks present on disk. It doesn't tell
JBD2 layer about the intent of the file system of whether to it wants
to use fast commit or not. That's why, we blindly clear the fast
commit flag in journal_reset() after the recovery is done.
Suggested-by: NJan Kara <jack@suse.cz>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

a1e5e465

jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs · ede7dc7f

由 Harshad Shirwadkar 提交于 11月 05, 2020

The on-disk superblock field sb->s_maxlen represents the total size of
the journal including the fast commit area and is no more the max
number of blocks available for a transaction. The maximum number of
blocks available to a transaction is reduced by the number of fast
commit blocks. So, this patch renames j_maxlen to j_total_len to
better represent its intent. Also, it adds a function to calculate max
number of bufs available for a transaction.
Suggested-by: NJan Kara <jack@suse.cz>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ede7dc7f

ext4: drop redundant calls ext4_fc_track_range · 5b552ad7

由 Harshad Shirwadkar 提交于 11月 05, 2020

ext4_fc_track_range() should only be called when blocks are added or
removed from an inode. So, the only places from where we need to call
this function are ext4_map_blocks(), punch hole, collapse / zero
range, truncate. Remove all the other redundant calls to ths function.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-4-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5b552ad7

ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA · 174fe5ba

由 Kaixu Xia 提交于 10月 29, 2020

The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled.  Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument.  It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.

[ Fixed up the commit description to better explain why the fix
  works. --TYT ]

Fixes: 26092bf5 ("ext4: use a table-driven handler for mount options")
Signed-off-by: NKaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

174fe5ba

29 10月, 2020 2 次提交

ext4: use generic casefolding support · f8f4acb6

由 Daniel Rosenberg 提交于 10月 28, 2020

This switches ext4 over to the generic support provided in libfs.

Since casefolded dentries behave the same in ext4 and f2fs, we decrease
the maintenance burden by unifying them, and any optimizations will
immediately apply to both.
Signed-off-by: NDaniel Rosenberg <drosen@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20201028050820.1636571-1-drosen@google.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

f8f4acb6

ext4: use s_mount_flags instead of s_mount_state for fast commit state · ababea77

由 Harshad Shirwadkar 提交于 10月 26, 2020

Ext4's fast commit related transient states should use
sb->s_mount_flags instead of persistent sb->s_mount_state.

Fixes: 8016e29f ("ext4: fast commit recovery path")
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-3-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ababea77

22 10月, 2020 5 次提交

ext4: add a mount opt to forcefully turn fast commits on · 0f0672ff

由 Harshad Shirwadkar 提交于 10月 15, 2020

This is a debug only mount option that forcefully turns fast commits
on at mount time.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-9-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

0f0672ff

ext4: fast commit recovery path · 8016e29f

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds fast commit recovery path support for Ext4 file
system. We add several helper functions that are similar in spirit to
e2fsprogs journal recovery path handlers. Example of such functions
include - a simple block allocator, idempotent block bitmap update
function etc. Using these routines and the fast commit log in the fast
commit area, the recovery path (ext4_fc_replay()) performs fast commit
log recovery.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8016e29f

ext4: main fast-commit commit path · aa75f4d3

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds main fast commit commit path handlers. The overall
patch can be divided into two inter-related parts:

(A) Metadata updates tracking

    This part consists of helper functions to track changes that need
    to be committed during a commit operation. These updates are
    maintained by Ext4 in different in-memory queues. Following are
    the APIs and their short description that are implemented in this
    patch:

    - ext4_fc_track_link/unlink/creat() - Track unlink. link and creat
      operations
    - ext4_fc_track_range() - Track changed logical block offsets
      inodes
    - ext4_fc_track_inode() - Track inodes
    - ext4_fc_mark_ineligible() - Mark file system fast commit
      ineligible()
    - ext4_fc_start_update() / ext4_fc_stop_update() /
      ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() These
      functions are useful for co-ordinating inode updates with
      commits.

(B) Main commit Path

    This part consists of functions to convert updates tracked in
    in-memory data structures into on-disk commits. Function
    ext4_fc_commit() is the main entry point to commit path.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

aa75f4d3

ext4 / jbd2: add fast commit initialization · 6866d7b3

由 Harshad Shirwadkar 提交于 10月 15, 2020

This patch adds fast commit area trackers in the journal_t
structure. These are initialized via the jbd2_fc_init() routine that
this patch adds. This patch also adds ext4/fast_commit.c and
ext4/fast_commit.h files for fast commit code that will be added in
subsequent patches in this series.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6866d7b3

ext4: add fast_commit feature and handling for extended mount options · 995a3ed6

由 Harshad Shirwadkar 提交于 10月 15, 2020

We are running out of mount option bits. Add handling for using
s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add
ability to turn off the fast commit feature in Ext4.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

995a3ed6

18 10月, 2020 13 次提交

ext4: Detect already used quota file early · e0770e91

由 Jan Kara 提交于 10月 15, 2020

When we try to use file already used as a quota file again (for the same
or different quota type), strange things can happen. At the very least
lockdep annotations may be wrong but also inode flags may be wrongly set
/ reset. When the file is used for two quota types at once we can even
corrupt the file and likely crash the kernel. Catch all these cases by
checking whether passed file is already used as quota file and bail
early in that case.

This fixes occasional generic/219 failure due to lockdep complaint.
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reported-by: NRitesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201015110330.28716-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e0770e91

ext4: data=journal: write-protect pages on j_submit_inode_data_buffers() · afb585a9

由 Mauricio Faria de Oliveira 提交于 10月 05, 2020

This implements journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: to write-protect pages under
commit, preventing changes to buffers writeably mapped to userspace.

If a buffer's content changes between commit's checksum calculation
and write-out to disk, it can cause journal recovery/mount failures
upon a kernel crash or power loss.

    [   27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [   27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [   27.342716] JBD2: recovery failed
    [   27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.

In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty w/ writepage callback if needed.

In j_finish_inode_data_buffers() there is nothing do to.

And in order to use the callbacks, inodes are added to the inode list
in transaction in __ext4_journalled_writepage() and ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that the buffers are attached
to the transaction as jbddirty with write_end_fn(), as already done in
__ext4_journalled_writepage().
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Reported-by: NDann Frazier <dann.frazier@canonical.com>
Reported-by: kernel test robot <lkp@intel.com> # wbc.nr_to_write
Suggested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201006004841.600488-5-mfo@canonical.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

afb585a9

jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers() · 342af94e

由 Mauricio Faria de Oliveira 提交于 10月 05, 2020

Introduce journal callbacks to allow different behaviors
for an inode in journal_submit|finish_inode_data_buffers().

The existing users of the current behavior (ext4, ocfs2)
are adapted to use the previously exported functions
that implement the current behavior.

Users are callers of jbd2_journal_inode_ranged_write|wait(),
which adds the inode to the transaction's inode list with
the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree.

Both CONFIG_EXT4_FS and CONFIG_OCSFS2_FS select CONFIG_JBD2,
which builds fs/jbd2/commit.c and journal.c that define and
export the functions, so we can call directly in ext4/ocfs2.
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Suggested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

342af94e

ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable() · 8394a6ab

由 zhangyi (F) 提交于 9月 24, 2020

Now we only use sb_bread_unmovable() to read superblock and descriptor
block at mount time, so there is no opportunity that we need to clear
buffer verified bit and also handle buffer write_io error bit. But for
the sake of unification, let's introduce ext4_sb_bread_unmovable() to
replace all sb_bread_unmovable(). After this patch, we stop using read
helpers in fs/buffer.c.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200924073337.861472-8-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8394a6ab

ext4: introduce ext4_sb_breadahead_unmovable() to replace sb_breadahead_unmovable() · 5df1d412

由 zhangyi (F) 提交于 9月 24, 2020

If we readahead inode tables in __ext4_get_inode_loc(), it may bypass
buffer_write_io_error() check, so introduce ext4_sb_breadahead_unmovable()
to handle this special case.

This patch also replace sb_breadahead_unmovable() in ext4_fill_super()
for the sake of unification.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200924073337.861472-6-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5df1d412

ext4: use common helpers in all places reading metadata buffers · 2d069c08

由 zhangyi (F) 提交于 9月 24, 2020

Revome all open codes that read metadata buffers, switch to use
ext4_read_bh_*() common helpers.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924073337.861472-4-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2d069c08

ext4: introduce new metadata buffer read helpers · fa491b14

由 zhangyi (F) 提交于 9月 24, 2020

The previous patch add clear_buffer_verified() before we read metadata
block from disk again, but it's rather easy to miss clearing of this bit
because currently we read metadata buffer through different open codes
(e.g. ll_rw_block(), bh_submit_read() and invoke submit_bh() directly).
So, it's time to add common helpers to unify in all the places reading
metadata buffers instead. This patch add 3 helpers:

 - ext4_read_bh_nowait(): async read metadata buffer if it's actually
   not uptodate, clear buffer_verified bit before read from disk.
 - ext4_read_bh(): sync version of read metadata buffer, it will wait
   until the read operation return and check the return status.
 - ext4_read_bh_lock(): try to lock the buffer before read buffer, it
   will skip reading if the buffer is already locked.

After this patch, we need to use these helpers in all the places reading
metadata buffer instead of different open codes.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924073337.861472-3-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

fa491b14

ext4: clear buffer verified flag if read meta block from disk · d9befeda

由 zhangyi (F) 提交于 9月 24, 2020

The metadata buffer is no longer trusted after we read it from disk
again because it is not uptodate for some reasons (e.g. failed to write
back). Otherwise we may get below memory corruption problem in
ext4_ext_split()->memset() if we read stale data from the newly
allocated extent block on disk which has been failed to async write
out but miss verify again since the verified bit has already been set
on the buffer.

[   29.774674] BUG: unable to handle kernel paging request at ffff88841949d000
...
[   29.783317] Oops: 0002 [#2] SMP
[   29.784219] R10: 00000000000f4240 R11: 0000000000002e28 R12: ffff88842fa1c800
[   29.784627] CPU: 1 PID: 126 Comm: kworker/u4:3 Tainted: G      D W
[   29.785546] R13: ffffffff9cddcc20 R14: ffffffff9cddd420 R15: ffff88842fa1c2f8
[   29.786679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS ?-20190727_0738364
[   29.787588] FS:  0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[   29.789288] Workqueue: writeback wb_workfn
[   29.790319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.790321]  (flush-8:0)
[   29.790844] CR2: 0000000000000008 CR3: 00000004234f2000 CR4: 00000000000006f0
[   29.791924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   29.792839] RIP: 0010:__memset+0x24/0x30
[   29.793739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   29.794256] Code: 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 033
[   29.795161] Kernel panic - not syncing: Fatal exception in interrupt
...
[   29.808149] Call Trace:
[   29.808475]  ext4_ext_insert_extent+0x102e/0x1be0
[   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
[   29.809652]  ext4_map_blocks+0x290/0x8a0
[   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
[   29.809652]  ext4_map_blocks+0x290/0x8a0
[   29.810161]  ext4_writepages+0xc85/0x17c0
...

Fix this by clearing buffer's verified bit if we read meta block from
disk again.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200924073337.861472-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

d9befeda

ext4: fix bdev write error check failed when mount fs with ro · 9704a322

由 Zhang Xiaoxu 提交于 9月 27, 2020

Consider a situation when a filesystem was uncleanly shutdown and the
orphan list is not empty and a read-only mount is attempted. The orphan
list cleanup during mount will fail with:

ext4_check_bdev_write_error:193: comm mount: Error while async write back metadata

This happens because sbi->s_bdev_wb_err is not initialized when mounting
the filesystem in read only mode and so ext4_check_bdev_write_error()
falsely triggers.

Initialize sbi->s_bdev_wb_err unconditionally to avoid this problem.

Fixes: bc71726c ("ext4: abort the filesystem if failed to async write metadata buffer")
Cc: stable@kernel.org
Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200928020556.710971-1-zhangxiaoxu5@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9704a322

ext4: rename system_blks to s_system_blks inside ext4_sb_info · dd0db94f

由 Chunguang Xu 提交于 9月 24, 2020

Rename system_blks to s_system_blks inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/1600916623-544-2-git-send-email-brookxu@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

dd0db94f

ext4: rename journal_dev to s_journal_dev inside ext4_sb_info · ee7ed3aa

由 Chunguang Xu 提交于 9月 24, 2020

Rename journal_dev to s_journal_dev inside ext4_sb_info, keep
the naming rules consistent with other variables, which is
convenient for code reading and writing.
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/1600916623-544-1-git-send-email-brookxu@tencent.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ee7ed3aa

ext4: fix superblock checksum calculation race · acaa5326

由 Constantine Sapuntzakis 提交于 9月 14, 2020

The race condition could cause the persisted superblock checksum
to not match the contents of the superblock, causing the
superblock to be considered corrupt.

An example of the race follows.  A first thread is interrupted in the
middle of a checksum calculation. Then, another thread changes the
superblock, calculates a new checksum, and sets it. Then, the first
thread resumes and sets the checksum based on the older superblock.

To fix, serialize the superblock checksum calculation using the buffer
header lock. While a spinlock is sufficient, the buffer header is
already there and there is precedent for locking it (e.g. in
ext4_commit_super).

Tested the patch by booting up a kernel with the patch, creating
a filesystem and some files (including some orphans), and then
unmounting and remounting the file system.

Cc: stable@kernel.org
Signed-off-by: NConstantine Sapuntzakis <costa@purestorage.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200914161014.22275-1-costa@purestorage.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

acaa5326

ext4: fix leaking sysfs kobject after failed mount · cb8d53d2

由 Eric Biggers 提交于 9月 22, 2020

ext4_unregister_sysfs() only deletes the kobject.  The reference to it
needs to be put separately, like ext4_put_super() does.

This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).

Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com
Fixes: 72ba7450 ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: stable@vger.kernel.org
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.orgReviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cb8d53d2

22 9月, 2020 2 次提交

fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *' · c8c868ab

由 Eric Biggers 提交于 9月 16, 2020

fscrypt_set_test_dummy_encryption() requires that the optional argument
to the test_dummy_encryption mount option be specified as a substring_t.
That doesn't work well with filesystems that use the new mount API,
since the new way of parsing mount options doesn't use substring_t.

Make it take the argument as a 'const char *' instead.

Instead of moving the match_strdup() into the callers in ext4 and f2fs,
make them just use arg->from directly. Since the pattern is
"test_dummy_encryption=%s", the argument will be null-terminated.
Acked-by: NJeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>

c8c868ab

fscrypt: handle test_dummy_encryption in more logical way · ac4acb1f

由 Eric Biggers 提交于 9月 16, 2020

The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.

Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing.  When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory.  This isn't actually used to do any
encryption, however, since the directory is still unencrypted!  Instead,
->i_crypt_info is only used for inheriting the encryption policy.

One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy".  In
commit ed318a6c ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required.  However, actually the nonce only
ends up being used to derive a key that is never used.

Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about.  For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption.  That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.

Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories.  This involves:

- Adding a function fscrypt_policy_to_inherit() which returns the
  encryption policy to inherit from a directory.  This can be a real
  policy, a dummy policy, or no policy.

- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
  with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.

- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
  of an inode.
Acked-by: NJaegeuk Kim <jaegeuk@kernel.org>
Acked-by: NJeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>

ac4acb1f

19 9月, 2020 1 次提交

[PATCH] reduce boilerplate in fsid handling · 6d1349c7

由 Al Viro 提交于 9月 18, 2020

Get rid of boilerplate in most of ->statfs()
instances...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6d1349c7

20 8月, 2020 1 次提交

ext4: limit the length of per-inode prealloc list · 27bc446e

由 brookxu 提交于 8月 17, 2020

In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

27bc446e

08 8月, 2020 7 次提交

fs: prevent BUG_ON in submit_bh_wbc() · 377254b2

由 Xianting Tian 提交于 7月 31, 2020

If a device is hot-removed --- for example, when a physical device is
unplugged from pcie slot or a nbd device's network is shutdown ---
this can result in a BUG_ON() crash in submit_bh_wbc().  This is
because the when the block device dies, the buffer heads will have
their Buffer_Mapped flag get cleared, leading to the crash in
submit_bh_wbc.

We had attempted to work around this problem in commit a17712c8
("ext4: check superblock mapped prior to committing").  Unfortunately,
it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
device dies between when the work-around check in ext4_commit_super()
and when submit_bh_wbh() is finally called:

Code path:
ext4_commit_super
    judge if 'buffer_mapped(sbh)' is false, return <== commit a17712c8
          lock_buffer(sbh)
          ...
          unlock_buffer(sbh)
               __sync_dirty_buffer(sbh,...
                    lock_buffer(sbh)
                        judge if 'buffer_mapped(sbh))' is false, return <== added by this patch
                            submit_bh(...,sbh)
                                submit_bh_wbc(...,sbh,...)

[100722.966497] kernel BUG at fs/buffer.c:3095! <== BUG_ON(!buffer_mapped(bh))' in submit_bh_wbc()
[100722.966503] invalid opcode: 0000 [#1] SMP
[100722.966566] task: ffff8817e15a9e40 task.stack: ffffc90024744000
[100722.966574] RIP: 0010:submit_bh_wbc+0x180/0x190
[100722.966575] RSP: 0018:ffffc90024747a90 EFLAGS: 00010246
[100722.966576] RAX: 0000000000620005 RBX: ffff8818a80603a8 RCX: 0000000000000000
[100722.966576] RDX: ffff8818a80603a8 RSI: 0000000000020800 RDI: 0000000000000001
[100722.966577] RBP: ffffc90024747ac0 R08: 0000000000000000 R09: ffff88207f94170d
[100722.966578] R10: 00000000000437c8 R11: 0000000000000001 R12: 0000000000020800
[100722.966578] R13: 0000000000000001 R14: 000000000bf9a438 R15: ffff88195f333000
[100722.966580] FS:  00007fa2eee27700(0000) GS:ffff88203d840000(0000) knlGS:0000000000000000
[100722.966580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100722.966581] CR2: 0000000000f0b008 CR3: 000000201a622003 CR4: 00000000007606e0
[100722.966582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[100722.966583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[100722.966583] PKRU: 55555554
[100722.966583] Call Trace:
[100722.966588]  __sync_dirty_buffer+0x6e/0xd0
[100722.966614]  ext4_commit_super+0x1d8/0x290 [ext4]
[100722.966626]  __ext4_std_error+0x78/0x100 [ext4]
[100722.966635]  ? __ext4_journal_get_write_access+0xca/0x120 [ext4]
[100722.966646]  ext4_reserve_inode_write+0x58/0xb0 [ext4]
[100722.966655]  ? ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966663]  ext4_mark_inode_dirty+0x53/0x1e0 [ext4]
[100722.966671]  ? __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[100722.966679]  ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966682]  __mark_inode_dirty+0x17f/0x350
[100722.966686]  generic_update_time+0x87/0xd0
[100722.966687]  touch_atime+0xa9/0xd0
[100722.966690]  generic_file_read_iter+0xa09/0xcd0
[100722.966694]  ? page_cache_tree_insert+0xb0/0xb0
[100722.966704]  ext4_file_read_iter+0x4a/0x100 [ext4]
[100722.966707]  ? __inode_security_revalidate+0x4f/0x60
[100722.966709]  __vfs_read+0xec/0x160
[100722.966711]  vfs_read+0x8c/0x130
[100722.966712]  SyS_pread64+0x87/0xb0
[100722.966716]  do_syscall_64+0x67/0x1b0
[100722.966719]  entry_SYSCALL64_slow_path+0x25/0x25

To address this, add the check of 'buffer_mapped(bh)' to
__sync_dirty_buffer().  This also has the benefit of fixing this for
other file systems.

With this addition, we can drop the workaround in ext4_commit_supper().

[ Commit description rewritten by tytso. ]
Signed-off-by: NXianting Tian <xianting_tian@126.com>
Link: https://lore.kernel.org/r/1596211825-8750-1-git-send-email-xianting_tian@126.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

377254b2

ext4: correctly restore system zone info when remount fails · 0f5bde1d

由 Jan Kara 提交于 7月 28, 2020

When remounting filesystem fails late during remount handling and
block_validity mount option is also changed during the remount, we fail
to restore system zone information to a state matching the mount option.
This is mostly harmless, just the block validity checking will not match
the situation described by the mount option. Make sure these two are always
consistent.
Reported-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200728130437.7804-7-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

0f5bde1d

ext4: handle error of ext4_setup_system_zone() on remount · d176b1f6

由 Jan Kara 提交于 7月 28, 2020

ext4_setup_system_zone() can fail. Handle the failure in ext4_remount().
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

d176b1f6

ext4: export msg_count and warning_count via sysfs · 1cf006ed

由 Dmitry Monakhov 提交于 7月 25, 2020

This numbers can be analized by system automation similar to errors_count.
In ideal world it would be nice to have separate counters for different
log-levels, but this makes this patch too intrusive.
Signed-off-by: NDmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: https://lore.kernel.org/r/20200725123313.4467-1-dmtrmonakhov@yandex-team.ruSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

1cf006ed

ext4: handle option set by mount flags correctly · f25391eb

由 Lukas Czerner 提交于 7月 23, 2020

Currently there is a problem with mount options that can be both set by
vfs using mount flags or by a string parsing in ext4.

i_version/iversion options gets lost after remount, for example

$ mount -o i_version /dev/pmem0 /mnt
$ grep pmem0 /proc/self/mountinfo | grep i_version
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,seclabel,i_version
$ mount -o remount,ro /mnt
$ grep pmem0 /proc/self/mountinfo | grep i_version

nolazytime gets ignored by ext4 on remount, for example

$ mount -o lazytime /dev/pmem0 /mnt
$ grep pmem0 /proc/self/mountinfo | grep lazytime
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,lazytime,seclabel
$ mount -o remount,nolazytime /mnt
$ grep pmem0 /proc/self/mountinfo | grep lazytime
310 95 259:0 / /mnt rw,relatime shared:163 - ext4 /dev/pmem0 rw,lazytime,seclabel

Fix it by applying the SB_LAZYTIME and SB_I_VERSION flags from *flags to
s_flags before we parse the option and use the resulting state of the
same flags in *flags at the end of successful remount.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200723150526.19931-1-lczerner@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

f25391eb

ext4: add prefetch_block_bitmaps mount option · 3d392b26

由 Theodore Ts'o 提交于 7月 17, 2020

For file systems where we can afford to keep the buddy bitmaps cached,
we can speed up initial writes to large file systems by starting to
load the block allocation bitmaps as soon as the file system is
mounted.  This won't work well for _super_ large file systems, or
memory constrained systems, so we only enable this when it is
requested via a mount option.

Addresses-Google-Bug: 159488342
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>

3d392b26

jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() · 529a781e

由 zhangyi (F) 提交于 6月 20, 2020

Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer
used after commit <536fc240> ("jbd2: clean up
jbd2_journal_try_to_free_buffers()"), so just remove it.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

529a781e