提交 · ab74c7b23f3770935016e3eb3ecdf1e42b73efaa · openeuler / Kernel

08 8月, 2020 1 次提交

ext4: abort the filesystem if failed to async write metadata buffer · bc71726c

由 zhangyi (F) 提交于 6月 20, 2020

There is a risk of filesystem inconsistency if we failed to async write
back metadata buffer in the background. Because of current buffer's end
io procedure is handled by end_buffer_async_write() in the block layer,
and it only clear the buffer's uptodate flag and mark the write_io_error
flag, so ext4 cannot detect such failure immediately. In most cases of
getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
buffer's data is actually uptodate, it may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, it may
lead to on-disk filesystem inconsistency if reading old data from the
disk successfully and write them out again.

This patch detect bdev mapping->wb_err when getting journal's write
access and mark the filesystem error if bdev's mapping->wb_err was
increased, this could prevent further writing and potential
inconsistency.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

bc71726c

06 8月, 2020 3 次提交

ext4: add prefetching for block allocation bitmaps · cfd73237

由 Alex Zhuravlev 提交于 4月 21, 2020

This should significantly improve bitmap loading, especially for flex
groups as it tries to load all bitmaps within a flex.group instead of
one by one synchronously.

Prefetching is done in 8 * flex_bg groups, so it should be 8
read-ahead reads for a single allocating thread. At the end of
allocation the thread waits for read-ahead completion and initializes
buddy information so that read-aheads are not lost in case of memory
pressure.

At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation when mballoc loads thousands of bitmaps
looking for a perfect group and ignoring groups with good chunks.

Together with the patch "ext4: limit scanning of uninitialized groups"
the mount time (which includes few tiny allocations) of a 1PB
filesystem is reduced significantly:

               0% full    50%-full unpatched    patched
  mount time       33s                9279s       563s

[ Restructured by tytso; removed the state flags in the allocation
  context, so it can be used to lazily prefetch the allocation bitmaps
  immediately after the file system is mounted.  Skip prefetching
  block groups which are uninitialized.  Finally pass in the
  REQ_RAHEAD flag to the block layer while prefetching. ]
Signed-off-by: NAlex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: NAndreas Dilger <adilger@whamcloud.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cfd73237

ext4: use generic names for generic ioctls · cb29a02d

由 Eric Biggers 提交于 7月 14, 2020

Don't define EXT4_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name. These aliases are unnecessary, and they make it unclear
which ioctls are ext4-specific and which are generic.

Exception: leave EXT4_IOC_GETVERSION_OLD and EXT4_IOC_SETVERSION_OLD
as-is for now, since renaming them to FS_IOC_GETVERSION and
FS_IOC_SETVERSION would probably make them more likely to be confused
with EXT4_IOC_GETVERSION and EXT4_IOC_SETVERSION which also exist.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200714230909.56349-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

cb29a02d

ext4: don't hardcode bit values in EXT4_FL_USER_* · 2a12e147

由 Eric Biggers 提交于 7月 12, 2020

Define the EXT4_FL_USER_* constants by OR-ing together the appropriate
flags, rather than hard-coding a numeric value.  This makes it much
easier to see which flags are listed.

No change in the actual values.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200713031012.192440-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2a12e147

04 6月, 2020 10 次提交

fs: move the fiemap definitions out of fs.h · 10c5db28

由 Christoph Hellwig 提交于 5月 23, 2020

No need to pull the fiemap definitions into almost every file in the
kernel build.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

10c5db28

ext4: fix EXT4_MAX_LOGICAL_BLOCK macro · 175efa81

由 Ritesh Harjani 提交于 5月 05, 2020

ext4 supports max number of logical blocks in a file to be 0xffffffff.
(This is since ext4_extent's ee_block is __le32).
This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting
from 0 logical offset). This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4. Since overlayfs was missing
filemap_check_ranges(), so it could pass a arbitrary huge length which
lead to overflow of map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

175efa81

add comment for ext4_dir_entry_2 file_type member · 9f364e1d

由 Jonathan Grant 提交于 5月 22, 2020

Signed-off-by: NJonathan Grant <jg@jguk.org>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/ad3290d5-86af-99c1-f9d5-cd1bab710429@jguk.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9f364e1d

ext4: mballoc: use lock for checking free blocks while retrying · 99377830

由 Ritesh Harjani 提交于 5月 20, 2020

Currently while doing block allocation grp->bb_free may be getting
modified if discard is happening in parallel.
For e.g. consider a case where there are lot of threads who have
preallocated lot of blocks and there is a thread which is trying
to discard all of this group's PA. Now it could happen that
we see all of those group's bb_free is zero and fail the allocation
while there is sufficient space if we free up all the PA.

So this patch adds another flag "EXT4_MB_STRICT_CHECK" which will be set
if we are unable to allocate any blocks in the first try (since we may
not have considered blocks about to be discarded from PA lists).
So during retry attempt to allocate blocks we will use ext4_lock_group()
for checking if the group is good or not.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/9cb740a117c958c36596f167b12af1beae9a68b7.1589955723.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

99377830

ext4: add casefold flag to EXT4_INODE_* flags · de8ff14c

由 Eric Biggers 提交于 5月 10, 2020

No one currently needs EXT4_INODE_CASEFOLD, but add it to keep the
EXT4_INODE_* definitions in sync with the EXT4_*_FL definitions.

Also make it clearer that the casefold flag is only for directories.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200510215252.87833-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

de8ff14c

ext4: make ext_debug() implementation to use pr_debug() · 70aa1554

由 Ritesh Harjani 提交于 5月 10, 2020

ext_debug() msgs could be helpful, provided those could be enabled
without recompiling kernel and also if we could selectively enable
only required prints for case by case debugging.

So make ext_debug() implementation use pr_debug().
Also change ext_debug() to be defined with CONFIG_EXT4_DEBUG.
So EXT_DEBUG macro now mostly remain for below 3 functions.
ext4_ext_show_path/leaf/move() (whose print msgs use ext_debug()
which again could be dynamically enabled using pr_debug())

This also changes the ext_debug() to take inode as a parameter
to add inode no. in all of it's msgs.
Prints additional info like process name / pid, superblock id etc.
This also removes any explicit function names passed in ext_debug().
Since ext_debug() on it's own prints file, func and line no.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/d31dc189b0aeda9384fe7665e36da7cd8c61571f.1589086800.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

70aa1554

ext4: use BIT() macro for BH_** state bits · 6db07461

由 Ritesh Harjani 提交于 5月 10, 2020

Simply use BIT() macro for all BH_** state bits instead of open
coding it.

There should be no functionality change in this patch.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/57667689f51a3f9dba2fcef7d3425187fa3ba69f.1589086800.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6db07461

ext4: avoid ext4_error()'s caused by ENOMEM in the truncate path · 73c384c0

由 Theodore Ts'o 提交于 5月 07, 2020

We can't fail in the truncate path without requiring an fsck.
Add work around for this by using a combination of retry loops
and the __GFP_NOFAIL flag.

From: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NAnna Pendleton <pendleton@google.com>
Reviewed-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20200507175028.15061-1-pendleton@google.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

73c384c0

ext4: handle ext4_mark_inode_dirty errors · 4209ae12

由 Harshad Shirwadkar 提交于 4月 26, 2020

ext4_mark_inode_dirty() can fail for real reasons. Ignoring its return
value may lead ext4 to ignore real failures that would result in
corruption / crashes. Harden ext4_mark_inode_dirty error paths to fail
as soon as possible and return errors to the caller whenever
appropriate.

One of the possible scnearios when this bug could affected is that
while creating a new inode, its directory entry gets added
successfully but while writing the inode itself mark_inode_dirty
returns error which is ignored. This would result in inconsistency
that the directory entry points to a non-existent inode.

Ran gce-xfstests smoke tests and verified that there were no
regressions.
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20200427013438.219117-1-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

4209ae12

ext4: remove EXT4_GET_BLOCKS_KEEP_SIZE flag · 9e52484c

由 Eric Whitney 提交于 4月 15, 2020

The eofblocks code was removed in the 5.7 release by "ext4: remove
EOFBLOCKS_FL and associated code" (4337ecd1). The ext4_map_blocks()
flag used to trigger it can now be removed as well.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200415203140.30349-2-enwlinux@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9e52484c

03 6月, 2020 2 次提交

ext4: pass the inode to ext4_mpage_readpages · a07f624b

由 Matthew Wilcox (Oracle) 提交于 6月 01, 2020

This function now only uses the mapping argument to look up the inode, and
both callers already have the inode, so just pass the inode instead of the
mapping.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-22-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a07f624b

ext4: convert from readpages to readahead · 6311f91f

由 Matthew Wilcox (Oracle) 提交于 6月 01, 2020

Use the new readahead operation in ext4
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-21-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6311f91f

29 5月, 2020 5 次提交

fs/ext4: Introduce DAX inode flag · b383a73f

由 Ira Weiny 提交于 5月 28, 2020

Add a flag ([EXT4|FS]_DAX_FL) to preserve FS_XFLAG_DAX in the ext4
inode.

Set the flag to be user visible and changeable.  Set the flag to be
inherited.  Allow applications to change the flag at any time except if
it conflicts with the set of mutually exclusive flags (Currently VERITY,
ENCRYPT, JOURNAL_DATA).

Furthermore, restrict setting any of the exclusive flags if DAX is set.

While conceptually possible, we do not allow setting EXT4_DAX_FL while
at the same time clearing exclusion flags (or vice versa) for 2 reasons:

	1) The DAX flag does not take effect immediately which
	   introduces quite a bit of complexity
	2) There is no clear use case for being this flexible

Finally, on regular files, flag the inode to not be cached to facilitate
changing S_DAX on the next creation of the inode.
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-9-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

b383a73f

fs/ext4: Make DAX mount option a tri-state · 9cb20f94

由 Ira Weiny 提交于 5月 28, 2020

We add 'always', 'never', and 'inode' (default).  '-o dax' continues to
operate the same which is equivalent to 'always'.  This new
functionality is limited to ext4 only.

Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode.

We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.

Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user
specified that option for printing.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-7-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9cb20f94

fs/ext4: Only change S_DAX on inode load · 043546e4

由 Ira Weiny 提交于 5月 28, 2020

To prevent complications with in memory inodes we only set S_DAX on
inode load.  FS_XFLAG_DAX can be changed at any time and S_DAX will
change after inode eviction and reload.

Add init bool to ext4_set_inode_flags() to indicate if the inode is
being newly initialized.

Assert that S_DAX is not set on an inode which is just being loaded.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-6-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

043546e4

fs/ext4: Update ext4_should_use_dax() · a8ab6d38

由 Ira Weiny 提交于 5月 28, 2020

S_DAX should only be enabled when the underlying block device supports
dax.

Cache the underlying support for DAX in the super block and modify
ext4_should_use_dax() to check for device support prior to the over
riding mount option.

While we are at it change the function to ext4_should_enable_dax() as
this better reflects the ask as well as matches xfs.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-5-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

a8ab6d38

fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS · fc626fe3

由 Ira Weiny 提交于 5月 28, 2020

In prep for the new tri-state mount option which then introduces
EXT4_MOUNT_DAX_NEVER.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-4-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

fc626fe3

20 5月, 2020 1 次提交

ext4: fix EXT4_MAX_LOGICAL_BLOCK macro · 9f44eda1

由 Ritesh Harjani 提交于 5月 05, 2020

ext4 supports max number of logical blocks in a file to be 0xffffffff.
(This is since ext4_extent's ee_block is __le32).
This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting
from 0 logical offset). This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4. Since overlayfs was missing
filemap_check_ranges(), so it could pass a arbitrary huge length which
lead to overflow of map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9f44eda1

19 5月, 2020 1 次提交

fscrypt: support test_dummy_encryption=v2 · ed318a6c

由 Eric Biggers 提交于 5月 12, 2020

v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.

Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.

To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.

To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.

To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)

Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.

Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.orgAcked-by: NJaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NEric Biggers <ebiggers@google.com>

ed318a6c

02 4月, 2020 1 次提交

ext4: save all error info in save_error_info() and drop ext4_set_errno() · 54d3adbc

由 Theodore Ts'o 提交于 3月 28, 2020

Using a separate function, ext4_set_errno() to set the errno is
problematic because it doesn't do the right thing once
s_last_error_errorcode is non-zero.  It's also less racy to set all of
the error information all at once.  (Also, as a bonus, it shrinks code
size slightly.)

Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu
Fixes: 878520ac ("ext4: save the error code which triggered...")
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

54d3adbc

06 3月, 2020 1 次提交

ext4: remove EXT4_EOFBLOCKS_FL and associated code · 4337ecd1

由 Eric Whitney 提交于 2月 11, 2020

The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file
contains unwritten blocks past i_size. It's set when ext4_fallocate
is called with the KEEP_SIZE flag to extend a file with an unwritten
extent. However, this flag hasn't been useful functionally since
March, 2012, when a decision was made to remove it from ext4.

All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version
1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag
handling") at that time. Now that enough time has passed to make
e2fsprogs versions containing this modification common, this patch now
removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as
well.

This change has two implications. First, because pre-1.42.2 e2fsck
versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and
because that bit will never be set by newer kernels containing this
patch, old versions of e2fsck won't have a compatibility problem with
files created by newer kernels.

Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits
belonging to a file written by an older kernel. If set, it will remain
in that state until the file is deleted. Because e2fsck versions since
1.42.2 don't check the flag at all, no adverse effect is expected.
However, pre-1.42.2 e2fsck versions that do check the flag may report
that it is set when it ought not to be after a file has been truncated
or had its unwritten blocks written. In this case, the old version of
e2fsck will offer to clear the flag. No adverse effect would then
occur whether the user chooses to clear the flag or not.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

4337ecd1

22 2月, 2020 3 次提交

ext4: fix race between writepages and enabling EXT4_EXTENTS_FL · cb85f4d2

由 Eric Biggers 提交于 2月 19, 2020

If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
on it, the following warning in ext4_add_complete_io() can be hit:

WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120

Here's a minimal reproducer (not 100% reliable) (root isn't required):

        while true; do
                sync
        done &
        while true; do
                rm -f file
                touch file
                chattr -e file
                echo X >> file
                chattr +e file
        done

The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
(which only returns true on extent-based files) is checked once to set
the number of reserved journal credits, and also again later to select
the flags for ext4_map_blocks() and copy the reserved journal handle to
ext4_io_end::handle.  But if EXT4_EXTENTS_FL is being concurrently set,
the first check can see dioread_nolock disabled while the later one can
see it enabled, causing the reserved handle to unexpectedly be NULL.

Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
related to doing so as well, fix this by synchronizing changing
EXT4_EXTENTS_FL with ext4_writepages() via the existing
s_writepages_rwsem (previously called s_journal_flag_rwsem).

This was originally reported by syzbot without a reproducer at
https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
but now that dioread_nolock is the default I also started seeing this
when running syzkaller locally.

Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org

cb85f4d2

ext4: rename s_journal_flag_rwsem to s_writepages_rwsem · bbd55937

由 Eric Biggers 提交于 2月 19, 2020

In preparation for making s_journal_flag_rwsem synchronize
ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
flags (rather than just JOURNAL_DATA as it does currently), rename it to
s_writepages_rwsem.

Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org

bbd55937

ext4: fix potential race between s_flex_groups online resizing and access · 7c990728

由 Suraj Jitindar Singh 提交于 2月 18, 2020

During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.

The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

7c990728

21 2月, 2020 2 次提交

ext4: fix potential race between s_group_info online resizing and access · df3da4ea

由 Suraj Jitindar Singh 提交于 2月 18, 2020

During an online resize an array of pointers to s_group_info gets replaced
so it can get enlarged. If there is a concurrent access to the array in
ext4_get_group_info() and this memory has been reused then this can lead to
an invalid memory access.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NBalbir Singh <sblbir@amazon.com>
Cc: stable@kernel.org

df3da4ea

ext4: fix potential race between online resizing and write operations · 1d0c3924

由 Theodore Ts'o 提交于 2月 15, 2020

During an online resize an array of pointers to buffer heads gets
replaced so it can get enlarged.  If there is a racing block
allocation or deallocation which uses the old array, and the old array
has gotten reused this can lead to a GPF or some other random kernel
memory getting modified.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.eduReported-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

1d0c3924

20 2月, 2020 1 次提交

ext4: fix a data race in EXT4_I(inode)->i_disksize · 35df4299

由 Qian Cai 提交于 2月 07, 2020

EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]

 write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
  ext4_write_end+0x4e3/0x750 [ext4]
  ext4_update_i_disksize at fs/ext4/ext4.h:3032
  (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
  (inlined by) ext4_write_end at fs/ext4/inode.c:1287
  generic_perform_write+0x208/0x2a0
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
  ext4_writepages+0x10ac/0x1d00 [ext4]
  mpage_map_and_submit_extent at fs/ext4/inode.c:2468
  (inlined by) ext4_writepages at fs/ext4/inode.c:2772
  do_writepages+0x5e/0x130
  __writeback_single_inode+0xeb/0xb20
  writeback_sb_inodes+0x429/0x900
  __writeback_inodes_wb+0xc4/0x150
  wb_writeback+0x4bd/0x870
  wb_workfn+0x6b4/0x960
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
 Workqueue: writeback wb_workfn (flush-7:0)

Since only the read is operating as lockless (outside of the
"i_data_sem"), load tearing could introduce a logic bug. Fix it by
adding READ_ONCE() for the read and WRITE_ONCE() for the write.
Signed-off-by: NQian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

35df4299

14 2月, 2020 1 次提交

ext4: fix checksum errors with indexed dirs · 48a34311

由 Jan Kara 提交于 2月 10, 2020

DIR_INDEX has been introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because for kernels not understanding indexed dir
format, internal htree nodes appear just as empty directory entries.
Index dir aware kernels then check the htree structure is still
consistent before using the data. This all worked reasonably well until
metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant to DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.

1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"

2) When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.

Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
Fixes: dbe89444 ("ext4: Calculate and verify checksums for htree nodes")
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

48a34311

18 1月, 2020 4 次提交

ext4: drop ext4_kvmalloc() · 71b565ce

由 Theodore Ts'o 提交于 1月 16, 2020

As Jan pointed out[1], as of commit 81378da6 ("jbd2: mark the
transaction context with the scope GFP_NOFS context") we use
memalloc_nofs_{save,restore}() while a jbd2 handle is active.  So
ext4_kvmalloc() so we can call allocate using GFP_NOFS is no longer
necessary.

[1] https://lore.kernel.org/r/20200109100007.GC27035@quack2.suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200116155031.266620-1-tytso@mit.eduReviewed-by: NJan Kara <jack@suse.cz>

71b565ce

ext4: make some functions static in extents.c · 43f81677

由 Eric Biggers 提交于 12月 31, 2019

Make the following functions static since they're only used in
extents.c:

	__ext4_ext_dirty()
	ext4_can_extents_be_merged()
	ext4_collapse_range()
	ext4_insert_range()

Also remove the prototype for ext4_ext_writepage_trans_blocks(), as this
function is not defined anywhere.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191231180444.46586-5-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>

43f81677

ext4: remove ext4_{ind,ext}_calc_metadata_amount() · dd6683e6

由 Eric Biggers 提交于 12月 31, 2019

Remove the ext4_ind_calc_metadata_amount() and
ext4_ext_calc_metadata_amount() functions, which have been unused since
commit 71d4f7d0 ("ext4: remove metadata reservation checks").

Also remove the i_da_metadata_calc_last_lblock and
i_da_metadata_calc_len fields from struct ext4_inode_info, as these were
only used by these removed functions.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191231180444.46586-2-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>

dd6683e6

ext4: Delete ext4_kvzvalloc() · 8f27fd0a

由 Naoto Kobayashi 提交于 12月 27, 2019

Since we're not using ext4_kvzalloc(), delete this function.
Signed-off-by: NNaoto Kobayashi <naoto.kobayashi4c@gmail.com>
Link: https://lore.kernel.org/r/20191227080523.31808-2-naoto.kobayashi4c@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8f27fd0a

27 12月, 2019 3 次提交

ext4: Optimize ext4 DIO overwrites · 8cd115bd

由 Jan Kara 提交于 12月 18, 2019

Currently we start transaction for mapping every extent for writing
using direct IO. This is unnecessary when we know we are overwriting
already allocated blocks and the overhead of starting a transaction can
be significant especially for multithreaded workloads doing small writes.
Use iomap operations that avoid starting a transaction for direct IO
overwrites.

This improves throughput of 4k random writes - fio jobfile:
[global]
rw=randrw
norandommap=1
invalidate=0
bs=4k
numjobs=16
time_based=1
ramp_time=30
runtime=120
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32

from 3018MB/s to 4059MB/s in my test VM running test against simulated
pmem device (note that before iomap conversion, this workload was able
to achieve 3708MB/s because old direct IO path avoided transaction start
for overwrites as well). For dax, the win is even larger improving
throughput from 3042MB/s to 4311MB/s.
Reported-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8cd115bd

ext4: simulate various I/O and checksum errors when reading metadata · 46f870d6

由 Theodore Ts'o 提交于 11月 21, 2019

This allows us to test various error handling code paths

Link: https://lore.kernel.org/r/20191209012317.59398-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

46f870d6

ext4: save the error code which triggered an ext4_error() in the superblock · 878520ac

由 Theodore Ts'o 提交于 11月 19, 2019

This allows the cause of an ext4_error() report to be categorized
based on whether it was triggered due to an I/O error, or an memory
allocation error, or other possible causes. Most errors are caused by
a detected file system inconsistency, so the default code stored in
the superblock will be EXT4_ERR_EFSCORRUPTED.

Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

878520ac

07 11月, 2019 1 次提交

ext4: add support for IV_INO_LBLK_64 encryption policies · b925acb8

由 Eric Biggers 提交于 10月 24, 2019

IV_INO_LBLK_64 encryption policies have special requirements from the
filesystem beyond those of the existing encryption policies:

- Inode numbers must never change, even if the filesystem is resized.
- Inode numbers must be <= 32 bits.
- File logical block numbers must be <= 32 bits.

ext4 has 32-bit inode and file logical block numbers.  However,
resize2fs can re-number inodes when shrinking an ext4 filesystem.

However, typically the people who would want to use this format don't
care about filesystem shrinking.  They'd be fine with a solution that
just prevents the filesystem from being shrunk.

Therefore, add a new feature flag EXT4_FEATURE_COMPAT_STABLE_INODES that
will do exactly that.  Then wire up the fscrypt_operations to expose
this flag to fs/crypto/, so that it allows IV_INO_LBLK_64 policies when
this flag is set.
Acked-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NEric Biggers <ebiggers@google.com>

b925acb8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功