提交 · a8ab6d3885ef5e2300d683b79a9e1999403eefd9 · openeuler / Kernel

29 5月, 2020 2 次提交

fs/ext4: Update ext4_should_use_dax() · a8ab6d38

由 Ira Weiny 提交于 5月 28, 2020

S_DAX should only be enabled when the underlying block device supports
dax.

Cache the underlying support for DAX in the super block and modify
ext4_should_use_dax() to check for device support prior to the over
riding mount option.

While we are at it change the function to ext4_should_enable_dax() as
this better reflects the ask as well as matches xfs.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-5-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

a8ab6d38

fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS · fc626fe3

由 Ira Weiny 提交于 5月 28, 2020

In prep for the new tri-state mount option which then introduces
EXT4_MOUNT_DAX_NEVER.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-4-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

fc626fe3

02 4月, 2020 1 次提交

ext4: save all error info in save_error_info() and drop ext4_set_errno() · 54d3adbc

由 Theodore Ts'o 提交于 3月 28, 2020

Using a separate function, ext4_set_errno() to set the errno is
problematic because it doesn't do the right thing once
s_last_error_errorcode is non-zero.  It's also less racy to set all of
the error information all at once.  (Also, as a bonus, it shrinks code
size slightly.)

Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu
Fixes: 878520ac ("ext4: save the error code which triggered...")
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

54d3adbc

06 3月, 2020 1 次提交

ext4: remove EXT4_EOFBLOCKS_FL and associated code · 4337ecd1

由 Eric Whitney 提交于 2月 11, 2020

The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file
contains unwritten blocks past i_size. It's set when ext4_fallocate
is called with the KEEP_SIZE flag to extend a file with an unwritten
extent. However, this flag hasn't been useful functionally since
March, 2012, when a decision was made to remove it from ext4.

All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version
1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag
handling") at that time. Now that enough time has passed to make
e2fsprogs versions containing this modification common, this patch now
removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as
well.

This change has two implications. First, because pre-1.42.2 e2fsck
versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and
because that bit will never be set by newer kernels containing this
patch, old versions of e2fsck won't have a compatibility problem with
files created by newer kernels.

Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits
belonging to a file written by an older kernel. If set, it will remain
in that state until the file is deleted. Because e2fsck versions since
1.42.2 don't check the flag at all, no adverse effect is expected.
However, pre-1.42.2 e2fsck versions that do check the flag may report
that it is set when it ought not to be after a file has been truncated
or had its unwritten blocks written. In this case, the old version of
e2fsck will offer to clear the flag. No adverse effect would then
occur whether the user chooses to clear the flag or not.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

4337ecd1

22 2月, 2020 3 次提交

ext4: fix race between writepages and enabling EXT4_EXTENTS_FL · cb85f4d2

由 Eric Biggers 提交于 2月 19, 2020

If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
on it, the following warning in ext4_add_complete_io() can be hit:

WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120

Here's a minimal reproducer (not 100% reliable) (root isn't required):

        while true; do
                sync
        done &
        while true; do
                rm -f file
                touch file
                chattr -e file
                echo X >> file
                chattr +e file
        done

The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
(which only returns true on extent-based files) is checked once to set
the number of reserved journal credits, and also again later to select
the flags for ext4_map_blocks() and copy the reserved journal handle to
ext4_io_end::handle.  But if EXT4_EXTENTS_FL is being concurrently set,
the first check can see dioread_nolock disabled while the later one can
see it enabled, causing the reserved handle to unexpectedly be NULL.

Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
related to doing so as well, fix this by synchronizing changing
EXT4_EXTENTS_FL with ext4_writepages() via the existing
s_writepages_rwsem (previously called s_journal_flag_rwsem).

This was originally reported by syzbot without a reproducer at
https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
but now that dioread_nolock is the default I also started seeing this
when running syzkaller locally.

Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org

cb85f4d2

ext4: rename s_journal_flag_rwsem to s_writepages_rwsem · bbd55937

由 Eric Biggers 提交于 2月 19, 2020

In preparation for making s_journal_flag_rwsem synchronize
ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
flags (rather than just JOURNAL_DATA as it does currently), rename it to
s_writepages_rwsem.

Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org

bbd55937

ext4: fix potential race between s_flex_groups online resizing and access · 7c990728

由 Suraj Jitindar Singh 提交于 2月 18, 2020

During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.

The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

7c990728

21 2月, 2020 2 次提交

ext4: fix potential race between s_group_info online resizing and access · df3da4ea

由 Suraj Jitindar Singh 提交于 2月 18, 2020

During an online resize an array of pointers to s_group_info gets replaced
so it can get enlarged. If there is a concurrent access to the array in
ext4_get_group_info() and this memory has been reused then this can lead to
an invalid memory access.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NBalbir Singh <sblbir@amazon.com>
Cc: stable@kernel.org

df3da4ea

ext4: fix potential race between online resizing and write operations · 1d0c3924

由 Theodore Ts'o 提交于 2月 15, 2020

During an online resize an array of pointers to buffer heads gets
replaced so it can get enlarged.  If there is a racing block
allocation or deallocation which uses the old array, and the old array
has gotten reused this can lead to a GPF or some other random kernel
memory getting modified.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.eduReported-by: NSuraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

1d0c3924

20 2月, 2020 1 次提交

ext4: fix a data race in EXT4_I(inode)->i_disksize · 35df4299

由 Qian Cai 提交于 2月 07, 2020

EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]

 write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
  ext4_write_end+0x4e3/0x750 [ext4]
  ext4_update_i_disksize at fs/ext4/ext4.h:3032
  (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
  (inlined by) ext4_write_end at fs/ext4/inode.c:1287
  generic_perform_write+0x208/0x2a0
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
  ext4_writepages+0x10ac/0x1d00 [ext4]
  mpage_map_and_submit_extent at fs/ext4/inode.c:2468
  (inlined by) ext4_writepages at fs/ext4/inode.c:2772
  do_writepages+0x5e/0x130
  __writeback_single_inode+0xeb/0xb20
  writeback_sb_inodes+0x429/0x900
  __writeback_inodes_wb+0xc4/0x150
  wb_writeback+0x4bd/0x870
  wb_workfn+0x6b4/0x960
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
 Workqueue: writeback wb_workfn (flush-7:0)

Since only the read is operating as lockless (outside of the
"i_data_sem"), load tearing could introduce a logic bug. Fix it by
adding READ_ONCE() for the read and WRITE_ONCE() for the write.
Signed-off-by: NQian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

35df4299

14 2月, 2020 1 次提交

ext4: fix checksum errors with indexed dirs · 48a34311

由 Jan Kara 提交于 2月 10, 2020

DIR_INDEX has been introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because for kernels not understanding indexed dir
format, internal htree nodes appear just as empty directory entries.
Index dir aware kernels then check the htree structure is still
consistent before using the data. This all worked reasonably well until
metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant to DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.

1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"

2) When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.

Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
Fixes: dbe89444 ("ext4: Calculate and verify checksums for htree nodes")
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

48a34311

18 1月, 2020 4 次提交

ext4: drop ext4_kvmalloc() · 71b565ce

由 Theodore Ts'o 提交于 1月 16, 2020

As Jan pointed out[1], as of commit 81378da6 ("jbd2: mark the
transaction context with the scope GFP_NOFS context") we use
memalloc_nofs_{save,restore}() while a jbd2 handle is active.  So
ext4_kvmalloc() so we can call allocate using GFP_NOFS is no longer
necessary.

[1] https://lore.kernel.org/r/20200109100007.GC27035@quack2.suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200116155031.266620-1-tytso@mit.eduReviewed-by: NJan Kara <jack@suse.cz>

71b565ce

ext4: make some functions static in extents.c · 43f81677

由 Eric Biggers 提交于 12月 31, 2019

Make the following functions static since they're only used in
extents.c:

	__ext4_ext_dirty()
	ext4_can_extents_be_merged()
	ext4_collapse_range()
	ext4_insert_range()

Also remove the prototype for ext4_ext_writepage_trans_blocks(), as this
function is not defined anywhere.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191231180444.46586-5-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>

43f81677

ext4: remove ext4_{ind,ext}_calc_metadata_amount() · dd6683e6

由 Eric Biggers 提交于 12月 31, 2019

Remove the ext4_ind_calc_metadata_amount() and
ext4_ext_calc_metadata_amount() functions, which have been unused since
commit 71d4f7d0 ("ext4: remove metadata reservation checks").

Also remove the i_da_metadata_calc_last_lblock and
i_da_metadata_calc_len fields from struct ext4_inode_info, as these were
only used by these removed functions.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191231180444.46586-2-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>

dd6683e6

ext4: Delete ext4_kvzvalloc() · 8f27fd0a

由 Naoto Kobayashi 提交于 12月 27, 2019

Since we're not using ext4_kvzalloc(), delete this function.
Signed-off-by: NNaoto Kobayashi <naoto.kobayashi4c@gmail.com>
Link: https://lore.kernel.org/r/20191227080523.31808-2-naoto.kobayashi4c@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8f27fd0a

27 12月, 2019 3 次提交

ext4: Optimize ext4 DIO overwrites · 8cd115bd

由 Jan Kara 提交于 12月 18, 2019

Currently we start transaction for mapping every extent for writing
using direct IO. This is unnecessary when we know we are overwriting
already allocated blocks and the overhead of starting a transaction can
be significant especially for multithreaded workloads doing small writes.
Use iomap operations that avoid starting a transaction for direct IO
overwrites.

This improves throughput of 4k random writes - fio jobfile:
[global]
rw=randrw
norandommap=1
invalidate=0
bs=4k
numjobs=16
time_based=1
ramp_time=30
runtime=120
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32

from 3018MB/s to 4059MB/s in my test VM running test against simulated
pmem device (note that before iomap conversion, this workload was able
to achieve 3708MB/s because old direct IO path avoided transaction start
for overwrites as well). For dax, the win is even larger improving
throughput from 3042MB/s to 4311MB/s.
Reported-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8cd115bd

ext4: simulate various I/O and checksum errors when reading metadata · 46f870d6

由 Theodore Ts'o 提交于 11月 21, 2019

This allows us to test various error handling code paths

Link: https://lore.kernel.org/r/20191209012317.59398-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

46f870d6

ext4: save the error code which triggered an ext4_error() in the superblock · 878520ac

由 Theodore Ts'o 提交于 11月 19, 2019

This allows the cause of an ext4_error() report to be categorized
based on whether it was triggered due to an I/O error, or an memory
allocation error, or other possible causes. Most errors are caused by
a detected file system inconsistency, so the default code stored in
the superblock will be EXT4_ERR_EFSCORRUPTED.

Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

878520ac

07 11月, 2019 1 次提交

ext4: add support for IV_INO_LBLK_64 encryption policies · b925acb8

由 Eric Biggers 提交于 10月 24, 2019

IV_INO_LBLK_64 encryption policies have special requirements from the
filesystem beyond those of the existing encryption policies:

- Inode numbers must never change, even if the filesystem is resized.
- Inode numbers must be <= 32 bits.
- File logical block numbers must be <= 32 bits.

ext4 has 32-bit inode and file logical block numbers.  However,
resize2fs can re-number inodes when shrinking an ext4 filesystem.

However, typically the people who would want to use this format don't
care about filesystem shrinking.  They'd be fine with a solution that
just prevents the filesystem from being shrunk.

Therefore, add a new feature flag EXT4_FEATURE_COMPAT_STABLE_INODES that
will do exactly that.  Then wire up the fscrypt_operations to expose
this flag to fs/crypto/, so that it allows IV_INO_LBLK_64 policies when
this flag is set.
Acked-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NEric Biggers <ebiggers@google.com>

b925acb8

06 11月, 2019 4 次提交

ext4: Reserve revoke credits for freed blocks · 83448bdf

由 Jan Kara 提交于 11月 05, 2019

So far we have reserved only relatively high fixed amount of revoke
credits for each transaction. We over-reserved by large amount for most
cases but when freeing large directories or files with data journalling,
the fixed amount is not enough. In fact the worst case estimate is
inconveniently large (maximum extent size) for freeing of one extent.

We fix this by doing proper estimate of the amount of blocks that need
to be revoked when removing blocks from the inode due to truncate or
hole punching and otherwise reserve just a small amount of revoke
credits for each transaction to accommodate freeing of xattrs block or
so.
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

83448bdf

ext4: Provide function to handle transaction restarts · a4130367

由 Jan Kara 提交于 11月 05, 2019

Provide ext4_journal_ensure_credits_fn() function to ensure transaction
has given amount of credits and call helper function to prepare for
restarting a transaction. This allows to remove some boilerplate code
from various places, add proper error handling for the case where
transaction extension or restart fails, and reduces following changes
needed for proper revoke record reservation tracking.
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

a4130367

ext4: introduce direct I/O write using iomap infrastructure · 378f32ba

由 Matthew Bobrowski 提交于 11月 05, 2019

This patch introduces a new direct I/O write path which makes use of
the iomap infrastructure.

All direct I/O writes are now passed from the ->write_iter() callback
through to the new direct I/O handler ext4_dio_write_iter(). This
function is responsible for calling into the iomap infrastructure via
iomap_dio_rw().

Code snippets from the existing direct I/O write code within
ext4_file_write_iter() such as, checking whether the I/O request is
unaligned asynchronous I/O, or whether the write will result in an
overwrite have effectively been moved out and into the new direct I/O
->write_iter() handler.
The block mapping flags that are eventually passed down to
ext4_map_blocks() from the *_get_block_*() suite of routines have been
taken out and introduced within ext4_iomap_alloc().

For inode extension cases, ext4_handle_inode_extension() is
effectively the function responsible for performing such metadata
updates. This is called after iomap_dio_rw() has returned so that we
can safely determine whether we need to potentially truncate any
allocated blocks that may have been prepared for this direct I/O
write. We don't perform the inode extension, or truncate operations
from the ->end_io() handler as we don't have the original I/O 'length'
available there. The ->end_io() however is responsible fo converting
allocated unwritten extents to written extents.

In the instance of a short write, we fallback and complete the
remainder of the I/O using buffered I/O via
ext4_buffered_write_iter().

The existing buffer_head direct I/O implementation has been removed as
it's now redundant.

[ Fix up ext4_dio_write_iter() per Jan's comments at
  https://lore.kernel.org/r/20191105135932.GN22379@quack2.suse.cz -- TYT ]
Signed-off-by: NMatthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/e55db6f12ae6ff017f36774135e79f3e7b0333da.1572949325.git.mbobrowski@mbobrowski.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

378f32ba

ext4: introduce new callback for IOMAP_REPORT · 09edf4d3

由 Matthew Bobrowski 提交于 11月 05, 2019

As part of the ext4_iomap_begin() cleanups that precede this patch, we
also split up the IOMAP_REPORT branch into a completely separate
->iomap_begin() callback named ext4_iomap_begin_report(). Again, the
raionale for this change is to reduce the overall clutter within
ext4_iomap_begin().
Signed-off-by: NMatthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

09edf4d3

23 10月, 2019 2 次提交

ext4: Add support for blocksize < pagesize in dioread_nolock · c8cc8816

由 Ritesh Harjani 提交于 10月 16, 2019

This patch adds the support for blocksize < pagesize for
dioread_nolock feature.

Since in case of blocksize < pagesize, we can have multiple
small buffers of page as unwritten extents, we need to
maintain a vector of these unwritten extents which needs
the conversion after the IO is complete. Thus, we maintain
a list of tuple <offset, size> pair (io_end_vec) for this &
traverse this list to do the unwritten to written conversion.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191016073711.4141-5-riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

c8cc8816

ext4: Add API to bring in support for unwritten io_end_vec conversion · a00713ea

由 Ritesh Harjani 提交于 10月 16, 2019

This patch just brings in the API for conversion of unwritten io_end_vec
extents which will be required for blocksize < pagesize support
for dioread_nolock feature.

No functional changes in this patch.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191016073711.4141-3-riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

a00713ea

05 9月, 2019 1 次提交

ext4: Reduce ext4 timestamp warnings · cba465b4

由 Deepa Dinamani 提交于 9月 04, 2019

When ext4 file systems were created intentionally with 128 byte inodes,
the rate-limited warning of eventual possible timestamp overflow are
still emitted rather frequently.  Remove the warning for now.

Discussion for whether any warning is needed,
and where it should be emitted, can be found at
https://lore.kernel.org/lkml/1567523922.5576.57.camel@lca.pw/.
I can post a separate follow-up patch after the conclusion.
Reported-by: NQian Cai <cai@lca.pw>
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

cba465b4

30 8月, 2019 1 次提交

ext4: Initialize timestamps limits · 4881c497

由 Deepa Dinamani 提交于 1月 21, 2018

ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding table in
a4dad1aei(ext4: Fix handling of extended tv_sec):

* extra  msb of                         adjust for signed
* epoch  32-bit                         32-bit tv_sec to
* bits   time    decoded 64-bit tv_sec  64-bit tv_sec      valid time range
* 0 0    1    -0x80000000..-0x00000001  0x000000000   1901-12-13..1969-12-31
* 0 0    0    0x000000000..0x07fffffff  0x000000000   1970-01-01..2038-01-19
* 0 1    1    0x080000000..0x0ffffffff  0x100000000   2038-01-19..2106-02-07
* 0 1    0    0x100000000..0x17fffffff  0x100000000   2106-02-07..2174-02-25
* 1 0    1    0x180000000..0x1ffffffff  0x200000000   2174-02-25..2242-03-16
* 1 0    0    0x200000000..0x27fffffff  0x200000000   2242-03-16..2310-04-04
* 1 1    1    0x280000000..0x2ffffffff  0x300000000   2310-04-04..2378-04-22
* 1 1    0    0x300000000..0x37fffffff  0x300000000   2378-04-22..2446-05-10

Note that the time limits are not correct for deletion times.

Added a warn when an inode cannot be extended to incorporate an
extended timestamp.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Acked-by: NJeff Layton <jlayton@kernel.org>
Cc: tytso@mit.edu
Cc: adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org

4881c497

28 8月, 2019 1 次提交

ext4: fix potential use after free after remounting with noblock_validity · 7727ae52

由 zhangyi (F) 提交于 8月 28, 2019

Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4 file system to two
mountpoints with default mount options, and then remount one of them
with "noblock_validity", it may trigger a use after free problem when
someone accessing the other one.

 # mount /dev/sda foo
 # mount /dev/sda bar

User access mountpoint "foo"   |   Remount mountpoint "bar"
                               |
ext4_map_blocks()              |   ext4_remount()
check_block_validity()         |   ext4_setup_system_zone()
ext4_data_block_valid()        |   ext4_release_system_zone()
                               |   free system_blks rb nodes
access system_blks rb nodes    |
trigger use after free         |

This problem can also be reproduced by one mountpint, At the same time,
add_system_zone() can get called during remount as well so there can be
racing ext4_data_block_valid() reading the rbtree at the same time.

This patch add RCU to protect system zone from releasing or building
when doing a remount which inverse current "noblock_validity" mount
option. It assign the rbtree after the whole tree was complete and
do actual freeing after rcu grace period, avoid any intermediate state.

Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

7727ae52

23 8月, 2019 2 次提交

ext4: rework reserved cluster accounting when invalidating pages · 8fcc3a58

由 Eric Whitney 提交于 8月 22, 2019

The goal of this patch is to remove two references to the buffer delay
bit in ext4_da_page_release_reservation() as part of a larger effort
to remove all such references from ext4.  These two references are
principally used to reduce the reserved block/cluster count when pages
are invalidated as a result of truncating, punching holes, or
collapsing a block range in a file.  The entire function is removed
and replaced with code in ext4_es_remove_extent() that reduces the
reserved count as a side effect of removing a block range from delayed
and not unwritten extents in the extent status tree as is done when
truncating, punching holes, or collapsing ranges.

The code is written to minimize the number of searches descending from
rb tree roots for scalability.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

8fcc3a58

ext4: treat buffers with write errors as containing valid data · 7963e5ac

由 ZhangXiaoxu 提交于 8月 22, 2019

I got some errors when I repair an ext4 volume which stacked by an
iscsi target:
    Entry 'test60' in / (2) has deleted/unused inode 73750.  Clear?
It can be reproduced when the network not good enough.

When I debug this I found ext4 will read entry buffer from disk and
the buffer is marked with write_io_error.

If the buffer is marked with write_io_error, it means it already
wroten to journal, and not checked out to disk. IOW, the journal
is newer than the data in disk.
If this journal record 'delete test60', it means the 'test60' still
on the disk metadata.

In this case, if we read the buffer from disk successfully and create
file continue, the new journal record will overwrite the journal
which record 'delete test60', then the entry corruptioned.

So, use the buffer rather than read from disk if the buffer is marked
with write_io_error.
Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7963e5ac

13 8月, 2019 3 次提交

ext4: add fs-verity read support · 22cfe4b4

由 Eric Biggers 提交于 7月 22, 2019

Make ext4_mpage_readpages() verify data as it is read from fs-verity
files, using the helper functions from fs/verity/.

To support both encryption and verity simultaneously, this required
refactoring the decryption workflow into a generic "post-read
processing" workflow which can do decryption, verification, or both.

The case where the ext4 block size is not equal to the PAGE_SIZE is not
supported yet, since in that case ext4_mpage_readpages() sometimes falls
back to block_read_full_page(), which does not support fs-verity yet.
Co-developed-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NEric Biggers <ebiggers@google.com>

22cfe4b4

ext4: add basic fs-verity support · c93d8f88

由 Eric Biggers 提交于 7月 22, 2019

Add most of fs-verity support to ext4.  fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files.  It uses a dm-verity like mechanism at the file
level: a Merkle tree is used to verify any block in the file in
log(filesize) time.  It is implemented mainly by helper functions in
fs/verity/.  See Documentation/filesystems/fsverity.rst for the full
documentation.

This commit adds all of ext4 fs-verity support except for the actual
data verification, including:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
  inode and reading/writing the verity metadata.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
  writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size.  This approach works because (a) verity files are readonly, and
(b) pages fully beyond i_size aren't visible to userspace but can be
read/written internally by ext4 with only some relatively small changes
to ext4.  This approach avoids having to depend on the EA_INODE feature
and on rearchitecturing ext4's xattr support to support paging
multi-gigabyte xattrs into memory, and to support encrypting xattrs.
Note that the verity metadata *must* be encrypted when the file is,
since it contains hashes of the plaintext data.

This patch incorporates work by Theodore Ts'o and Chandan Rajendra.
Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NEric Biggers <ebiggers@google.com>

c93d8f88

ext4: drop legacy pre-1970 encoding workaround · cd2d9922

由 Theodore Ts'o 提交于 8月 12, 2019

Originally, support for expanded timestamps had a bug in that pre-1970
times were erroneously encoded as being in the the 24th century. This
was fixed in commit a4dad1ae ("ext4: Fix handling of extended
tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps
were correctly encoded, but for backwards compatibility those
incorrectly encoded timestamps were mapped back to the pre-1970 dates.

Given that backwards compatibility workaround has been around for 4
years, and given that running e2fsck from e2fsprogs 1.43.2 and later
will offer to fix these timestamps (which has been released for 3
years), it's past time to drop the legacy workaround from the kernel.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cd2d9922

12 8月, 2019 3 次提交

ext4: add new ioctl EXT4_IOC_GET_ES_CACHE · bb5835ed

由 Theodore Ts'o 提交于 8月 11, 2019

For debugging reasons, it's useful to know the contents of the extent
cache. Since the extent cache contains much of what is in the fiemap
ioctl, use an fiemap-style interface to return this information.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

bb5835ed

ext4: add a new ioctl EXT4_IOC_GETSTATE · 1ad3ea6e

由 Theodore Ts'o 提交于 8月 11, 2019

The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of
an ext4 inode for debugging purposes.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1ad3ea6e

ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE · b0c013e2

由 Theodore Ts'o 提交于 8月 11, 2019

The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent
status cache to be cleared out.  This is intended for use for
debugging.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

b0c013e2

22 6月, 2019 3 次提交

T
ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree() · 7633b08b
由 Theodore Ts'o 提交于 6月 21, 2019
```
Clean up namespace pollution by the inline_data code.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
```
7633b08b

ext4: refactor initialize_dirent_tail() · ddce3b94

由 Theodore Ts'o 提交于 6月 21, 2019

Move the calculation of the location of the dirent tail into
initialize_dirent_tail().  Also prefix the function with ext4_ to fix
kernel namepsace polution.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ddce3b94

ext4: rename "dirent_csum" functions to use "dirblock" · f036adb3

由 Theodore Ts'o 提交于 6月 21, 2019

Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set()
don't actually operate on a directory entry, but a directory block.
And while they take a struct ext4_dir_entry *dirent as an argument, it
had better be the first directory at the beginning of the direct
block, or things will go very wrong.

Rename the following functions so that things make more sense, and
remove a lot of confusing casts along the way:

   ext4_dirent_csum_verify	 -> ext4_dirblock_csum_verify
   ext4_dirent_csum_set		 -> ext4_dirblock_csum_set
   ext4_dirent_csum		 -> ext4_dirblock_csum
   ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

f036adb3

20 6月, 2019 1 次提交

ext4: optimize case-insensitive lookups · 3ae72562

由 Gabriel Krisman Bertazi 提交于 6月 19, 2019

Temporarily cache a casefolded version of the file name under lookup in
ext4_filename, to avoid repeatedly casefolding it.  I got up to 30%
speedup on lookups of large directories (>100k entries), depending on
the length of the string under lookup.
Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

3ae72562

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功