- 29 5月, 2020 2 次提交
-
-
由 Ira Weiny 提交于
S_DAX should only be enabled when the underlying block device supports dax. Cache the underlying support for DAX in the super block and modify ext4_should_use_dax() to check for device support prior to the over riding mount option. While we are at it change the function to ext4_should_enable_dax() as this better reflects the ask as well as matches xfs. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NIra Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-5-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Ira Weiny 提交于
In prep for the new tri-state mount option which then introduces EXT4_MOUNT_DAX_NEVER. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NIra Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20200528150003.828793-4-ira.weiny@intel.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 4月, 2020 1 次提交
-
-
由 Theodore Ts'o 提交于
Using a separate function, ext4_set_errno() to set the errno is problematic because it doesn't do the right thing once s_last_error_errorcode is non-zero. It's also less racy to set all of the error information all at once. (Also, as a bonus, it shrinks code size slightly.) Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu Fixes: 878520ac ("ext4: save the error code which triggered...") Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 06 3月, 2020 1 次提交
-
-
由 Eric Whitney 提交于
The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file contains unwritten blocks past i_size. It's set when ext4_fallocate is called with the KEEP_SIZE flag to extend a file with an unwritten extent. However, this flag hasn't been useful functionally since March, 2012, when a decision was made to remove it from ext4. All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version 1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag handling") at that time. Now that enough time has passed to make e2fsprogs versions containing this modification common, this patch now removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as well. This change has two implications. First, because pre-1.42.2 e2fsck versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and because that bit will never be set by newer kernels containing this patch, old versions of e2fsck won't have a compatibility problem with files created by newer kernels. Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits belonging to a file written by an older kernel. If set, it will remain in that state until the file is deleted. Because e2fsck versions since 1.42.2 don't check the flag at all, no adverse effect is expected. However, pre-1.42.2 e2fsck versions that do check the flag may report that it is set when it ought not to be after a file has been truncated or had its unwritten blocks written. In this case, the old version of e2fsck will offer to clear the flag. No adverse effect would then occur whether the user chooses to clear the flag or not. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 2月, 2020 3 次提交
-
-
由 Eric Biggers 提交于
If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running on it, the following warning in ext4_add_complete_io() can be hit: WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120 Here's a minimal reproducer (not 100% reliable) (root isn't required): while true; do sync done & while true; do rm -f file touch file chattr -e file echo X >> file chattr +e file done The problem is that in ext4_writepages(), ext4_should_dioread_nolock() (which only returns true on extent-based files) is checked once to set the number of reserved journal credits, and also again later to select the flags for ext4_map_blocks() and copy the reserved journal handle to ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set, the first check can see dioread_nolock disabled while the later one can see it enabled, causing the reserved handle to unexpectedly be NULL. Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races related to doing so as well, fix this by synchronizing changing EXT4_EXTENTS_FL with ext4_writepages() via the existing s_writepages_rwsem (previously called s_journal_flag_rwsem). This was originally reported by syzbot without a reproducer at https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf, but now that dioread_nolock is the default I also started seeing this when running syzkaller locally. Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@kernel.org
-
由 Eric Biggers 提交于
In preparation for making s_journal_flag_rwsem synchronize ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA flags (rather than just JOURNAL_DATA as it does currently), rename it to s_writepages_rwsem. Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@kernel.org
-
由 Suraj Jitindar Singh 提交于
During an online resize an array of s_flex_groups structures gets replaced so it can get enlarged. If there is a concurrent access to the array and this memory has been reused then this can lead to an invalid memory access. The s_flex_group array has been converted into an array of pointers rather than an array of structures. This is to ensure that the information contained in the structures cannot get out of sync during a resize due to an accessor updating the value in the old structure after it has been copied but before the array pointer is updated. Since the structures them- selves are no longer copied but only the pointers to them this case is mitigated. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 21 2月, 2020 2 次提交
-
-
由 Suraj Jitindar Singh 提交于
During an online resize an array of pointers to s_group_info gets replaced so it can get enlarged. If there is a concurrent access to the array in ext4_get_group_info() and this memory has been reused then this can lead to an invalid memory access. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.eduSigned-off-by: NSuraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NBalbir Singh <sblbir@amazon.com> Cc: stable@kernel.org
-
由 Theodore Ts'o 提交于
During an online resize an array of pointers to buffer heads gets replaced so it can get enlarged. If there is a racing block allocation or deallocation which uses the old array, and the old array has gotten reused this can lead to a GPF or some other random kernel memory getting modified. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.eduReported-by: NSuraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 20 2月, 2020 1 次提交
-
-
由 Qian Cai 提交于
EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4] write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127: ext4_write_end+0x4e3/0x750 [ext4] ext4_update_i_disksize at fs/ext4/ext4.h:3032 (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046 (inlined by) ext4_write_end at fs/ext4/inode.c:1287 generic_perform_write+0x208/0x2a0 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37: ext4_writepages+0x10ac/0x1d00 [ext4] mpage_map_and_submit_extent at fs/ext4/inode.c:2468 (inlined by) ext4_writepages at fs/ext4/inode.c:2772 do_writepages+0x5e/0x130 __writeback_single_inode+0xeb/0xb20 writeback_sb_inodes+0x429/0x900 __writeback_inodes_wb+0xc4/0x150 wb_writeback+0x4bd/0x870 wb_workfn+0x6b4/0x960 process_one_work+0x54c/0xbe0 worker_thread+0x80/0x650 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 Reported by Kernel Concurrency Sanitizer on: CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: writeback wb_workfn (flush-7:0) Since only the read is operating as lockless (outside of the "i_data_sem"), load tearing could introduce a logic bug. Fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the write. Signed-off-by: NQian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 14 2月, 2020 1 次提交
-
-
由 Jan Kara 提交于
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled. 1) We just disallow loading any directory inodes with EXT4_INDEX_FL when DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?" 2) When we find out htree structure is corrupted (but the filesystem and the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more. Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz Fixes: dbe89444 ("ext4: Calculate and verify checksums for htree nodes") Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 18 1月, 2020 4 次提交
-
-
由 Theodore Ts'o 提交于
As Jan pointed out[1], as of commit 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context") we use memalloc_nofs_{save,restore}() while a jbd2 handle is active. So ext4_kvmalloc() so we can call allocate using GFP_NOFS is no longer necessary. [1] https://lore.kernel.org/r/20200109100007.GC27035@quack2.suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20200116155031.266620-1-tytso@mit.eduReviewed-by: NJan Kara <jack@suse.cz>
-
由 Eric Biggers 提交于
Make the following functions static since they're only used in extents.c: __ext4_ext_dirty() ext4_can_extents_be_merged() ext4_collapse_range() ext4_insert_range() Also remove the prototype for ext4_ext_writepage_trans_blocks(), as this function is not defined anywhere. Signed-off-by: NEric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231180444.46586-5-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Eric Biggers 提交于
Remove the ext4_ind_calc_metadata_amount() and ext4_ext_calc_metadata_amount() functions, which have been unused since commit 71d4f7d0 ("ext4: remove metadata reservation checks"). Also remove the i_da_metadata_calc_last_lblock and i_da_metadata_calc_len fields from struct ext4_inode_info, as these were only used by these removed functions. Signed-off-by: NEric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231180444.46586-2-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Naoto Kobayashi 提交于
Since we're not using ext4_kvzalloc(), delete this function. Signed-off-by: NNaoto Kobayashi <naoto.kobayashi4c@gmail.com> Link: https://lore.kernel.org/r/20191227080523.31808-2-naoto.kobayashi4c@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 27 12月, 2019 3 次提交
-
-
由 Jan Kara 提交于
Currently we start transaction for mapping every extent for writing using direct IO. This is unnecessary when we know we are overwriting already allocated blocks and the overhead of starting a transaction can be significant especially for multithreaded workloads doing small writes. Use iomap operations that avoid starting a transaction for direct IO overwrites. This improves throughput of 4k random writes - fio jobfile: [global] rw=randrw norandommap=1 invalidate=0 bs=4k numjobs=16 time_based=1 ramp_time=30 runtime=120 group_reporting=1 ioengine=psync direct=1 size=16G filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31 file_service_type=random nrfiles=32 from 3018MB/s to 4059MB/s in my test VM running test against simulated pmem device (note that before iomap conversion, this workload was able to achieve 3708MB/s because old direct IO path avoided transaction start for overwrites as well). For dax, the win is even larger improving throughput from 3042MB/s to 4311MB/s. Reported-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This allows us to test various error handling code paths Link: https://lore.kernel.org/r/20191209012317.59398-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This allows the cause of an ext4_error() report to be categorized based on whether it was triggered due to an I/O error, or an memory allocation error, or other possible causes. Most errors are caused by a detected file system inconsistency, so the default code stored in the superblock will be EXT4_ERR_EFSCORRUPTED. Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 07 11月, 2019 1 次提交
-
-
由 Eric Biggers 提交于
IV_INO_LBLK_64 encryption policies have special requirements from the filesystem beyond those of the existing encryption policies: - Inode numbers must never change, even if the filesystem is resized. - Inode numbers must be <= 32 bits. - File logical block numbers must be <= 32 bits. ext4 has 32-bit inode and file logical block numbers. However, resize2fs can re-number inodes when shrinking an ext4 filesystem. However, typically the people who would want to use this format don't care about filesystem shrinking. They'd be fine with a solution that just prevents the filesystem from being shrunk. Therefore, add a new feature flag EXT4_FEATURE_COMPAT_STABLE_INODES that will do exactly that. Then wire up the fscrypt_operations to expose this flag to fs/crypto/, so that it allows IV_INO_LBLK_64 policies when this flag is set. Acked-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
- 06 11月, 2019 4 次提交
-
-
由 Jan Kara 提交于
So far we have reserved only relatively high fixed amount of revoke credits for each transaction. We over-reserved by large amount for most cases but when freeing large directories or files with data journalling, the fixed amount is not enough. In fact the worst case estimate is inconveniently large (maximum extent size) for freeing of one extent. We fix this by doing proper estimate of the amount of blocks that need to be revoked when removing blocks from the inode due to truncate or hole punching and otherwise reserve just a small amount of revoke credits for each transaction to accommodate freeing of xattrs block or so. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Provide ext4_journal_ensure_credits_fn() function to ensure transaction has given amount of credits and call helper function to prepare for restarting a transaction. This allows to remove some boilerplate code from various places, add proper error handling for the case where transaction extension or restart fails, and reduces following changes needed for proper revoke record reservation tracking. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Matthew Bobrowski 提交于
This patch introduces a new direct I/O write path which makes use of the iomap infrastructure. All direct I/O writes are now passed from the ->write_iter() callback through to the new direct I/O handler ext4_dio_write_iter(). This function is responsible for calling into the iomap infrastructure via iomap_dio_rw(). Code snippets from the existing direct I/O write code within ext4_file_write_iter() such as, checking whether the I/O request is unaligned asynchronous I/O, or whether the write will result in an overwrite have effectively been moved out and into the new direct I/O ->write_iter() handler. The block mapping flags that are eventually passed down to ext4_map_blocks() from the *_get_block_*() suite of routines have been taken out and introduced within ext4_iomap_alloc(). For inode extension cases, ext4_handle_inode_extension() is effectively the function responsible for performing such metadata updates. This is called after iomap_dio_rw() has returned so that we can safely determine whether we need to potentially truncate any allocated blocks that may have been prepared for this direct I/O write. We don't perform the inode extension, or truncate operations from the ->end_io() handler as we don't have the original I/O 'length' available there. The ->end_io() however is responsible fo converting allocated unwritten extents to written extents. In the instance of a short write, we fallback and complete the remainder of the I/O using buffered I/O via ext4_buffered_write_iter(). The existing buffer_head direct I/O implementation has been removed as it's now redundant. [ Fix up ext4_dio_write_iter() per Jan's comments at https://lore.kernel.org/r/20191105135932.GN22379@quack2.suse.cz -- TYT ] Signed-off-by: NMatthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/e55db6f12ae6ff017f36774135e79f3e7b0333da.1572949325.git.mbobrowski@mbobrowski.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Matthew Bobrowski 提交于
As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin() callback named ext4_iomap_begin_report(). Again, the raionale for this change is to reduce the overall clutter within ext4_iomap_begin(). Signed-off-by: NMatthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 23 10月, 2019 2 次提交
-
-
由 Ritesh Harjani 提交于
This patch adds the support for blocksize < pagesize for dioread_nolock feature. Since in case of blocksize < pagesize, we can have multiple small buffers of page as unwritten extents, we need to maintain a vector of these unwritten extents which needs the conversion after the IO is complete. Thus, we maintain a list of tuple <offset, size> pair (io_end_vec) for this & traverse this list to do the unwritten to written conversion. Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191016073711.4141-5-riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Ritesh Harjani 提交于
This patch just brings in the API for conversion of unwritten io_end_vec extents which will be required for blocksize < pagesize support for dioread_nolock feature. No functional changes in this patch. Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191016073711.4141-3-riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 05 9月, 2019 1 次提交
-
-
由 Deepa Dinamani 提交于
When ext4 file systems were created intentionally with 128 byte inodes, the rate-limited warning of eventual possible timestamp overflow are still emitted rather frequently. Remove the warning for now. Discussion for whether any warning is needed, and where it should be emitted, can be found at https://lore.kernel.org/lkml/1567523922.5576.57.camel@lca.pw/. I can post a separate follow-up patch after the conclusion. Reported-by: NQian Cai <cai@lca.pw> Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
- 30 8月, 2019 1 次提交
-
-
由 Deepa Dinamani 提交于
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available. The timestamp limits are calculated according to the encoding table in a4dad1aei(ext4: Fix handling of extended tv_sec): * extra msb of adjust for signed * epoch 32-bit 32-bit tv_sec to * bits time decoded 64-bit tv_sec 64-bit tv_sec valid time range * 0 0 1 -0x80000000..-0x00000001 0x000000000 1901-12-13..1969-12-31 * 0 0 0 0x000000000..0x07fffffff 0x000000000 1970-01-01..2038-01-19 * 0 1 1 0x080000000..0x0ffffffff 0x100000000 2038-01-19..2106-02-07 * 0 1 0 0x100000000..0x17fffffff 0x100000000 2106-02-07..2174-02-25 * 1 0 1 0x180000000..0x1ffffffff 0x200000000 2174-02-25..2242-03-16 * 1 0 0 0x200000000..0x27fffffff 0x200000000 2242-03-16..2310-04-04 * 1 1 1 0x280000000..0x2ffffffff 0x300000000 2310-04-04..2378-04-22 * 1 1 0 0x300000000..0x37fffffff 0x300000000 2378-04-22..2446-05-10 Note that the time limits are not correct for deletion times. Added a warn when an inode cannot be extended to incorporate an extended timestamp. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Acked-by: NJeff Layton <jlayton@kernel.org> Cc: tytso@mit.edu Cc: adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org
-
- 28 8月, 2019 1 次提交
-
-
由 zhangyi (F) 提交于
Remount process will release system zone which was allocated before if "noblock_validity" is specified. If we mount an ext4 file system to two mountpoints with default mount options, and then remount one of them with "noblock_validity", it may trigger a use after free problem when someone accessing the other one. # mount /dev/sda foo # mount /dev/sda bar User access mountpoint "foo" | Remount mountpoint "bar" | ext4_map_blocks() | ext4_remount() check_block_validity() | ext4_setup_system_zone() ext4_data_block_valid() | ext4_release_system_zone() | free system_blks rb nodes access system_blks rb nodes | trigger use after free | This problem can also be reproduced by one mountpint, At the same time, add_system_zone() can get called during remount as well so there can be racing ext4_data_block_valid() reading the rbtree at the same time. This patch add RCU to protect system zone from releasing or building when doing a remount which inverse current "noblock_validity" mount option. It assign the rbtree after the whole tree was complete and do actual freeing after rcu grace period, avoid any intermediate state. Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 23 8月, 2019 2 次提交
-
-
由 Eric Whitney 提交于
The goal of this patch is to remove two references to the buffer delay bit in ext4_da_page_release_reservation() as part of a larger effort to remove all such references from ext4. These two references are principally used to reduce the reserved block/cluster count when pages are invalidated as a result of truncating, punching holes, or collapsing a block range in a file. The entire function is removed and replaced with code in ext4_es_remove_extent() that reduces the reserved count as a side effect of removing a block range from delayed and not unwritten extents in the extent status tree as is done when truncating, punching holes, or collapsing ranges. The code is written to minimize the number of searches descending from rb tree roots for scalability. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 ZhangXiaoxu 提交于
I got some errors when I repair an ext4 volume which stacked by an iscsi target: Entry 'test60' in / (2) has deleted/unused inode 73750. Clear? It can be reproduced when the network not good enough. When I debug this I found ext4 will read entry buffer from disk and the buffer is marked with write_io_error. If the buffer is marked with write_io_error, it means it already wroten to journal, and not checked out to disk. IOW, the journal is newer than the data in disk. If this journal record 'delete test60', it means the 'test60' still on the disk metadata. In this case, if we read the buffer from disk successfully and create file continue, the new journal record will overwrite the journal which record 'delete test60', then the entry corruptioned. So, use the buffer rather than read from disk if the buffer is marked with write_io_error. Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 13 8月, 2019 3 次提交
-
-
由 Eric Biggers 提交于
Make ext4_mpage_readpages() verify data as it is read from fs-verity files, using the helper functions from fs/verity/. To support both encryption and verity simultaneously, this required refactoring the decryption workflow into a generic "post-read processing" workflow which can do decryption, verification, or both. The case where the ext4 block size is not equal to the PAGE_SIZE is not supported yet, since in that case ext4_mpage_readpages() sometimes falls back to block_read_full_page(), which does not support fs-verity yet. Co-developed-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
由 Eric Biggers 提交于
Add most of fs-verity support to ext4. fs-verity is a filesystem feature that enables transparent integrity protection and authentication of read-only files. It uses a dm-verity like mechanism at the file level: a Merkle tree is used to verify any block in the file in log(filesize) time. It is implemented mainly by helper functions in fs/verity/. See Documentation/filesystems/fsverity.rst for the full documentation. This commit adds all of ext4 fs-verity support except for the actual data verification, including: - Adding a filesystem feature flag and an inode flag for fs-verity. - Implementing the fsverity_operations to support enabling verity on an inode and reading/writing the verity metadata. - Updating ->write_begin(), ->write_end(), and ->writepages() to support writing verity metadata pages. - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl(). ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) past the end of the file, starting at the first 64K boundary beyond i_size. This approach works because (a) verity files are readonly, and (b) pages fully beyond i_size aren't visible to userspace but can be read/written internally by ext4 with only some relatively small changes to ext4. This approach avoids having to depend on the EA_INODE feature and on rearchitecturing ext4's xattr support to support paging multi-gigabyte xattrs into memory, and to support encrypting xattrs. Note that the verity metadata *must* be encrypted when the file is, since it contains hashes of the plaintext data. This patch incorporates work by Theodore Ts'o and Chandan Rajendra. Reviewed-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
由 Theodore Ts'o 提交于
Originally, support for expanded timestamps had a bug in that pre-1970 times were erroneously encoded as being in the the 24th century. This was fixed in commit a4dad1ae ("ext4: Fix handling of extended tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps were correctly encoded, but for backwards compatibility those incorrectly encoded timestamps were mapped back to the pre-1970 dates. Given that backwards compatibility workaround has been around for 4 years, and given that running e2fsck from e2fsprogs 1.43.2 and later will offer to fix these timestamps (which has been released for 3 years), it's past time to drop the legacy workaround from the kernel. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 12 8月, 2019 3 次提交
-
-
由 Theodore Ts'o 提交于
For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fiemap-style interface to return this information. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of an ext4 inode for debugging purposes. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent status cache to be cleared out. This is intended for use for debugging. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 6月, 2019 3 次提交
-
-
由 Theodore Ts'o 提交于
Clean up namespace pollution by the inline_data code. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move the calculation of the location of the dirent tail into initialize_dirent_tail(). Also prefix the function with ext4_ to fix kernel namepsace polution. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set() don't actually operate on a directory entry, but a directory block. And while they take a struct ext4_dir_entry *dirent as an argument, it had better be the first directory at the beginning of the direct block, or things will go very wrong. Rename the following functions so that things make more sense, and remove a lot of confusing casts along the way: ext4_dirent_csum_verify -> ext4_dirblock_csum_verify ext4_dirent_csum_set -> ext4_dirblock_csum_set ext4_dirent_csum -> ext4_dirblock_csum ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 20 6月, 2019 1 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
Temporarily cache a casefolded version of the file name under lookup in ext4_filename, to avoid repeatedly casefolding it. I got up to 30% speedup on lookups of large directories (>100k entries), depending on the length of the string under lookup. Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-