- 06 11月, 2019 2 次提交
-
-
由 Jan Kara 提交于
So far we have reserved only relatively high fixed amount of revoke credits for each transaction. We over-reserved by large amount for most cases but when freeing large directories or files with data journalling, the fixed amount is not enough. In fact the worst case estimate is inconveniently large (maximum extent size) for freeing of one extent. We fix this by doing proper estimate of the amount of blocks that need to be revoked when removing blocks from the inode due to truncate or hole punching and otherwise reserve just a small amount of revoke credits for each transaction to accommodate freeing of xattrs block or so. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Provide ext4_journal_ensure_credits_fn() function to ensure transaction has given amount of credits and call helper function to prepare for restarting a transaction. This allows to remove some boilerplate code from various places, add proper error handling for the case where transaction extension or restart fails, and reduces following changes needed for proper revoke record reservation tracking. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 23 8月, 2019 1 次提交
-
-
由 Rakesh Pandit 提交于
Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing first argument. This was introduced in commit ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio") and splitting extents inside endio would trigger it. Fixes: ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio") Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 12 8月, 2019 1 次提交
-
-
由 Theodore Ts'o 提交于
For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fiemap-style interface to return this information. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 20 6月, 2019 1 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 5月, 2019 1 次提交
-
-
由 Theodore Ts'o 提交于
Since the journal inode is already checked when we added it to the block validity's system zone, if we check it again, we'll just trigger a failure. This was causing failures like this: [ 53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0) [ 53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8 [ 53.938480] Aborting journal on device sda-8. ... but only if the system was under enough memory pressure that logical->physical mapping for the journal inode gets pushed out of the extent cache. (This is why it wasn't noticed earlier.) Fixes: 345c0dbf ("ext4: protect journal inode's blocks using block_validity") Reported-by: NDan Rue <dan.rue@linaro.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
-
- 11 5月, 2019 1 次提交
-
-
由 Sriram Rajagopalan 提交于
This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries. This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced. This fixes CVE-2019-11833. Signed-off-by: NSriram Rajagopalan <sriramr@arista.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 01 3月, 2019 1 次提交
-
-
由 Eric Whitney 提交于
Ext4 may not free clusters correctly when punching holes in bigalloc file systems under high load conditions. If it's not possible to extend and restart the journal in ext4_ext_rm_leaf() when preparing to remove blocks from a punched region, a retry of the entire punch operation is triggered in ext4_ext_remove_space(). This causes a partial cluster to be set to the first cluster in the extent found to the right of the punched region. However, if the punch operation prior to the retry had made enough progress to delete one or more extents and a partial cluster candidate for freeing had already been recorded, the retry would overwrite the partial cluster. The loss of this information makes it impossible to correctly free the original partial cluster in all cases. This bug can cause generic/476 to fail when run as part of xfstests-bld's bigalloc and bigalloc_1k test cases. The failure is reported when e2fsck detects bad iblocks counts greater than expected in units of whole clusters and also detects a number of negative block bitmap differences equal to the iblocks discrepancy in cluster units. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 11 2月, 2019 1 次提交
-
-
由 zhangyi (F) 提交于
Now, we have already handle all cases of forgetting buffer in jbd2_journal_forget(), the buffer should not be mapped to blockdevice when reallocating it. So this patch remove all clean_bdev_aliases() and clean_bdev_bh_alias() calls which were invoked by ext4 explicitly. Suggested-by: NJan Kara <jack@suse.cz> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 24 1月, 2019 1 次提交
-
-
由 Chandan Rajendra 提交于
This commit removes the ext4 specific ext4_encrypted_inode() and makes use of the generic IS_ENCRYPTED() macro to check for the encryption status of an inode. Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NEric Biggers <ebiggers@google.com>
-
- 02 10月, 2018 4 次提交
-
-
由 Eric Whitney 提交于
Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree. Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied. Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it. Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Eric Whitney 提交于
Ext4 does not always reduce the reserved cluster count by the number of clusters allocated when mapping a delayed extent. It sometimes adds back one or more clusters after allocation if delalloc blocks adjacent to the range allocated by ext4_ext_map_blocks() share the clusters newly allocated for that range. However, this overcounts the number of clusters needed to satisfy future mapping requests (holding one or more reservations for clusters that have already been allocated) and premature ENOSPC and quota failures, etc., result. Ext4 also does not reduce the reserved cluster count when allocating clusters for non-delayed allocated writes that have previously been reserved for delayed writes. This also results in overcounts. To make it possible to handle reserved cluster accounting for fallocated regions in the same manner as used for other non-delayed writes, do the reserved cluster accounting for them at the time of allocation. In the current code, this is only done later when a delayed extent sharing the fallocated region is finally mapped. Address comment correcting handling of unsigned long long constant from Jan Kara's review of RFC version of this patch. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Eric Whitney 提交于
The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Eric Whitney 提交于
Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock. Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 30 7月, 2018 1 次提交
-
-
由 Ross Zwisler 提交于
Follow the lead of xfs_break_dax_layouts() and add synchronization between operations in ext4 which remove blocks from an inode (hole punch, truncate down, etc.) and pages which are pinned due to DAX DMA operations. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-
- 15 6月, 2018 1 次提交
-
-
由 Theodore Ts'o 提交于
If there is a corupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness. This addresses CVE-2018-10877. https://bugzilla.kernel.org/show_bug.cgi?id=199417Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 13 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 12 4月, 2018 1 次提交
-
-
由 Eric Biggers 提交于
During the "insert range" fallocate operation, extents starting at the range offset are shifted "right" (to a higher file offset) by the range length. But, as shown by syzbot, it's not validated that this doesn't cause extents to be shifted beyond EXT_MAX_BLOCKS. In that case ->ee_block can wrap around, corrupting the extent tree. Fix it by returning an error if the space between the end of the last extent and EXT4_MAX_BLOCKS is smaller than the range being inserted. This bug can be reproduced by running the following commands when the current directory is on an ext4 filesystem with a 4k block size: fallocate -l 8192 file fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file fallocate --insert-range -l 8192 file Then after unmounting the filesystem, e2fsck reports corruption. Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 26 3月, 2018 1 次提交
-
-
由 zhenwei.pi 提交于
"mark_unwritten" in comment and "unwritten" in the function arguments is mismatched. Signed-off-by: Nzhenwei.pi <zhenwei.pi@youruncloud.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 3月, 2018 1 次提交
-
-
由 Nikolay Borisov 提交于
Commit 16c54688 ("ext4: Allow parallel DIO reads") reworked the way locking happens around parallel dio reads. This resulted in obviating the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic. Currently this amounts to dead code so let's remove it. No functional changes Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 18 12月, 2017 1 次提交
-
-
由 Theodore Ts'o 提交于
A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 04 12月, 2017 1 次提交
-
-
由 Eryu Guan 提交于
Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2) then crash, we'll see wrong allocated block number (stat -c %b), the blocks allocated beyond EOF are all lost. fstests generic/468 exposes this bug. Commit 67a7d5f5 ("ext4: fix fdatasync(2) after extent manipulation operations") fixed all the other extent manipulation operation paths such as hole punch, zero range, collapse range etc., but forgot the fallocate case. So similarly, fix it by recording the correct journal tid in ext4 inode in fallocate(2) path, so that ext4_sync_file() will wait for the right tid to be committed on fdatasync(2). This addresses the test failure in xfstests test generic/468. Signed-off-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 07 10月, 2017 1 次提交
-
-
由 Theodore Ts'o 提交于
If there are pending writes subject to delayed allocation, then i_size will show size after the writes have completed, while i_disksize contains the value of i_size on the disk (since the writes have not been persisted to disk). If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size after the fallocate(2) is between i_size and i_disksize, then after a crash, if a journal commit has resulted in the changes made by the fallocate() call to be persisted after a crash, but the delayed allocation write has not resolved itself, i_size would not be updated, and this would cause the following e2fsck complaint: Inode 12, end of extent exceeds allowed value (logical block 33, physical block 33441, len 7) This can only take place on a sparse file, where the fallocate(2) call is allocating blocks in a range which is before a pending delayed allocation write which is extending i_size. Since this situation is quite rare, and the window in which the crash must take place is typically < 30 seconds, in practice this condition will rarely happen. Nevertheless, it can be triggered in testing, and in particular by xfstests generic/456. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reported-by: NAmir Goldstein <amir73il@gmail.com> Cc: stable@vger.kernel.org
-
- 06 8月, 2017 2 次提交
-
-
由 Maninder Singh 提交于
This bug was found by a static code checker tool for copy paste problems. Signed-off-by: NManinder Singh <maninder1.s@samsung.com> Signed-off-by: NVaneet Narang <v.narang@samsung.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Tahsin Erdogan 提交于
ext4_alloc_file_blocks() does not use its mode parameter. Remove it. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 22 6月, 2017 1 次提交
-
-
由 Tahsin Erdogan 提交于
ea_inode contents are treated as metadata, that's why it is journaled during initial writes. Failing to call revoke during freeing could cause user data to be overwritten with original ea_inode contents during journal replay. Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 30 5月, 2017 1 次提交
-
-
由 Jan Kara 提交于
Currently, extent manipulation operations such as hole punch, range zeroing, or extent shifting do not record the fact that file data has changed and thus fdatasync(2) has a work to do. As a result if we crash e.g. after a punch hole and fdatasync, user can still possibly see the punched out data after journal replay. Test generic/392 fails due to these problems. Fix the problem by properly marking that file data has changed in these operations. CC: stable@vger.kernel.org Fixes: a4bb6b64Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 27 5月, 2017 1 次提交
-
-
由 Jan Kara 提交于
When ext4_map_blocks() is called with EXT4_GET_BLOCKS_ZERO to zero-out allocated blocks and these blocks are actually converted from unwritten extent the following race can happen: CPU0 CPU1 page fault page fault ... ... ext4_map_blocks() ext4_ext_map_blocks() ext4_ext_handle_unwritten_extents() ext4_ext_convert_to_initialized() - zero out converted extent ext4_zeroout_es() - inserts extent as initialized in status tree ext4_map_blocks() ext4_es_lookup_extent() - finds initialized extent write data ext4_issue_zeroout() - zeroes out new extent overwriting data This problem can be reproduced by generic/340 for the fallocated case for the last block in the file. Fix the problem by avoiding zeroing out the area we are mapping with ext4_map_blocks() in ext4_ext_convert_to_initialized(). It is pointless to zero out this area in the first place as the caller asked us to convert the area to initialized because he is just going to write data there before the transaction finishes. To achieve this we delete the special case of zeroing out full extent as that will be handled by the cases below zeroing only the part of the extent that needs it. We also instruct ext4_split_extent() that the middle of extent being split contains data so that ext4_split_extent_at() cannot zero out full extent in case of ENOSPC. CC: stable@vger.kernel.org Fixes: 12735f88Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 09 1月, 2017 2 次提交
-
-
由 Roman Pen 提交于
Inside ext4_ext_shift_extents() function ext4_find_extent() is called without EXT4_EX_NOCACHE flag, which should prevent cache population. This leads to oudated offsets in the extents tree and wrong blocks afterwards. Patch fixes the problem providing EXT4_EX_NOCACHE flag for each ext4_find_extents() call inside ext4_ext_shift_extents function. Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org
-
由 Roman Pen 提交于
While doing 'insert range' start block should be also shifted right. The bug can be easily reproduced by the following test: ptr = malloc(4096); assert(ptr); fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600); assert(fd >= 0); rc = fallocate(fd, 0, 0, 8192); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = 0xbeef; rc = pwrite(fd, ptr, 4096, 0); assert(rc == 4096); rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); for (block = 2; block < 1000; block++) { rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096); assert(rc == 0); for (i = 0; i < 2048; i++) *((unsigned short *)ptr + i) = block; rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); } Because start block is not included in the range the hole appears at the wrong offset (just after the desired offset) and the following pwrite() overwrites already existent block, keeping hole untouched. Simple way to verify wrong behaviour is to check zeroed blocks after the test: $ hexdump ./ext4.file | grep '0000 0000' The root cause of the bug is a wrong range (start, stop], where start should be inclusive, i.e. [start, stop]. This patch fixes the problem by including start into the range. But not to break left shift (range collapse) stop points to the beginning of the a block, not to the end. The other not obvious change is an iterator check on validness in a main loop. Because iterator is unsigned the following corner case should be considered with care: insert a block at 0 offset, when stop variables overflows and never becomes less than start, which is 0. To handle this special case iterator is set to NULL to indicate that end of the loop is reached. Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 12月, 2016 1 次提交
-
-
由 Dan Carpenter 提交于
Before commit c3fe493c ('ext4: remove unneeded test in ext4_alloc_file_blocks()') then it was possible for "depth" to be -1 but now, it's not possible that it is negative. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 15 11月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe. current_time() will be transitioned to be y2038 safe along with vfs. current_time() returns timestamps according to the granularities set in the super_block. The granularity check in ext4_current_time() to call current_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to obtain timestamps unconditionally, and remove ext4_current_time(). Quota files are assumed to be on the same filesystem. Hence, use current_time() for these files as well. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NArnd Bergmann <arnd@arndb.de>
-
- 14 11月, 2016 1 次提交
-
-
由 Theodore Ts'o 提交于
Return errors to the caller instead of declaring the file system corrupted. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 05 11月, 2016 1 次提交
-
-
由 Jan Kara 提交于
Use clean_bdev_aliases() instead of iterating through blocks one by one. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 15 9月, 2016 3 次提交
-
-
由 Fabian Frederick 提交于
Create a macro to calculate length + offset -> maximum blocks This adds more readability. Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Fabian Frederick 提交于
ext4_alloc_file_blocks() is called from ext4_zero_range() and ext4_fallocate() both already testing EXT4_INODE_EXTENTS We can call ext_depth(inode) unconditionnally. [ Added BUG_ON check to make sure ext4_alloc_file_blocks() won't get called for a indirect-mapped inode in the future. -- tytso ] Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Fabian Frederick 提交于
Running xfstests generic/013 with kmemleak gives the following: unreferenced object 0xffff8801d3d27de0 (size 96): comm "fsstress", pid 4941, jiffies 4294860168 (age 53.485s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff818eaaf3>] kmemleak_alloc+0x23/0x40 [<ffffffff81179805>] __kmalloc+0xf5/0x1d0 [<ffffffff8122ef5c>] ext4_find_extent+0x1ec/0x2f0 [<ffffffff8123530c>] ext4_insert_range+0x34c/0x4a0 [<ffffffff81235942>] ext4_fallocate+0x4e2/0x8b0 [<ffffffff81181334>] vfs_fallocate+0x134/0x210 [<ffffffff8118203f>] SyS_fallocate+0x3f/0x60 [<ffffffff818efa9b>] entry_SYSCALL_64_fastpath+0x13/0x8f [<ffffffffffffffff>] 0xffffffffffffffff Problem seems mitigated by dropping refs and freeing path when there's no path[depth].p_ext Cc: stable@vger.kernel.org Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 15 7月, 2016 1 次提交
-
-
由 Vegard Nossum 提交于
Although the extent tree depth of 5 should enough be for the worst case of 2*32 extents of length 1, the extent tree code does not currently to merge nodes which are less than half-full with a sibling node, or to shrink the tree depth if possible. So it's possible, at least in theory, for the tree depth to be greater than 5. However, even in the worst case, a tree depth of 32 is highly unlikely, and if the file system is maliciously corrupted, an insanely large eh_depth can cause memory allocation failures that will trigger kernel warnings (here, eh_depth = 65280): JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580 CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508 Stack: 604a8947 625badd8 0002fd09 00000000 60078643 00000000 62623910 601bf9bc 62623970 6002fc84 626239b0 900000125 Call Trace: [<6001c2dc>] show_stack+0xdc/0x1a0 [<601bf9bc>] dump_stack+0x2a/0x2e [<6002fc84>] __warn+0x114/0x140 [<6002fdff>] warn_slowpath_null+0x1f/0x30 [<60165829>] start_this_handle+0x569/0x580 [<60165d4e>] jbd2__journal_start+0x11e/0x220 [<60146690>] __ext4_journal_start_sb+0x60/0xa0 [<60120a81>] ext4_truncate+0x131/0x3a0 [<60123677>] ext4_setattr+0x757/0x840 [<600d5d0f>] notify_change+0x16f/0x2a0 [<600b2b16>] do_truncate+0x76/0xc0 [<600c3e56>] path_openat+0x806/0x1300 [<600c55c9>] do_filp_open+0x89/0xf0 [<600b4074>] do_sys_open+0x134/0x1e0 [<600b4140>] SyS_open+0x20/0x30 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 ---[ end trace 08b0b88b6387a244 ]--- [ Commit message modified and the extent tree depath check changed from 5 to 32 -- tytso ] Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 30 6月, 2016 1 次提交
-
-
由 Vegard Nossum 提交于
An extent with lblock = 4294967295 and len = 1 will pass the ext4_valid_extent() test: ext4_lblk_t last = lblock + len - 1; if (len == 0 || lblock > last) return 0; since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end(). We can simplify it by removing the - 1 altogether and changing the test to use lblock + len <= lblock, since now if len = 0, then lblock + 0 == lblock and it fails, and if len > 0 then lblock + len > lblock in order to pass (i.e. it doesn't overflow). Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: 2f974865 ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com> Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-