- 17 2月, 2015 1 次提交
-
-
由 Ross Zwisler 提交于
This is a port of the DAX functionality found in the current version of ext2. [matthew.r.wilcox@intel.com: heavily tweaked] [akpm@linux-foundation.org: remap_pages went away] Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NAndreas Dilger <andreas.dilger@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 11月, 2014 5 次提交
-
-
由 Wang Shilong 提交于
ext4_delete_inode() has been renamed for a long time, update comments for this. Signed-off-by: NWang Shilong <wshilong@ddn.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently callers adding extents to extent status tree were responsible for adding the inode to the list of inodes with freeable extents. This is error prone and puts list handling in unnecessarily many places. Just add inode to the list automatically when the first non-delay extent is added to the tree and remove inode from the list when the last non-delay extent is removed. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Zheng Liu 提交于
In this commit we discard the lru algorithm for inodes with extent status tree because it takes significant effort to maintain a lru list in extent status tree shrinker and the shrinker can take a long time to scan this lru list in order to reclaim some objects. We replace the lru ordering with a simple round-robin. After that we never need to keep a lru list. That means that the list needn't be sorted if the shrinker can not reclaim any objects in the first round. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Zheng Liu 提交于
Currently extent status tree doesn't cache extent hole when a write looks up in extent tree to make sure whether a block has been allocated or not. In this case, we don't put extent hole in extent cache because later this extent might be removed and a new delayed extent might be added back. But it will cause a defect when we do a lot of writes. If we don't put extent hole in extent cache, the following writes also need to access extent tree to look at whether or not a block has been allocated. It brings a cache miss. This commit fixes this defect. Also if the inode doesn't have any extent, this extent hole will be cached as well. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
For bigalloc filesystems we have to check whether newly requested inode block isn't already part of a cluster for which we already have delayed allocation reservation. This check happens in ext4_ext_map_blocks() and that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if ext4_da_map_blocks() finds in extent cache information about the block, we don't call into ext4_ext_map_blocks() and thus we always end up getting new reservation even if the space for cluster is already reserved. This results in overreservation and premature ENOSPC reports. Fix the problem by checking for existing cluster reservation already in ext4_da_map_blocks(). That simplifies the logic and actually allows us to get rid of the EXT4_MAP_FROM_CLUSTER flag completely. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 30 10月, 2014 1 次提交
-
-
由 Jan Kara 提交于
When clearing inode journal flag, we call jbd2_journal_flush() to force all the journalled data to their final locations. Currently we ignore when this fails and continue clearing inode journal flag. This isn't a big problem because when jbd2_journal_flush() fails, journal is likely aborted anyway. But it can still lead to somewhat confusing results so rather bail out early. Coverity-id: 989044 Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 13 10月, 2014 1 次提交
-
-
由 Dmitry Monakhov 提交于
Besides the fact that this replacement improves code readability it also protects from errors caused direct EXT4_S(sb)->s_es manipulation which may result attempt to use uninitialized csum machinery. #Testcase_BEGIN IMG=/dev/ram0 MNT=/mnt mkfs.ext4 $IMG mount $IMG $MNT #Enable feature directly on disk, on mounted fs tune2fs -O metadata_csum $IMG # Provoke metadata update, likey result in OOPS touch $MNT/test umount $MNT #Testcase_END # Replacement script @@ expression E; @@ - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) + ext4_has_metadata_csum(E) https://bugzilla.kernel.org/show_bug.cgi?id=82201Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 12 10月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
Delalloc write journal reservations only reserve 1 credit, to update the inode if necessary. However, it may happen once in a filesystem's lifetime that a file will cross the 2G threshold, and require the LARGE_FILE feature to be set in the superblock as well, if it was not set already. This overruns the transaction reservation, and can be demonstrated simply on any ext4 filesystem without the LARGE_FILE feature already set: dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \ conv=notrunc of=testfile sync dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \ conv=notrunc of=testfile leads to: EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28 EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28 EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28 Adjust the number of credits based on whether the flag is already set, and whether the current write may extend past the LARGE_FILE limit. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
-
- 06 10月, 2014 2 次提交
-
-
由 Theodore Ts'o 提交于
If there is a corrupted file system which has directory entries that point at reserved, metadata inodes, prohibit them from being used by treating them the same way we treat Boot Loader inodes --- that is, mark them to be bad inodes. This prohibits them from being opened, deleted, or modified via chmod, chown, utimes, etc. In particular, this prevents a corrupted file system which has a directory entry which points at the journal inode from being deleted and its blocks released, after which point Much Hilarity Ensues. Reported-by: NSami Liedes <sami.liedes@iki.fi> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
The boot loader inode (inode #5) should never be visible in the directory hierarchy, but it's possible if the file system is corrupted that there will be a directory entry that points at inode #5. In order to avoid accidentally trashing it, when such a directory inode is opened, the inode will be marked as a bad inode, so that it's not possible to modify (or read) the inode from userspace. Unfortunately, when we unlink this (invalid/illegal) directory entry, we will put the bad inode on the ophan list, and then when try to unlink the directory, we don't actually remove the bad inode from the orphan list before freeing in-memory inode structure. This means the in-memory orphan list is corrupted, leading to a kernel oops. In addition, avoid truncating a bad inode in ext4_destroy_inode(), since truncating the boot loader inode is not a smart thing to do. Reported-by: NSami Liedes <sami.liedes@iki.fi> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 02 10月, 2014 2 次提交
-
-
由 Li Xi 提交于
When ext4_do_update_inode() gets error from ext4_inode_blocks_set(), error number should be returned. Signed-off-by: NLi Xi <lixi@ddn.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Use truncate_isize_extended() when hole is being created in a file so that ->page_mkwrite() will get called for the partial tail page if it is mmaped (see the first patch in the series for details). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 05 9月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
Having done a full regression test, we can now drop the DELALLOC_RESERVED state flag. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 02 9月, 2014 1 次提交
-
-
由 Seunghun Lee 提交于
get_blocks is renamed to get_block. Signed-off-by: NSeunghun Lee <waydi1@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 31 8月, 2014 1 次提交
-
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 30 8月, 2014 2 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 29 8月, 2014 1 次提交
-
-
由 Dmitry Monakhov 提交于
In case of delalloc block i_disksize may be less than i_size. So we have to update i_disksize each time we allocated and submitted some blocks beyond i_disksize. We weren't doing this on the error paths, so fix this. testcase: xfstest generic/019 Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 24 8月, 2014 1 次提交
-
-
由 Dmitry Monakhov 提交于
Cc: stable@vger.kernel.org # needed for bug fix patches Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 15 7月, 2014 2 次提交
-
-
由 Lukas Czerner 提交于
Currently punch hole code on files with direct/indirect mapping has some problems which may lead to a data loss. For example (from Jan Kara): fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 1024000 - 10244096. Also the code is a bit weird and it's not using infrastructure provided by indirect.c, but rather creating it's own way. This patch fixes the issues as well as making the operation to run 4 times faster from my testing (punching out 60GB file). It uses similar approach used in ext4_ind_truncate() which takes advantage of ext4_free_branches() function. Also rename the ext4_free_hole_blocks() to something more sensible, like the equivalent we have for extent mapped files. Call it ext4_ind_remove_space(). This has been tested mostly with fsx and some xfstests which are testing punch hole but does not require unwritten extents which are not supported with direct/indirect mapping. Not problems showed up even with 1024k block size. CC: stable@vger.kernel.org Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Commit 27dd4385 ("ext4: introduce reserved space") reserves 2% of the file system space to make sure metadata allocations will always succeed. Given that, tracking the reservation of metadata blocks is no longer necessary. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 02 6月, 2014 1 次提交
-
-
由 Zheng Liu 提交于
This commit tries to fix a bug that we can't read symlink properly with inline data feature when the length of symlink is greater than 60 bytes but less than extra space. The key issue is in ext4_inode_is_fast_symlink() that it doesn't check whether or not an inode has inline data. When the user creates a new symlink, an inode will be allocated with MAY_INLINE_DATA flag. Then symlink will be stored in ->i_block and extended attribute space. In the mean time, this inode is with inline data flag. After remounting it, ext4_inode_is_fast_symlink() function thinks that this inode is a fast symlink so that the data in ->i_block is copied to the user, and the data in extra space is trimmed. In fact this inode should be as a normal symlink. The following script can hit this bug. #!/bin/bash cd ${MNT} filename=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 rm -rf test mkdir test cd test echo "hello" >$filename ln -s $filename symlinkfile cd sudo umount /mnt/sda1 sudo mount -t ext4 /dev/sda1 /mnt/sda1 readlink /mnt/sda1/test/symlinkfile After applying this patch, it will break the assumption in e2fsck because the original implementation doesn't want to support symlink with inline data. Reported-by: N"Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: NIan Nartowicz <claws@nartowicz.co.uk> Cc: Ian Nartowicz <claws@nartowicz.co.uk> Cc: Tao Ma <tm@tao.ma> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 13 5月, 2014 2 次提交
-
-
由 liang xie 提交于
Make them more consistently Signed-off-by: Nxieliang <xieliang@xiaomi.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 5月, 2014 2 次提交
-
-
由 Stephen Hemminger 提交于
I have been running make namespacecheck to look for unneeded globals, and found these in ext4. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Namjae Jeon 提交于
When we perform a data integrity sync we tag all the dirty pages with PAGECACHE_TAG_TOWRITE at start of ext4_da_writepages. Later we check for this tag in write_cache_pages_da and creates a struct mpage_da_data containing contiguously indexed pages tagged with this tag and sync these pages with a call to mpage_da_map_and_submit. This process is done in while loop until all the PAGECACHE_TAG_TOWRITE pages are synced. We also do journal start and stop in each iteration. journal_stop could initiate journal commit which would call ext4_writepage which in turn will call ext4_bio_write_page even for delayed OR unwritten buffers. When ext4_bio_write_page is called for such buffers, even though it does not sync them but it clears the PAGECACHE_TAG_TOWRITE of the corresponding page and hence these pages are also not synced by the currently running data integrity sync. We will end up with dirty pages although sync is completed. This could cause a potential data loss when the sync call is followed by a truncate_pagecache call, which is exactly the case in collapse_range. (It will cause generic/127 failure in xfstests) To avoid this issue, we can use set_page_writeback_keepwrite instead of set_page_writeback, which doesn't clear TOWRITE tag. Cc: stable@vger.kernel.org Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 07 5月, 2014 4 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter) Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
unmodified, for now Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
To avoid potential data races, use a spinlock which protects the raw (on-disk) inode. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 21 4月, 2014 2 次提交
-
-
由 Lukas Czerner 提交于
Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the cases where we're using dioread_nolock and we're writing into either unallocated, or unwritten extent, because we need to make sure that any DIO write into that inode will wait for the extent conversion. However EXT4_MAP_UNINIT is not only entirely misleading name but also unnecessary because we can check for EXT4_MAP_UNWRITTEN in the dioread_nolock case instead. This commit removes EXT4_MAP_UNINIT flag. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 18 4月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Currently in ext4_collapse_range() and ext4_punch_hole() we're discarding preallocation twice. Once before we attempt to do any changes and second time after we're done with the changes. While the second call to ext4_discard_preallocations() in ext4_punch_hole() case is not needed, we need to discard preallocation right after ext4_ext_remove_space() in collapse range case because in the case we had to restart a transaction in the middle of removing space we might have new preallocations created. Remove unneeded ext4_discard_preallocations() ext4_punch_hole() and move it to the better place in ext4_collapse_range() Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 4月, 2014 2 次提交
-
-
由 Lukas Czerner 提交于
Currently some file system have IS_SWAPFILE check in their fallocate implementations and some do not. However we should really prevent any fallocate operation on swapfile so move the check to vfs and remove the redundant checks from the file systems fallocate implementations. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
All the checks IS_APPEND and IS_IMMUTABLE for the fallocate operation on the inode are done in vfs. No need to do this again in ext4. Remove it. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
The function ext4_update_i_disksize() is used in only one place, in the function mpage_map_and_submit_extent(). Move its code to simplify the code paths, and also move the call to ext4_mark_inode_dirty() into the i_data_sem's critical region, to be consistent with all of the other places where we update i_disksize. That way, we also keep the raw_inode's i_disksize protected, to avoid the following race: CPU #1 CPU #2 down_write(&i_data_sem) Modify i_disk_size up_write(&i_data_sem) down_write(&i_data_sem) Modify i_disk_size Copy i_disk_size to on-disk inode up_write(&i_data_sem) Copy i_disk_size to on-disk inode Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
- 08 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
The special handling of PF_MEMALLOC callers in ext4_write_inode() shouldn't be necessary as there shouldn't be any. Warn about it. Also update comment before the function as it seems somewhat outdated. (Changes modeled on an ext3 patch posted by Jan Kara to the linux-ext4 mailing list on Februaryt 28, 2014, which apparently never went into the ext3 tree.) Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz>
-
- 07 4月, 2014 1 次提交
-
-
由 Kazuya Mio 提交于
When we try to get 2^32-1 block of the file which has the extent (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON in ext4_ext_put_gap_in_cache(). To avoid the problem, ext4_map_blocks() needs to check the file logical block number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot handle 2^32-1 because the maximum file logical block number is 2^32-2. Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid. So ext4_map_blocks() should also return the same errno. Signed-off-by: NKazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-