- 15 7月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
Commit 27dd4385 ("ext4: introduce reserved space") reserves 2% of the file system space to make sure metadata allocations will always succeed. Given that, tracking the reservation of metadata blocks is no longer necessary. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 12 5月, 2014 3 次提交
-
-
由 Stephen Hemminger 提交于
I have been running make namespacecheck to look for unneeded globals, and found these in ext4. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Darrick J. Wong 提交于
The ext4_bg_has_super() function doesn't know about the new rules for where backup superblocks go on a sparse_super2 filesystem. Therefore, block bitmap initialization doesn't know that it shouldn't reserve space for backups in groups that are never going to contain backups. The result of this is e2fsck complaining about the block bitmap being incorrect (fortunately not in a way that results in cross-linked files), so fix the whole thing. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Namjae Jeon 提交于
When we perform a data integrity sync we tag all the dirty pages with PAGECACHE_TAG_TOWRITE at start of ext4_da_writepages. Later we check for this tag in write_cache_pages_da and creates a struct mpage_da_data containing contiguously indexed pages tagged with this tag and sync these pages with a call to mpage_da_map_and_submit. This process is done in while loop until all the PAGECACHE_TAG_TOWRITE pages are synced. We also do journal start and stop in each iteration. journal_stop could initiate journal commit which would call ext4_writepage which in turn will call ext4_bio_write_page even for delayed OR unwritten buffers. When ext4_bio_write_page is called for such buffers, even though it does not sync them but it clears the PAGECACHE_TAG_TOWRITE of the corresponding page and hence these pages are also not synced by the currently running data integrity sync. We will end up with dirty pages although sync is completed. This could cause a potential data loss when the sync call is followed by a truncate_pagecache call, which is exactly the case in collapse_range. (It will cause generic/127 failure in xfstests) To avoid this issue, we can use set_page_writeback_keepwrite instead of set_page_writeback, which doesn't clear TOWRITE tag. Cc: stable@vger.kernel.org Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 07 5月, 2014 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
To avoid potential data races, use a spinlock which protects the raw (on-disk) inode. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 21 4月, 2014 2 次提交
-
-
由 Lukas Czerner 提交于
Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the cases where we're using dioread_nolock and we're writing into either unallocated, or unwritten extent, because we need to make sure that any DIO write into that inode will wait for the extent conversion. However EXT4_MAP_UNINIT is not only entirely misleading name but also unnecessary because we can check for EXT4_MAP_UNWRITTEN in the dioread_nolock case instead. This commit removes EXT4_MAP_UNINIT flag. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 4月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
The function ext4_update_i_disksize() is used in only one place, in the function mpage_map_and_submit_extent(). Move its code to simplify the code paths, and also move the call to ext4_mark_inode_dirty() into the i_data_sem's critical region, to be consistent with all of the other places where we update i_disksize. That way, we also keep the raw_inode's i_disksize protected, to avoid the following race: CPU #1 CPU #2 down_write(&i_data_sem) Modify i_disk_size up_write(&i_data_sem) down_write(&i_data_sem) Modify i_disk_size Copy i_disk_size to on-disk inode up_write(&i_data_sem) Copy i_disk_size to on-disk inode Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
- 25 3月, 2014 2 次提交
-
-
由 Matthew Wilcox 提交于
It's only called within inode.c, so make it static, remove its prototype from ext4.h and move it above all of its callers so it doesn't need a prototype within inode.c. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Set a in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 19 3月, 2014 2 次提交
-
-
由 T Makphaibulchoke 提交于
This patch adds new interfaces to create and destory cache, ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove the cache creation and destory calls from ex4_init_xattr() and ext4_exitxattr() in fs/ext4/xattr.c. fs/ext4/super.c has been changed so that when a filesystem is mounted a cache is allocated and attched to its ext4_sb_info structure. fs/mbcache.c has been changed so that only one slab allocator is allocated and used by all mbcache structures. Signed-off-by: NT. Makphaibulchoke <tmac@hp.com>
-
由 Lukas Czerner 提交于
Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Also add appropriate tracepoints. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 24 2月, 2014 1 次提交
-
-
由 Namjae Jeon 提交于
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Tested-by: NDongsu Park <dongsu.park@profitbricks.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 22 2月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 2月, 2014 1 次提交
-
-
由 Theodore Ts'o 提交于
If the i_crtime field is not present in the inode, don't leave the field uninitialized. Fixes: ef7f3835 ("ext4: Add nanosecond timestamps") Reported-by: NVegard Nossum <vegard.nossum@oracle.com> Tested-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 20 12月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
The missing casts can cause the high 64-bits of the physical blocks to be lost. Set up new macros which allows us to make sure the right thing happen, even if at some point we end up supporting larger logical block numbers. Thanks to the Emese Revfy and the PaX security team for reporting this issue. Reported-by: NPaX Team <pageexec@freemail.hu> Reported-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 12 11月, 2013 1 次提交
-
-
由 Andreas Dilger 提交于
It isn't very easy to find the declarations for the functions created by EXT4_INODE_BIT_FNS() because the names are generated by macros: ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state Add explicit declarations for these functions so that grep and tags can find them. Signed-off-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 11月, 2013 1 次提交
-
-
由 J. Bruce Fields 提交于
We want to do this elsewhere as well. Also catch any attempts to use it for directories (where this ordering would conflict with ancestor-first directory ordering in lock_rename). Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Acked-by: NJeff Layton <jlayton@redhat.com> Acked-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 10月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
In the case of a storage device that suddenly disappears, or in the case of significant file system corruption, this can result in a huge flood of messages being sent to the console. This can overflow the file system containing /var/log/messages, or if a serial console is configured, this can slow down the system so much that a hardware watchdog can end up triggering forcing a system reboot. Google-Bug-Id: 7258357 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 9月, 2013 1 次提交
-
-
由 Christoph Hellwig 提交于
Add support to the core direct-io code to defer AIO completions to user context using a workqueue. This replaces opencoded and less efficient code in XFS and ext4 (we save a memory allocation for each direct IO) and will be needed to properly support O_(D)SYNC for AIO. The communication between the filesystem and the direct I/O code requires a new buffer head flag, which is a bit ugly but not avoidable until the direct I/O code stops abusing the buffer_head structure for communicating with the filesystems. Currently this creates a per-superblock unbound workqueue for these completions, which is taken from an earlier patch by Jan Kara. I'm not really convinced about this use and would prefer a "normal" global workqueue with a high concurrency limit, but this needs further discussion. JK: Fixed ext4 part, dynamic allocation of the workqueue. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 8月, 2013 4 次提交
-
-
由 Darrick J. Wong 提交于
If we detect either a discrepancy between the inode bitmap and the inode counts or the inode bitmap fails to pass validation checks, mark the block group corrupt and refuse to allocate or deallocate inodes from the group. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Darrick J. Wong 提交于
When we notice a block-bitmap corruption (because of device failure or something else), we should mark this group as corrupt and prevent further block allocations/deallocations from it. Currently, we end up generating one error message for every block in the bitmap. This potentially could make the system unstable as noticed in some bugs. With this patch, the error will be printed only the first time and mark the entire block group as corrupted. This prevents future access allocations/deallocations from it. Also tested by corrupting the block bitmap and forcefully introducing the mb_free_blocks error: (1) create a largefile (2Gb) $ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200 (2) umount filesystem. use dumpe2fs to see which block-bitmaps are in use by largefile and note their block numbers (3) use dd to zero-out the used block bitmaps $ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct (4) mount the FS and delete the largefile. (5) recreate the largefile. verify that the new largefile does not get any blocks from the groups marked as bad. Without the patch, we will see mb_free_blocks error for each bit in each zero'ed out bitmap at (4). With the patch, we only see the error once per blockgroup: [ 309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. [ 309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. [ 309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure [ 309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. [ 309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure [ 309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. [ 309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure [ 309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. [ 309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure [ 309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted. Google-Bug-Id: 7258357 [darrick.wong@oracle.com] Further modifications (by Darrick) to make more obvious that this corruption bit applies to blocks only. Set the corruption flag if the block group bitmap verification fails. Original-author: Aditya Kali <adityakali@google.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Darrick J. Wong 提交于
The block_group parameter to ext4_validate_block_bitmap is both used as a ext4_group_t inside the function and the same type is passed in by all callers. We might as well use the typedef consistently instead of open-coding the 'unsigned int'. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
After applied the commit (4a092d73), we have reduced the number of source files that need to #include ext4_extents.h. But we can do better. This commit defines ext4_zeroout_es() in extents.c and move EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in indirect.c and ioctl.c. Meanwhile we just need to include this file in extent_status.c when ES_AGGRESSIVE_TEST is defined. Otherwise, this commit removes a duplicated declaration in trace/events/ext4.h. After applied this patch, we just need to include ext4_extents.h file in {super,migrate,move_extents,extents}.c, and it is easy for us to define a new extent disk layout. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 8月, 2013 5 次提交
-
-
由 Jan Kara 提交于
The following race can lead to a loss of i_disksize update from truncate thus resulting in a wrong inode size if the inode size isn't updated again before inode is reclaimed: ext4_setattr() mpage_map_and_submit_extent() EXT4_I(inode)->i_disksize = attr->ia_size; ... ... disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT /* False because i_size isn't * updated yet */ if (disksize > i_size_read(inode)) /* True, because i_disksize is * already truncated */ if (disksize > EXT4_I(inode)->i_disksize) /* Overwrite i_disksize * update from truncate */ ext4_update_i_disksize() i_size_write(inode, attr->ia_size); For other places updating i_disksize such race cannot happen because i_mutex prevents these races. Writeback is the only place where we do not hold i_mutex and we cannot grab it there because of lock ordering. We fix the race by doing both i_disksize and i_size update in truncate atomically under i_data_sem and in mpage_map_and_submit_extent() we move the check against i_size under i_data_sem as well. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Jan Kara 提交于
reaim workfile.dbase test easily triggers warning in ext4_da_update_reserve_space(): EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 9 data blocks) The problem is that (one of) tests creates file and then randomly writes to it with O_SYNC. That results in writing back pages of the file in random order so we create extents for written blocks say 0, 2, 4, 6, 8 - this last allocation also allocates new block for extents. Then we writeout block 1 so we have extents 0-2, 4, 6, 8 and we release indirect extent block because extents fit in the inode again. Then we writeout block 10 and we need to allocate indirect extent block again which triggers the warning because we don't have the reservation anymore. Fix the problem by giving back freed metadata blocks resulting from extent merging into inode's reservation pool. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
Add a new fiemap flag which forces the all of the extents in an inode to be cached in the extent_status tree. This is critically important when using AIO to a preallocated file, since if we need to read in blocks from the extent tree, the io_submit(2) system call becomes synchronous, and the AIO is no longer "A", which is bad. In addition, for most files which have an external leaf tree block, the cost of caching the information in the extent status tree will be less than caching the entire 4k block in the buffer cache. So it is generally a win to keep the extent information cached. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
When we read in an extent tree leaf block from disk, arrange to have all of its entries cached. In nearly all cases the in-memory representation will be more compact than the on-disk representation in the buffer cache, and it allows us to get the information without having to traverse the extent tree for successive extents. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Jan Kara 提交于
Commit 0713ed0c added jbd2_journal_file_inode() call into ext4_block_zero_page_range(). However that function gets called from truncate path and thus inode needn't have jinode attached - that happens in ext4_file_open() but the file needn't be ever open since mount. Calling jbd2_journal_file_inode() without jinode attached results in the oops. We fix the problem by attaching jinode to inode also in ext4_truncate() and ext4_punch_hole() when we are going to zero out partial blocks. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 01 7月, 2013 3 次提交
-
-
由 Ashish Sangwan 提交于
No need to pass file pointer when we can directly pass inode pointer. Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Joe Perches 提交于
Reduce the object size ~10% could be useful for embedded systems. Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk. $ size fs/ext4/built-in.o* text data bss dec hex filename 239375 610 888 240873 3ace9 fs/ext4/built-in.o.new 264167 738 888 265793 40e41 fs/ext4/built-in.o.old $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config # CONFIG_PRINTK is not set CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_EXT4_FS_POSIX_ACL=y # CONFIG_EXT4_FS_SECURITY is not set # CONFIG_EXT4_DEBUG is not set Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
Now we maintain an proper in-order LRU list in ext4 to reclaim entries from extent status tree when we are under heavy memory pressure. For keeping this order, a spin lock is used to protect this list. But this lock burns a lot of CPU time. We can use the following steps to trigger it. % cd /dev/shm % dd if=/dev/zero of=ext4-img bs=1M count=2k % mkfs.ext4 ext4-img % mount -t ext4 -o loop ext4-img /mnt % cd /mnt % for ((i=0;i<160;i++)); do truncate -s 64g $i; done % for ((i=0;i<160;i++)); do cp $i /dev/null &; done % perf record -a -g % perf report This commit tries to fix this problem. Now a new member called i_touch_when is added into ext4_inode_info to record the last access time for an inode. Meanwhile we never need to keep a proper in-order LRU list. So this can avoid to burns some CPU time. When we try to reclaim some entries from extent status tree, we use list_sort() to get a proper in-order list. Then we traverse this list to discard some entries. In ext4_sb_info, we use s_es_last_sorted to record the last time of sorting this list. When we traverse the list, we skip the inode that is newer than this time, and move this inode to the tail of LRU list. When the head of the list is newer than s_es_last_sorted, we will sort the LRU list again. In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed. Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions. Reported-by: NDave Hansen <dave.hansen@intel.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 29 6月, 2013 1 次提交
-
-
由 Al Viro 提交于
and trim the living hell out bogosities in inline dir case Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 06 6月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
The group number passed to ext4_get_group_info() should be valid, but let's add an assert to check this before we start creating a pointer based on that group number and dereferencing it. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 6月, 2013 5 次提交
-
-
由 Jan Kara 提交于
Now that we clear PageWriteback after extent conversion, there's no need to wait for io_end processing in ext4_evict_inode(). Running AIO/DIO keeps file reference until aio_complete() is called so ext4_evict_inode() cannot be called. For io_end structures resulting from buffered IO waiting is happening because we wait for PageWriteback in truncate_inode_pages(). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
We don't have to wait for extent conversion in ext4_punch_hole() as buffered IO for the punched range has been flushed and waited upon (thus all extent conversions for that range have completed). Also we wait for all DIO to finish using inode_dio_wait() so there cannot be any extent conversions pending due to direct IO. Also remove ext4_flush_unwritten_io() since it's unused now. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently PageWriteback bit gets cleared from put_io_page() called from ext4_end_bio(). This is somewhat inconvenient as extent tree is not fully updated at that time (unwritten extents are not marked as written) so we cannot read the data back yet. This design was dictated by lock ordering as we cannot start a transaction while PageWriteback bit is set (we could easily deadlock with ext4_da_writepages()). But now that we use transaction reservation for extent conversion, locking issues are solved and we can move PageWriteback bit clearing after extent conversion is done. As a result we can remove wait for unwritten extent conversion from ext4_sync_file() because it already implicitely happens through wait_on_page_writeback(). We implement deferring of PageWriteback clearing by queueing completed bios to appropriate io_end and processing all the pages when io_end is going to be freed instead of at the moment ext4_io_end() is called. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Now that we have extent conversions with reserved transaction, we have to prevent extent conversions without reserved transaction (from DIO code) to block these (as that would effectively void any transaction reservation we did). So split lists, work items, and work queues to reserved and unreserved parts. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Later we would like to clear PageWriteback bit only after extent conversion from unwritten to written extents is performed. However it is not possible to start a transaction after PageWriteback is set because that violates lock ordering (and is easy to deadlock). So we have to reserve a transaction before locking pages and sending them for IO and later we use the transaction for extent conversion from ext4_end_io(). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-