- 04 4月, 2013 4 次提交
-
-
由 Theodore Ts'o 提交于
In order to make it simpler to test the code which support i_blocks/indirect-mapped inodes, support the conversion of inodes which are less than 12 blocks and which are contained in no more than a single extent. The primary intended use of this code is to converting freshly created zero-length files and empty directories. Note that the version of chattr in e2fsprogs 1.42.7 and earlier has a check that prevents the clearing of the extent flag. A simple patch which allows "chattr -e <file>" to work will be checked into the e2fsprogs git repository. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move common code in ext4_ind_truncate() and ext4_ext_truncate() into ext4_truncate(). This saves over 60 lines of code. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole() into ext4_punch_hole(). This saves over 150 lines of code. This also fixes a potential bug when the punch_hole() code is racing against indirect-to-extents or extents-to-indirect migation. We are currently using i_mutex to protect against changes to the inode flag; specifically, the append-only, immutable, and extents inode flags. So we need to take i_mutex before deciding whether to use the extents-specific or indirect-specific punch_hole code. Also, there was a missing call to ext4_inode_block_unlocked_dio() in the indirect punch codepath. This was added in commit 02d262df to block DIO readers racing against the punch operation in the codepath for extent-mapped inodes, but it was missing for indirect-block mapped inodes. One of the advantages of refactoring the code is that it makes such oversights much less likely. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The only difference between how we handle data=ordered and data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate code duplication by factoring out redundant the code paths. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-
- 20 3月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Commit 84c17543 (ext4: move work from io_end to inode) triggered a regression when running xfstest #270 when the file system is mounted with dioread_nolock. The problem is that after ext4_evict_inode() calls ext4_ioend_wait(), this guarantees that last io_end structure has been freed, but it does not guarantee that the workqueue structure, which was moved into the inode by commit 84c17543, is actually finished. Once ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this will allow ext4_ioend_wait() to return on CPU #2, at which point the evict_inode() codepath can race against the workqueue code on CPU #1 accessing EXT4_I(inode)->i_unwritten_work to find the next item of work to do. Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which will be renamed ext4_ioend_shutdown(), since it is only used by ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown() until after truncate_inode_pages() and filemap_write_and_wait() are called, to make sure all dirty pages have been written back and flushed from the page cache first. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e *pdpt = 0000000030bc3001 *pde = 0000000000000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c17543-dirty #91 Bochs Bochs EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0 EIP is at cwq_activate_delayed_work+0x3b/0x7e EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000 ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000) Stack: f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000 Call Trace: [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb [<c01def1d>] process_one_work+0x5d8/0x637 [<c040a10b>] ? ext4_end_bio+0x300/0x300 [<c01e3105>] worker_thread+0x249/0x3ef [<c01ea317>] kthread+0xd8/0xeb [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb [<c023a370>] ? trace_hardirqs_on+0x27/0x37 [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28 [<c01ea23f>] ? __init_kthread_worker+0x71/0x71 Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1 6c c1 00 31 c9 83 EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84 CR2: 0000000000000000 ---[ end trace a1923229da53d8a4 ]--- Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz>
-
- 12 3月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e420, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
-
- 02 3月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Use a percpu counter rather than atomic types for shrinker accounting. There's no need for ultimate accuracy in the shrinker, so this should come a little more cheaply. The percpu struct is somewhat large, but there was a big gap before the cache-aligned s_es_lru_lock anyway, and it fits nicely in there. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 01 3月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
When the system is under memory pressure, ext4_es_srhink() will get called very often. So optimize returning the number of items in the file system's extent status cache by keeping a per-filesystem count, instead of calculating it each time by scanning all of the inodes in the extent status cache. Also rename the slab used for the extent status cache to be "ext4_extent_status" so it's obviousl the slab in question is created by ext4. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com>
-
- 18 2月, 2013 4 次提交
-
-
由 Zheng Liu 提交于
Although extent status is loaded on-demand, we also need to reclaim extent from the tree when we are under a heavy memory pressure because in some cases fragmented extent tree causes status tree costs too much memory. Here we maintain a lru list in super_block. When the extent status of an inode is accessed and changed, this inode will be move to the tail of the list. The inode will be dropped from this list when it is cleared. In the inode, a counter is added to count the number of cached objects in extent status tree. Here only written/unwritten/hole extent is counted because delayed extent doesn't be reclaimed due to fiemap, bigalloc and seek_data/hole need it. The counter will be increased as a new extent is allocated, and it will be decreased as a extent is freed. In this commit we use normal shrinker framework to reclaim memory from the status tree. ext4_es_reclaim_extents_count() traverses the lru list to count the number of reclaimable extents. ext4_es_shrink() tries to reclaim written/unwritten/hole extents from extent status tree. The inode that has been shrunk is moved to the tail of lru list. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
Single extent cache could be removed because we have extent status tree as a extent cache, and it would be better. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
By recording the phycisal block and status, extent status tree is able to track the status of every extents. When we call _map_blocks functions to lookup an extent or create a new written/unwritten/delayed extent, this extent will be inserted into extent status tree. We don't load all extents from disk in alloc_inode() because it costs too much memory, and if a file is opened and closed frequently it will takes too much time to load all extent information. So currently when we create/lookup an extent, this extent will be inserted into extent status tree. Hence, the extent status tree may not comprehensively contain all of the extents found in the file. Here a condition we need to take care is that an extent might contains unwritten and delayed status simultaneously because an extent is delayed allocated and could be allocated by fallocate. At this time we need to keep delayed status because later we need to update delayed reservation space using it. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
- 10 2月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle until the inode has been succesfully allocated. In order to do this, we need to start the handle in the ext4_new_inode(). So create a new variant of this function, ext4_new_inode_start_handle(), so the handle can be created at the last possible minute, before we need to modify the inode allocation bitmap block. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Operations which modify extended attributes may need extra journal credits if inline data is used, since there is a chance that some extended attributes may need to get pushed to an external attribute block. Changes to reflect this was made in xattr.c, but they were missed in fs/ext4/acl.c. To fix this, abstract the calculation of the number of credits needed for xattr operations to an inline function defined in ext4_jbd2.h, and use it in acl.c and xattr.c. Also move the function declarations used in inline.c from xattr.h (where they are non-obviously hidden, and caused problems since ext4_jbd2.h needs to use the function ext4_has_inline_data), and move them to ext4.h. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NTao Ma <boyu.mt@taobao.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 09 2月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Move the jbd2 wrapper functions which start and stop handles out of super.c, where they don't really logically belong, and into ext4_jbd2.c. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 28 1月, 2013 3 次提交
-
-
由 Jan Kara 提交于
It does not make much sense to have struct work in ext4_io_end_t because we always use it for only one ext4_io_end_t per inode (the first one in the i_completed_io list). So just move the structure to inode itself. This also allows for a small simplification in processing io_end structures. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently we sometimes used block_write_full_page() and sometimes ext4_bio_write_page() for writeback (depending on mount options and call path). Let's always use ext4_bio_write_page() to simplify things a bit. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
This patch add supports for indirect file support punching hole. It is almost the same as ext4_ext_punch_hole. First, we invalidate all pages between this hole, and then we try to deallocate all blocks of this hole. A recursive function is used to handle deallocation of blocks. In this function, it iterates over the entries in inode's i_blocks or indirect blocks, and try to free the block for each one of them. After applying this patch, xfstest #255 will not pass w/o extent because indirect-based file doesn't support unwritten extents. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 1月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
After we have finished extending the file system, we need to trigger a the lazy inode table thread to zero out the inode tables. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 12月, 2012 13 次提交
-
-
由 Carlos Maiolino 提交于
Flags being used by atomic operations in inode flags (e.g. ext4_test_inode_flag(), should be consistent with that actually stored in inodes, i.e.: EXT4_XXX_FL. It ensures that this consistency is checked at build-time, not at run-time. Currently, the flags consistency are being checked at run-time, but, there is no real reason to not do a build-time check instead of a run-time check. The code is comparing macro defined values with enum type variables, where both are constants, so, there is no problem in comparing constants at build-time. enum variables are treated as constants by the C compiler, according to the C99 specs (see www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf sec. 6.2.5, item 16), so, there is no real problem in comparing an enumeration type at build time Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
Ted has sent out a RFC about removing this feature. Eric and Jan confirmed that both RedHat and SUSE enable this feature in all their product. David also said that "As far as I know, it's enabled in all Android kernels that use ext4." So it seems OK for us. And what's more, as inline data depends its implementation on xattr, and to be frank, I don't run any test again inline data enabled while xattr disabled. So I think we should add inline data and remove this config option in the same release. [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which isn't much in the grand scheme of things. Since no one seems to be testing this configuration except for some automated compile farms, on balance we are better removing this config option, and so that it is effectively always enabled. -- tytso ] Cc: David Brown <davidb@codeaurora.org> Cc: Eric Sandeen <sandeen@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
Currently ext4_delete_entry() is used only for dir entry removing from a dir block. So let us create a new function ext4_generic_delete_entry and this function takes a entry_buf and a buf_size so that it can be used for inline data. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
search_dirblock is used to search a dir block, but the code is almost the same for searching an inline dir. So create a new fuction search_dir and let search_dirblock call it. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
For "." and "..", we just call filldir by ourselves instead of iterating the real dir entry. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
This patch let add_dir_entry handle the inline data case. So the dir is initialized as inline dir first and then we can try to add some files to it, when the inline space can't hold all the entries, a dir block will be created and the dir entry will be moved to it. Also for an inlined dir, "." and ".." are removed and we only use 4 bytes to store the parent inode number. These 2 entries will be added when we convert an inline dir to a block-based one. [ Folded in patch from Dan Carpenter to remove an unused variable. ] Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
The old add_dirent_to_buf handles all the work related to the work of adding dir entry to a dir block. Now we have inline data, so create 2 new function __ext4_find_dest_de and __ext4_insert_dentry that do the real work and let add_dirent_to_buf call them. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
The __ext4_check_dir_entry() function() is used to check whether the de is over the block boundary. Now with inline data, it could be within the block boundary while exceeds the inode size. So check this function to check the overflow more precisely. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
Currently, the initialization of dot and dotdot are encapsulated in ext4_mkdir and also bond with dir_block. So create a new function named ext4_init_new_dir and the initialization is moved to ext4_init_dot_dotdot. Now it will called either in the normal non-inline case(rec_len of ".." will cover the whole block) or when we converting an inline dir to a block(rec len of ".." will be the real length). The start of the next entry is also returned for inline dir usage. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
For delayed allocation mode, we write to inline data if the file is small enough. And in case of we write to some offset larger than the inline size, the 1st page is dirtied, so that ext4_da_writepages can handle the conversion. When the 1st page is initialized with blocks, the inline part is removed. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
For a normal write case (not journalled write, not delayed allocation), we write to the inline if the file is small and convert it to an extent based file when the write is larger than the max inline size. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
Implement inline data with xattr. Now we use "system.data" to store xattr, and the xattr will be extended if the i_size is increased while we don't release the space during truncate. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 29 11月, 2012 1 次提交
-
-
由 Theodore Ts'o 提交于
Previously, ext4_extents.h was being included at the end of ext4.h, which was bad for a number of reasons: (a) it was not being included in the expected place, and (b) it caused the header to be included multiple times. There were #ifdef's to prevent this from causing any problems, but it still was unnecessary. By moving the function declarations that were in ext4_extents.h to ext4.h, which is standard practice for where the function declarations for the rest of ext4.h can be found, we can remove ext4_extents.h from being included in ext4.h at all, and then we can only include ext4_extents.h where it is needed in ext4's source files. It should be possible to move a few more things into ext4.h, and further reduce the number of source files that need to #include ext4_extents.h, but that's a cleanup for another day. Reported-by: NSachin Kamat <sachin.kamat@linaro.org> Reported-by: NWei Yongjun <weiyj.lk@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 19 11月, 2012 1 次提交
-
-
由 Adam Buchbinder 提交于
"Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 09 11月, 2012 2 次提交
-
-
由 Zheng Liu 提交于
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
This patch adds two structures that supports extent status tree, extent_status and ext4_es_tree. Currently extent_status is used to track a delay extent for an inode, which record the start block and the length of the delay extent. ext4_es_tree is used to store all extent_status for an inode in memory. Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 22 10月, 2012 1 次提交
-
-
由 Tao Ma 提交于
In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
-
- 10 10月, 2012 1 次提交
-
-
由 Theodore Ts'o 提交于
The function ext4_handle_dirty_super() was calculating the superblock on the wrong block data. As a result, when the superblock is modified while it is mounted (most commonly, when inodes are added or removed from the orphan list), the superblock checksum would be wrong. We didn't notice because the superblock *was* being correctly calculated in ext4_commit_super(), and this would get called when the file system was unmounted. So the problem only became obvious if the system crashed while the file system was mounted. Fix this by removing the poorly designed function signature for ext4_superblock_csum_set(); if it only took a single argument, the pointer to a struct superblock, the ambiguity which caused this mistake would have been impossible. Reported-by: NGeorge Spelvin <linux@horizon.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 05 10月, 2012 1 次提交
-
-
由 Dmitry Monakhov 提交于
BUG #1) All places where we call ext4_flush_completed_IO are broken because buffered io and DIO/AIO goes through three stages 1) submitted io, 2) completed io (in i_completed_io_list) conversion pended 3) finished io (conversion done) And by calling ext4_flush_completed_IO we will flush only requests which were in (2) stage, which is wrong because: 1) punch_hole and truncate _must_ wait for all outstanding unwritten io regardless to it's state. 2) fsync and nolock_dio_read should also wait because there is a time window between end_page_writeback() and ext4_add_complete_io() As result integrity fsync is broken in case of buffered write to fallocated region: fsync blkdev_completion ->filemap_write_and_wait_range ->ext4_end_bio ->end_page_writeback <-- filemap_write_and_wait_range return ->ext4_flush_completed_IO sees empty i_completed_io_list but pended conversion still exist ->ext4_add_complete_io BUG #2) Race window becomes wider due to the 'ext4: completed_io locking cleanup V4' patch series This patch make following changes: 1) ext4_flush_completed_io() now first try to flush completed io and when wait for any outstanding unwritten io via ext4_unwritten_wait() 2) Rename function to more appropriate name. 3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to prevent endless wait Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 29 9月, 2012 1 次提交
-
-
由 Dmitry Monakhov 提交于
Inode's block defrag and ext4_change_inode_journal_flag() may affect nonlocked DIO reads result, so proper synchronization required. - Add missed inode_dio_wait() calls where appropriate - Check inode state under extra i_dio_count reference. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-