- 01 7月, 2013 4 次提交
-
-
由 Joe Perches 提交于
Reduce the object size ~10% could be useful for embedded systems. Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk. $ size fs/ext4/built-in.o* text data bss dec hex filename 239375 610 888 240873 3ace9 fs/ext4/built-in.o.new 264167 738 888 265793 40e41 fs/ext4/built-in.o.old $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config # CONFIG_PRINTK is not set CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_EXT4_FS_POSIX_ACL=y # CONFIG_EXT4_FS_SECURITY is not set # CONFIG_EXT4_DEBUG is not set Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Zheng Liu 提交于
Now we maintain an proper in-order LRU list in ext4 to reclaim entries from extent status tree when we are under heavy memory pressure. For keeping this order, a spin lock is used to protect this list. But this lock burns a lot of CPU time. We can use the following steps to trigger it. % cd /dev/shm % dd if=/dev/zero of=ext4-img bs=1M count=2k % mkfs.ext4 ext4-img % mount -t ext4 -o loop ext4-img /mnt % cd /mnt % for ((i=0;i<160;i++)); do truncate -s 64g $i; done % for ((i=0;i<160;i++)); do cp $i /dev/null &; done % perf record -a -g % perf report This commit tries to fix this problem. Now a new member called i_touch_when is added into ext4_inode_info to record the last access time for an inode. Meanwhile we never need to keep a proper in-order LRU list. So this can avoid to burns some CPU time. When we try to reclaim some entries from extent status tree, we use list_sort() to get a proper in-order list. Then we traverse this list to discard some entries. In ext4_sb_info, we use s_es_last_sorted to record the last time of sorting this list. When we traverse the list, we skip the inode that is newer than this time, and move this inode to the tail of LRU list. When the head of the list is newer than s_es_last_sorted, we will sort the LRU list again. In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed. Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions. Reported-by: NDave Hansen <dave.hansen@intel.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Alexey Khoroshilov 提交于
If memory allocation in ext4_mb_new_group_pa() is failed, it returns error code, ext4_mb_new_preallocation() propages it, but ext4_mb_new_blocks() ignores it. An observed result was: - allocation fail means ext4_mb_new_group_pa() does not update ext4_allocation_context; - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len = ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead of number of blocks requested (1); - that activates update cycle in ext4_splice_branch(): for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here *(where->p + i) = cpu_to_le32(current_block++); - it iterates 511 times and corrupts a chunk of memory including inode structure; - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty(); - system hangs with 'scheduling while atomic' BUG. The patch implements a check for ext4_mb_new_preallocation() error code and handles its failure as if ext4_mb_regular_allocator() fails. Found by Linux File System Verification project (linuxtesting.org). [ Patch restructed by tytso to make the flow of control easier to follow. ] Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Maarten ter Huurne 提交于
Subtracting the number of the first data block places the superblock backups one block too early, corrupting the file system. When the block size is larger than 1K, the first data block is 0, so the subtraction has no effect and no corruption occurs. Signed-off-by: NMaarten ter Huurne <maarten@treewalker.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> CC: stable@vger.kernel.org
-
- 17 6月, 2013 1 次提交
-
-
由 Jon Ernst 提交于
This patch removed several unused variables. Signed-off-by: NJon Ernst <jonernst07@gmx.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 6月, 2013 3 次提交
-
-
由 Jie Liu 提交于
Return the FIEMAP_EXTENT_UNKNOWN flag as well except the FIEMAP_EXTENT_DELALLOC because the data location of an delayed allocation extent is unknown. Signed-off-by: NJie Liu <jeff.liu@oracle.com> -
由 Dmitry Monakhov 提交于
If filesystem was aborted after inode's write back is complete but before its metadata was updated we may return success results in data loss. In order to handle fs abort correctly we have to check fs state once we discover that it is in MS_RDONLY state Test case: http://patchwork.ozlabs.org/patch/244297Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Inode's data or non journaled quota may be written w/o jounral so we _must_ send a barrier at the end of ext4_sync_fs. But it can be skipped if journal commit will do it for us. Also fix data integrity for nojournal mode. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 6月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
Commit 18888cf0: "ext4: speed up truncate/unlink by not using bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in the most important codepath for file systems using extents, but a similar optimization also can be done for file systems using indirect blocks, and for the two special cases in the ext4 extents code. Cc: Andrey Sidorov <qrxd43@motorola.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
For a file systems with a very large number of block groups, if all of the block group bitmaps are in memory and the file system is relatively badly fragmented, it's possible ext4_mb_regular_allocator() to take a long time trying to find a good match. This is especially true if the tuning parameter mb_max_to_scan has been sent to a very large number. So add a cond_resched() to avoid soft lockup warnings and to provide better system responsiveness. For ext4_free_blocks(), if we are deleting a large range of blocks, and data=journal is enabled so that EXT4_FREE_BLOCKS_FORGET is passed, the loop to call sb_find_get_block() and to call ext4_forget() can take over 10-15 milliseocnds or more. So it's better to add a cond_resched() here a well. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 6月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Rename ext4_da_writepages() to ext4_writepages() and use it for all modes. We still need to iterate over all the pages in the case of data=journalling, but in the case of nodelalloc/data=ordered (which is what file systems mounted using ext3 backwards compatibility will use) this will allow us to use a much more efficient I/O submission path. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 06 6月, 2013 4 次提交
-
-
由 Theodore Ts'o 提交于
The test_root() function could potentially loop forever due to overflow issues. So rewrite test_root() to avoid this issue; as a bonus, it is 38% faster when benchmarked via a test loop: int main(int argc, char **argv) { int i; for (i = 0; i < 1 << 24; i++) { if (test_root(i, 7)) printf("%d\n", i); } } Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Theodore Ts'o 提交于
The group number passed to ext4_get_group_info() should be valid, but let's add an assert to check this before we start creating a pointer based on that group number and dereferencing it. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Theodore Ts'o 提交于
Check the group number for sanity earilier, before calling routines such as ext4_bg_has_super() or ext4_group_overhead_blocks(). Reported-by: NJonathan Salwan <jonathan.salwan@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The bio_alloc() function can return NULL if the memory allocation fails. So we need to check for this. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 6月, 2013 18 次提交
-
-
由 Jan Kara 提交于
Now that we clear PageWriteback after extent conversion, there's no need to wait for io_end processing in ext4_evict_inode(). Running AIO/DIO keeps file reference until aio_complete() is called so ext4_evict_inode() cannot be called. For io_end structures resulting from buffered IO waiting is happening because we wait for PageWriteback in truncate_inode_pages(). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
We don't have to wait for extent conversion in ext4_punch_hole() as buffered IO for the punched range has been flushed and waited upon (thus all extent conversions for that range have completed). Also we wait for all DIO to finish using inode_dio_wait() so there cannot be any extent conversions pending due to direct IO. Also remove ext4_flush_unwritten_io() since it's unused now. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
We don't have to wait for unwritten extent conversion in ext4_ind_direct_IO() as all writes that happened before DIO are flushed by the generic code and extent conversion has happened before we cleared PageWriteback bit. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
After removal of ext4_flush_unwritten_io() call, ext4_file_sync() doesn't need i_mutex anymore. Forcing of transaction commits doesn't need i_mutex as there's nothing inode specific in that code apart from grabbing transaction ids from the inode. So remove the lock. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Just use the generic function instead of duplicating it. We only need to reshuffle the read-only check a bit (which is there to prevent writing to a filesystem which has been remounted read-only after error I assume). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Since PageWriteback bit is now cleared after extents are converted from unwritten to written ones, we have full exclusion of writeback path from truncate (truncate_inode_pages() waits for PageWriteback bits to get cleared on all invalidated pages). Exclusion from DIO path is achieved by inode_dio_wait() call in ext4_setattr(). So there's no need to wait for extent convertion in ext4_truncate() anymore. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Make sure extent conversion after DIO happens while i_dio_count is still elevated so that inode_dio_wait() waits until extent conversion is done. This removes the need for explicit waiting for extent conversion in some cases. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently PageWriteback bit gets cleared from put_io_page() called from ext4_end_bio(). This is somewhat inconvenient as extent tree is not fully updated at that time (unwritten extents are not marked as written) so we cannot read the data back yet. This design was dictated by lock ordering as we cannot start a transaction while PageWriteback bit is set (we could easily deadlock with ext4_da_writepages()). But now that we use transaction reservation for extent conversion, locking issues are solved and we can move PageWriteback bit clearing after extent conversion is done. As a result we can remove wait for unwritten extent conversion from ext4_sync_file() because it already implicitely happens through wait_on_page_writeback(). We implement deferring of PageWriteback clearing by queueing completed bios to appropriate io_end and processing all the pages when io_end is going to be freed instead of at the moment ext4_io_end() is called. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Now that we have extent conversions with reserved transaction, we have to prevent extent conversions without reserved transaction (from DIO code) to block these (as that would effectively void any transaction reservation we did). So split lists, work items, and work queues to reserved and unreserved parts. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Later we would like to clear PageWriteback bit only after extent conversion from unwritten to written extents is performed. However it is not possible to start a transaction after PageWriteback is set because that violates lock ordering (and is easy to deadlock). So we have to reserve a transaction before locking pages and sending them for IO and later we use the transaction for extent conversion from ext4_end_io(). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
There isn't any need for setting BH_Uninit on buffers anymore. It was only used to signal we need to mark io_end as needing extent conversion in add_bh_to_extent() but now we can mark the io_end directly when mapping extent. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
There are two issues with current writeback path in ext4. For one we don't necessarily map complete pages when blocksize < pagesize and thus needn't do any writeback in one iteration. We always map some blocks though so we will eventually finish mapping the page. Just if writeback races with other operations on the file, forward progress is not really guaranteed. The second problem is that current code structure makes it hard to associate all the bios to some range of pages with one io_end structure so that unwritten extents can be converted after all the bios are finished. This will be especially difficult later when io_end will be associated with reserved transaction handle. We restructure the writeback path to a relatively simple loop which first prepares extent of pages, then maps one or more extents so that no page is partially mapped, and once page is fully mapped it is submitted for IO. We keep all the mapping and IO submission information in mpage_da_data structure to somewhat reduce stack usage. Resulting code is somewhat shorter than the old one and hopefully also easier to read. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
We limit the number of blocks written in a single loop of ext4_da_writepages() to 64 when inode uses indirect blocks. That is unnecessary as credit estimates for mapping logically continguous run of blocks is rather low even for inode with indirect blocks. So just lift this limitation and properly calculate the number of necessary credits. This better credit estimate will also later allow us to always write at least a single page in one iteration. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
ext4_ind_trans_blocks() wrongly used 'chunk' argument to decide whether blocks mapped are logically contiguous. That is wrong since the argument informs whether the blocks are physically contiguous. As the blocks mapped are always logically contiguous and that's all ext4_ind_trans_blocks() cares about, just remove the 'chunk' argument. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
This attribute is now unused so deprecate it. We still show the old default value to keep some compatibility but we don't allow writing to that attribute anymore. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Writeback code got better in how it submits IO and now the number of pages requested to be written is usually higher than original 1024. The number is now dynamically computed based on observed throughput and is set to be about 0.5 s worth of writeback. E.g. on ordinary SATA drive this ends up somewhere around 10000 as my testing shows. So remove the unnecessary smarts from ext4_da_writepages(). Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 6月, 2013 1 次提交
-
-
由 Jan Kara 提交于
Change writeback path to create just one io_end structure for the extent to which we submit IO and share it among bios writing that extent. This prevents needless splitting and joining of unwritten extents when they cannot be submitted as a single bio. Bugs in ENOMEM handling found by Linux File System Verification project (linuxtesting.org) and fixed by Alexey Khoroshilov <khoroshilov@ispras.ru>. CC: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 01 6月, 2013 4 次提交
-
-
由 Jan Kara 提交于
The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
On 32-bit architectures with 32-bit sector_t computation of data offset in ext4_xattr_fiemap() can overflow resulting in reporting bogus data location. Fix the problem by typing block number to proper type before shifting. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
ext4_lblk_t is just u32 so multiplying it by blocksize can easily overflow for files larger than 4 GB. Fix that by properly typing the block offsets before shifting. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Jan Kara 提交于
On 32-bit archs when sector_t is defined as 32-bit the logic computing data offset in ext4_inline_data_fiemap(). Fix that by properly typing the shifted value. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 28 5月, 2013 2 次提交
-
-
由 Paul Taysom 提交于
Suppress the messages releating to processing the ext4 orphan list ("truncating inode" and "deleting unreferenced inode") unless the debug option is on, since otherwise they end up taking up space in the log that could be used for more useful information. Tested by opening several files, unlinking them, then crashing the system, rebooting the system and examining /var/log/messages. Addresses the problem described in http://crbug.com/220976Signed-off-by: NPaul Taysom <taysom@chromium.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -
由 Lukas Czerner 提交于
Currently punch hole is disabled in file systems with bigalloc feature enabled. However the recent changes in punch hole patch should make it easier to support punching holes on bigalloc enabled file systems. This commit changes partial_cluster handling in ext4_remove_blocks(), ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently partial_cluster is unsigned long long type and it makes sure that we will free the partial cluster if all extents has been released from that cluster. However it has been specifically designed only for truncate. With punch hole we can be freeing just some extents in the cluster leaving the rest untouched. So we have to make sure that we will notice cluster which still has some extents. To do this I've changed partial_cluster to be signed long long type. The only scenario where this could be a problem is when cluster_size == block size, however in that case there would not be any partial clusters so we're safe. For bigger clusters the signed type is enough. Now we use the negative value in partial_cluster to mark such cluster used, hence we know that we must not free it even if all other extents has been freed from such cluster. This scenario can be described in simple diagram: |FFF...FF..FF.UUU| ^----------^ punch hole . - free space | - cluster boundary F - freed extent U - used extent Also update respective tracepoints to use signed long long type for partial_cluster. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-