- 29 10月, 2011 1 次提交
-
-
由 Yongqiang Yang 提交于
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 27 10月, 2011 2 次提交
-
-
由 Eric Gouriou 提交于
ext4_ext_insert_extent() (respectively ext4_ext_insert_index()) was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine how many entries needed to be moved beyond the insertion point. In practice this means that (320 - I) * 24 bytes were memmove()'d when I is the insertion point, rather than (#entries - I) * 24 bytes. This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead to only move existing entries. The code flow is also simplified slightly to highlight similarities and reduce code duplication in the insertion logic. This patch reduces system CPU consumption by over 25% on a 4kB synchronous append DIO write workload when used with the pre-2.6.39 x86_64 memmove() implementation. With the much faster 2.6.39 memmove() implementation we still see a decrease in system CPU usage between 2% and 7%. Note that the ext_debug() output changes with this patch, splitting some log information between entries. Users of the ext_debug() output should note that the "move %d" units changed from reporting the number of bytes moved to reporting the number of entries moved. Signed-off-by: NEric Gouriou <egouriou@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Gouriou 提交于
This patch introduces a fast path in ext4_ext_convert_to_initialized() for the case when the conversion can be performed by transferring the newly initialized blocks from the uninitialized extent into an adjacent initialized extent. Doing so removes the expensive invocations of memmove() which occur during extent insertion and the subsequent merge. In practice this should be the common case for clients performing append writes into files pre-allocated via fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via direct IO and when using a suboptimal implementation of memmove() (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU consumption by 32%. Two new trace points are added to ext4_ext_convert_to_initialized() to offer visibility into its operations. No exit trace point has been added due to the multiplicity of return points. This can be revisited once the upstream cleanup is backported. Signed-off-by: NEric Gouriou <egouriou@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 26 10月, 2011 3 次提交
-
-
由 Tao Ma 提交于
When we want to convert the unitialized extent in direct write, we can either do it in ext4_end_io_nolock(AIO case) or in ext4_ext_direct_IO(non AIO case) and EXT4_I(inode)->cur_aio_dio is a guard for ext4_ext_map_blocks to find the right case. In e9e3bcec, we mistakenly change it by: - if (io) + if (io && !(io->flag & EXT4_IO_END_UNWRITTEN)) { io->flag = EXT4_IO_END_UNWRITTEN; - else + atomic_inc(&EXT4_I(inode)->i_aiodio_unwritten); + } else ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN); So now if we map 2 blocks, and the first one set the EXT_IO_END_UNWRITTEN, the 2nd mapping will set inode state because of the check for the flag. This is wrong. Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
In ext4_ext_next_allocated_block(), the path[depth] might have a p_ext that is NULL -- see ext4_ext_binsearch(). In such a case, dereferencing it will crash the machine. This patch checks for p_ext == NULL in ext4_ext_next_allocated_block() before dereferencinging it. Tested using a hand-crafted an inode with eh_entries == 0 in an extent block, verified that running FIEMAP on it crashes without this patch, works fine with it. Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dan Carpenter 提交于
When allocated is unsigned it breaks the error handling at the end of the function when we call: allocated = ext4_split_extent(...); if (allocated < 0) err = allocated; I've made it a signed int instead of unsigned. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 25 10月, 2011 2 次提交
-
-
由 Dmitry Monakhov 提交于
EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE Currently it happens only if new extent was allocated. TESTCASE: fallocate test_file -n -l4096 fallocate test_file -l4096 Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And fsck will complain about that. Also remove ping pong in ext4_fallocate() in case of new extents, where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later ext4_falloc_update_inode() restore it again. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
- Both callers(truncate and punch_hole) already aligned left end point so we no longer need split logic here. - Remove dead duplicated code. - Call ext4_ext_dirty only after we have updated eh_entries, otherwise we'll loose entries update. Regression caused by d583fb87 266'th testcase in xfstests (http://patchwork.ozlabs.org/patch/120872) Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 22 10月, 2011 1 次提交
-
-
由 Dmitry Monakhov 提交于
Currently code make an impression what grow procedure is very complicated and some mythical paths, blocks are involved. But in fact grow in depth it relatively simple procedure: 1) Just create new meta block and copy root data to that block. 2) Convert root from extent to index if old depth == 0 3) Update root block pointer This patch does: - Reorganize code to make it more self explanatory - Do not pass path parameter to new_meta_block() in order to provoke allocation from inode's group because top-level block should site closer to it's inode, but not to leaf data block. [ This happens anyway, due to logic in mballoc; we should drop the path parameter from new_meta_block() entirely. -- tytso ] Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 18 10月, 2011 1 次提交
-
-
由 H Hartley Sweeten 提交于
The third parameter to ext4_free_blocks is a struct buffer_head *. This parameter should be NULL not 0. This quiets the sparse noise: warning: Using plain integer as NULL pointer Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 10月, 2011 1 次提交
-
-
由 Tao Ma 提交于
Add a sanity check to make sure ix hasn't gone beyond the valid bounds of the extent block. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 10月, 2011 2 次提交
-
-
由 Tao Ma 提交于
ext4_extent_idx.e_block is __le32, so use le32_to_cpu() in ext4_ext_search_left(). Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tao Ma 提交于
The comment describing what ext4_ext_search_right() does is incorrect. We return 0 in *phys when *logical is the 'largest' allocated block, not smallest. Fix a few other typos while we're at it. Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NTao Ma <boyu.mt@taobao.com>
-
- 10 9月, 2011 5 次提交
-
-
由 Aditya Kali 提交于
Currently, there exists a race between delayed allocated writes and the writeback when bigalloc feature is in use. The race was because we wanted to determine what blocks in a cluster are under delayed allocation and we were using buffer_delayed(bh) check for it. But, the writeback codepath clears this bit without any synchronization which resulted in a race and an ext4 warning similar to: EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0 reserved data blocks The race existed in two places. (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from writeback code path. (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where buffer_delayed(bh) is set. To fix (1), this patch introduces a new buffer_head state bit - BH_Da_Mapped. This bit is set under the protection of EXT4_I(inode)->i_data_sem when we have actually mapped the delayed allocated blocks during the writeout time. We can now reliably check for this bit inside ext4_find_delalloc_range() to determine whether the reservation for the blocks have already been claimed or not. To fix (2), it was necessary to set buffer_delay(bh) under the protection of i_data_sem. So, I extracted the very beginning of ext4_map_blocks into a new function - ext4_da_map_blocks() - and performed the required setting of bh_delay bit and the quota reservation under the protection of i_data_sem. These two fixes makes the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent, thus removing the race. Tested: I was able to reproduce the problem by running 'dd' and 'fsync' in parallel. Also, xfstests sometimes used to reproduce this race. After the fix both my test and xfstests were successful and no race (warning message) was observed. Google-Bug-Id: 4997027 Signed-off-by: NAditya Kali <adityakali@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aditya Kali 提交于
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in ext4/inode.c. Tested: Built and ran the kernel and verified that these tracepoints work. Also ran xfstests. Signed-off-by: NAditya Kali <adityakali@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aditya Kali 提交于
With bigalloc changes, the i_blocks value was not correctly set (it was still set to number of blocks being used, but in case of bigalloc, we want i_blocks to represent the number of clusters being used). Since the quota subsystem sets the i_blocks value, this patch fixes the quota accounting and makes sure that the i_blocks value is set correctly. Signed-off-by: NAditya Kali <adityakali@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
When we are truncating (as opposed unlinking) a file, we need to worry about partial truncates of a file, especially in the light of sparse files. The changes here make sure that arbitrary truncates of sparse files works correctly. Yeah, it's messy. Note that these functions will need to be revisted when the punch ioctl is integrated --- in fact this commit will probably have merge conflicts with the punch changes which Allison Henders and the IBM LTC have been working on. I will need to fix this up when either patch hits mainline. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
If we need to allocate a new block in ext4_ext_map_blocks(), the function needs to see if the cluster has already been allocated. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 9月, 2011 1 次提交
-
-
由 Allison Henderson 提交于
While running extended fsx tests to verify the first two patches, a similar bug was also found in the truncate operation. This bug happens because the truncate routine only zeros the unblock aligned portion of the last page. This means that the block aligned portions of the page appearing after i_size are left unzeroed, and the buffer heads still mapped. This bug is corrected by using ext4_discard_partial_page_buffers in the truncate routine to zero the partial page and unmap the buffer headers. Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 9月, 2011 1 次提交
-
-
由 Theodore Ts'o 提交于
Add debugging information in case jbd2_journal_dirty_metadata() is called with a buffer_head which didn't have jbd2_journal_get_write_access() called on it, or if the journal_head has the wrong transaction in it. In addition, return an error code. This won't change anything for ocfs2, which will BUG_ON() the non-zero exit code. For ext4, the caller of this function is ext4_handle_dirty_metadata(), and on seeing a non-zero return code, will call __ext4_journal_stop(), which will print the function and line number of the (buggy) calling function and abort the journal. This will allow us to recover instead of bug halting, which is better from a robustness and reliability point of view. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 03 9月, 2011 2 次提交
-
-
由 Allison Henderson 提交于
This patch fixes a second punch hole bug found by xfstests 127. This bug happens because punch hole needs to flush the pages of the hole to avoid race conditions. But if the end of the hole is in the same page as i_size, the buffer heads beyond i_size need to be unmapped and the page needs to be zeroed after it is flushed. To correct this, the new ext4_discard_partial_page_buffers routine is used to zero and unmap the partial page beyond i_size if the end of the hole appears in the same page as i_size. The code has also been optimized to set the end of the hole to the page after i_size if the specified hole exceeds i_size, and the code that flushes the pages has been simplified. Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
-
由 Allison Henderson 提交于
This patch addresses a bug found by xfstests 75, 112, 127 when blocksize = 1k This bug happens because the punch hole code only zeros out non block aligned regions of the page. This means that if the blocks are smaller than a page, then the block aligned regions of the page inside the hole are left un-zeroed, and their buffer heads are still mapped. This bug is corrected by using ext4_discard_partial_page_buffers to properly zero the partial page at the head and tail of the hole, and unmap the corresponding buffer heads This patch also addresses a bug reported by Lukas while working on a new patch to add discard support for loop devices using punch hole. The bug happened because of the first and last block number needed to be cast to a larger data type before calculating the byte offset, but since now we only need the byte offsets of the pages, we no longer even need to be calculating the byte offsets of the blocks. The code to do the block offset calculations is removed in this patch. Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com>
-
- 28 7月, 2011 2 次提交
-
-
由 Utako Kusaka 提交于
The logical block number in map.l_blk is a __u32, and so before we shift it left, by the block size, we neeed cast it to a 64-bit size. Otherwise i_size can be corrupted on an ENOSPC. # df -T /mnt/mp1 Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/sda6 ext4 9843276 153056 9190200 2% /mnt/mp1 # fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device # stat /mnt/mp1/testfile File: `/mnt/mp1/testfile' Size: 4293656576 Blocks: 19380440 IO Block: 4096 regular file Device: 806h/2054d Inode: 12 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2011-07-25 13:01:31.414490496 +0900 Modify: 2011-07-25 13:01:31.414490496 +0900 Change: 2011-07-25 13:01:31.454490495 +0900 Signed-off-by: NUtako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> -- fs/ext4/extents.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
-
由 Robin Dong 提交于
The old function ext4_ext_rm_idx is used only for truncate case because it just remove last index in extent-index-block. When punching hole, it usually needed to remove "middle" index, therefore we must move indexes which after it forward. (I create a file with 1 depth extent tree and punch hole in the middle of it, the last index in index-block strangly gone, so I find out this bug) Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 24 7月, 2011 3 次提交
-
-
由 Robin Dong 提交于
The comment for ext4_ext_check_cache has a litte mistake. Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Robin Dong 提交于
The debug message in ext4_ext_insert_extent before moving extent is incorrect (the "from xx to xx"). Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Robin Dong 提交于
The argument "inode" in function ext4_ext_next_allocated_block looks useless, so clean it. Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 18 7月, 2011 4 次提交
-
-
由 Robin Dong 提交于
If eh_entries is equal to (or greater than) eh_max, the operation of inserting new extent_idx will make number of entries overflow. So check eh_entries before inserting the new extent_idx. Although there is no bug case according the code (function ext4_ext_insert_index is called by ext4_ext_split and ext4_ext_split is called only if the index block has free space), the right logic should be "lookup the capacity before insertion". Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Robin Dong 提交于
This patch avoids an extraneous lookup of the extent cache in ext4_ext_map_blocks() when the flag EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent. The existing logic was performing the lookup but not making use of the result. The patch simply reverses the order of evaluation in the condition. Since ext4_ext_in_cache() does not initialize newex on misses, bypassing its invocation does not introduce any new issue in this regard. Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NEric Gouriou <egouriou@google.com>
-
由 Allison Henderson 提交于
This patch removes the extra parameter in ext4_ext_remove_space() which is no longer needed. Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Allison Henderson 提交于
This patch optimizes the punch hole operation by skipping the tree walking code that is used by truncate. Since punch hole is done through map blocks, the path to the extent is already known in this function, so we do not need to look it up again. Signed-off-by: NAllison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 7月, 2011 1 次提交
-
-
由 Robin Dong 提交于
Optimize ext4_ext_insert_extent() by avoiding ext4_ext_next_leaf_block() when the result is not used/needed. Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 7月, 2011 3 次提交
-
-
由 Robin Dong 提交于
If eh->eh_entries is smaller than eh->eh_max, the routine will go to the "repeat" and then go to "has_space" directlly , since argument "depth" and "eh" are not even changed. Therefore, goto "has_space" directly and remove redundant "repeat" tag. Signed-off-by: NRobin Dong <sanbai@taobao.com>
-
由 Jiaying Zhang 提交于
Upon corrupted inode or disk failures, we may fail after we already allocate some blocks from the inode or take some blocks from the inode's preallocation list, but before we successfully insert the corresponding extent to the extent tree. In this case, we should free any allocated blocks and discard the inode's preallocated blocks because the entries in the inode's preallocation list may be in an inconsistent state. Signed-off-by: NJiaying Zhang <jiayingz@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
由 Maxim Patlasov 提交于
The current implementation of ext4_free_blocks() always calls dquot_free_block This looks quite sensible in the most cases: blocks to be freed are associated with inode and were accounted in quota and i_blocks some time ago. However, there is a case when blocks to free were not accounted by the time calling ext4_free_blocks() yet: 1. delalloc is on, write_begin pre-allocated some space in quota 2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks() 3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from ext4_ext_insert_extent() and calls ext4_free_blocks(). In this scenario, ext4_free_blocks() calls dquot_free_block() who, in turn, decrements i_blocks for blocks which were not accounted yet (due to delalloc) After clean umount, e2fsck reports something like: > Inode 21, i_blocks is 5080, should be 5128. Fix<y>? because i_blocks was erroneously decremented as explained above. The patch fixes the problem by passing the new flag EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request that the dquot_free_block() call be skipped. Signed-off-by: NMaxim Patlasov <maxim.patlasov@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
- 28 6月, 2011 3 次提交
-
-
由 Yongqiang Yang 提交于
Unused variables was deleted. Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
I found that ext4_ext_find_goal() and ext4_find_near() share the same code for returning a coloured start block based on i_block_group. We can refactor this into a common function so that they don't diverge in the future. Thanks to adilger for suggesting the new function name. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Robin Dong 提交于
In function ext4_ext_insert_index when eh_entries of curp is bigger than eh_max, error messages will be printed out, but the content is about logical and ei_block, that's incorret. Signed-off-by: NRobin Dong <sanbai@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 06 6月, 2011 2 次提交
-
-
由 Lukas Czerner 提交于
Currently we are not marking the extent as the last one (FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is because we just do not check for it right now and continue searching for next extent. But at the point we hit the hole at the end of the file, it is too late. This commit adds check for the allocated block in subsequent extent and if there is no more extents (block = EXT_MAX_BLOCKS) just flag the current one as the last one. This behaviour has been spotted unintentionally by 252 xfstest, when the test hangs out, because of wrong loop condition. However on other filesystems (like xfs) it will exit anyway, because we notice the last extent flag and exit. With this patch xfstest 252 does not hang anymore, ext4 fiemap implementation still reports bad extent type in some cases, however this seems to be different issue. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Kazuya Mio reported that he was able to hit BUG_ON(next == lblock) in ext4_ext_put_gap_in_cache() while creating a sparse file in extent format and fill the tail of file up to its end. We will hit the BUG_ON when we write the last block (2^32-1) into the sparse file. The root cause of the problem lies in the fact that we specifically set s_maxbytes so that block at s_maxbytes fit into on-disk extent format, which is 32 bit long. However, we are not storing start and end block number, but rather start block number and length in blocks. It means that in order to cover extent from 0 to EXT_MAX_BLOCK we need EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) - and it does not. The only way to fix it without changing the meaning of the struct ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes by one fs block so we can cover the whole extent we can get by the on-disk extent format. Also in many places EXT_MAX_BLOCK is used as length instead of maximum logical block number as the name suggests, it is all a bit messy. So this commit renames it to EXT_MAX_BLOCKS and change its usage in some places to actually be maximum number of blocks in the extent. The bug which this commit fixes can be reproduced as follows: dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2)) sync dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1)) Reported-by: NKazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-