- 09 2月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
So we can better understand what bits of ext4 are responsible for long-running jbd2 handles, use jbd2__journal_start() so we can pass context information for logging purposes. The recommended way for finding the longer-running handles is: T=/sys/kernel/debug/tracing EVENT=$T/events/jbd2/jbd2_handle_stats echo "interval > 5" > $EVENT/filter echo 1 > $EVENT/enable ./run-my-fs-benchmark cat $T/trace > /tmp/problem-handles This will list handles that were active for longer than 20ms. Having longer-running handles is bad, because a commit started at the wrong time could stall for those 20+ milliseconds, which could delay an fsync() or an O_SYNC operation. Here is an example line from the trace file describing a handle which lived on for 311 jiffies, or over 1.2 seconds: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move the jbd2 wrapper functions which start and stop handles out of super.c, where they don't really logically belong, and into ext4_jbd2.c. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Handles which stay open a long time are problematic when it comes time to close down a transaction so it can be committed. These tracepoints will help us determine which ones are the problematic ones, and to validate whether changes makes things better or worse. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 2月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 2月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
The ext4 block allocator only maintains buddy bitmaps for chunks which are less than or equal to one quarter of a block group. That is, for a file aystem with a 1k blocksize, and where the number of blocks in a block group is 8192 blocks, the largest chunk size tracked by buddy bitmaps is 2048 blocks. For a file system with a 4k blocksize, and where the number of blocks in a block group is 32768 blocks, the largest chunk size tracked by buddy bitmaps is 8192 blocks. To work around this code, mballoc.c before this commit would truncate allocation requests to the number of blocks in a block group minus 10. Why 10? Aside from being a completely arbitrary number, it avoids block allocation to be a power of two larger than 25% of the block group. If you try to explicitly fallocate 50% of the block group size, this will demonstrate the problem; the block allocation code will scan the all of the blocks in the file system with cr==0 (since the request is for a natural power of two), but then completely fail for all blocks groups, since the buddy bitmaps don't track chunk sizes of 50% of the block group. To fix this, in these we use ext4_mb_complex_scan_group() instead of ext4_mb_simple_scan_group(). Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger@dilger.ca>
-
- 03 2月, 2013 4 次提交
-
-
由 Theodore Ts'o 提交于
Check for incompatible mount options when using the ext4 file system driver to mount ext2 or ext3 file systems. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
If argument of inode_readahead_blk is too big, we just bail out without printing any error. Fix this since it could confuse users. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
The loop looking for correct mount option entry is more logical if it is written rewritten as an empty loop looking for correct option entry and then code handling the option. It also saves one level of indentation for a lot of code so we can join a couple of split lines. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Several mount option (resuid, resgid, journal_dev, journal_ioprio) are currently handled before we enter standard option handling loop. I don't see a reason for this so move them to normal handling loop to make things more regular. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 02 2月, 2013 4 次提交
-
-
由 Cong Ding 提交于
It is unnecessary to check i<4 after the loop; just do it before the break. Signed-off-by: NCong Ding <dinggnu@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Niu Yawei 提交于
In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when changing the lg_prealloc_list. Signed-off-by: NNiu Yawei <yawei.niu@intel.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Akria Fujita 提交于
Commit 2147b1a6 resulted in a new smatch warning: > fs/ext4/move_extent.c:693 mext_replace_branches() > warn: variable dereferenced before check 'dext' (see line 683) Fix this by adding a check to make sure dext is non-NULL before we derefrence it. Signed-off-by: NAkria Fujita <a-fujita@rs.jp.nec.com> [ modified by tytso to make sure an ext4_error is called ] Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Julia Lawall 提交于
Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 30 1月, 2013 2 次提交
-
-
由 Eric Sandeen 提交于
Don't send an extra wakeup to kjournald in the case where we already have the proper target in j_commit_request, i.e. that transaction has already been requested for commit. commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed the logic leading to a wakeup, but it caused some extra wakeups which were found to lead to a measurable performance regression. Signed-off-by: NEric Sandeen <sandeen@redhat.com> [tytso@mit.edu: reworked check to make it clearer] Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. CC: stable@vger.kernel.org Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 29 1月, 2013 10 次提交
-
-
由 Guo Chao 提交于
brelse() and ext4_journal_force_commit() are both inlined and able to handle NULL. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Guo Chao 提交于
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Guo Chao 提交于
After commit 978fef91 (create __ext4_insert_dentry for dir entry insertion), 'reclen' is not used anymore. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Guo Chao 提交于
Commit b0336e8d (ext4: calculate and verify checksums of directory leaf blocks) and commit dbe89444 (ext4: Calculate and verify checksums for htree nodes) forget to release buffer when checksum failed, at some places. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Lukas Czerner 提交于
In two places we call WARN_ON() before we print out the debug message, however we agreed that the WARN_ON() is unnecessary at those places so remove them. Also use ext4_warning() instead of ext4_msg() and printk(). Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Remove unused variable flags from dump_completed_IO(). The code is only exercised when EXT4FS_DEBUG is defined. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Jan Kara 提交于
So far ext4_writepage() skipped writing pages that had any delayed or unwritten buffers attached. When blocksize < pagesize this breaks data=ordered mode guarantees as we can have a page with one freshly allocated buffer whose allocation is part of the committing transaction and another buffer in the page which is delayed or unwritten. So fix this problem by calling ext4_bio_writepage() anyway. It will submit mapped buffers and leave others alone. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
So far ext4_bio_writepage() unconditionally cleared dirty bit on all buffers underlying the page. That implicitely assumes we can write all buffers. So far that is true because callers call into ext4_bio_writepage() make sure all buffers in the page are mapped but: a) it's a data corruption bug waiting to happen b) in data=ordered mode when blocksize < pagesize we do need to write pages that may have only some of dirty buffers mapped. So change ext4_bio_writepage() to skip buffers that cannot be written without clearing their dirty bit. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
The argument b_size of mpage_add_bh_to_extent() was bogus since it was always == blocksize (which we can easily derive from inode->i_blkbits). Also second branch of condition: if (nrblocks >= EXT4_MAX_TRANS_DATA) { } else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) > EXT4_MAX_TRANS_DATA) { } was never taken because (b_size >> mpd->inode->i_blkbits) == 1. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
ext4_writepage(), write_cache_pages_da(), and mpage_da_submit_io() doesn't have to deal with the case when page doesn't have buffers. We attach buffers to a page in ->write_begin() and ->page_mkwrite() which covers all places where a page can become dirty. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 28 1月, 2013 6 次提交
-
-
由 Jan Kara 提交于
The function splices i_completed_io_list to its private list first. From that moment on we don't need any lock for working with io_end structures because all io_end structure on the list are only our own. So we can remove the other two lists in the function and free io_end immediately after we are done with it. CC: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
It does not make much sense to have struct work in ext4_io_end_t because we always use it for only one ext4_io_end_t per inode (the first one in the i_completed_io list). So just move the structure to inode itself. This also allows for a small simplification in processing io_end structures. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
We don't support delayed allocation in data=journal mode. So checking for it in mpage_da_submit_io() doesn't make really sence. If we ever decide to extend delayed allocation support to data=journal mode, adding __ext4_journalled_writepage() call will be the least of problems we have to solve. Most likely we'd have to implement separate writepages call anyways because we don't have transaction credits for writing more than a single page so mapping of page buffers would have to be done differently. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
When we cannot write a page we should use redirty_page_for_writepage() instead of plain set_page_dirty(). That tells writeback code we have problems, redirties only the page (redirtying buffers is not needed), and updates mm accounting of failed page writes. Also move clearing of buffer dirty flag after io_submit_add_bh(). At that moment we are sure buffer will be going to disk. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently we sometimes used block_write_full_page() and sometimes ext4_bio_write_page() for writeback (depending on mount options and call path). Let's always use ext4_bio_write_page() to simplify things a bit. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
This patch add supports for indirect file support punching hole. It is almost the same as ext4_ext_punch_hole. First, we invalidate all pages between this hole, and then we try to deallocate all blocks of this hole. A recursive function is used to handle deallocation of blocks. In this function, it iterates over the entries in inode's i_blocks or indirect blocks, and try to free the block for each one of them. After applying this patch, xfstest #255 will not pass w/o extent because indirect-based file doesn't support unwritten extents. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 25 1月, 2013 2 次提交
-
-
由 Chen Gang 提交于
When usrjquota or grpjquota mount options are specified several times, we leak memory storing the names. Free the memory correctly. Signed-off-by: NChen Gang <gang.chen@asianux.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
In addition, print the error returned from ext4_enable_quotas() Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Cc: stable@vger.kernel.org
-
- 17 1月, 2013 1 次提交
-
-
由 Zheng Liu 提交于
This patch adds a tracepoint in ext4_punch_hole. CC: Lukas Czerner <lczerner@redhat.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 1月, 2013 4 次提交
-
-
由 Theodore Ts'o 提交于
After we have finished extending the file system, we need to trigger a the lazy inode table thread to zero out the inode tables. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eryu Guan 提交于
Validate the bh pointer before using it, since ext4_read_block_bitmap_nowait() might return NULL. I've seen this in fsfuzz testing. EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0 ... Call Trace: [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60 [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80 [<ffffffff811d0d36>] ? __getblk+0x36/0x70 [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210 [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140 [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0 [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100 [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0 [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230 [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490 [<ffffffff811b8cd7>] evict+0xa7/0x1c0 [<ffffffff811b8ed3>] iput_final+0xe3/0x170 [<ffffffff811b8f9e>] iput+0x3e/0x50 [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90 [<ffffffff81231d0b>] ext4_create+0xeb/0x170 [<ffffffff811aae9c>] vfs_create+0xac/0xd0 [<ffffffff811ac845>] lookup_open+0x185/0x1c0 [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170 [<ffffffff811acb54>] do_last+0x2d4/0x7a0 [<ffffffff811af743>] path_openat+0xb3/0x480 [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff811afc49>] do_filp_open+0x49/0xa0 [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150 [<ffffffff8119da28>] do_sys_open+0x108/0x1f0 [<ffffffff8119db51>] sys_open+0x21/0x30 [<ffffffff81618959>] system_call_fastpath+0x16/0x1b Also fix comment for ext4_read_block_bitmap_nowait() Signed-off-by: NEryu Guan <guaneryu@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Wang Shilong 提交于
Because the function 'sb_getblk' seldomly fails to return NULL value,it will be better to use 'unlikely' to optimize it. Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So ENOMEM is more appropriate than EIO. In addition, make sure that the file system is marked as being inconsistent if sb_getblk() fails. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 07 1月, 2013 2 次提交
-
-
由 Eric Dumazet 提交于
commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all frags but the last one for a splice() call. The condition used to set the flag in pipe_to_sendpage() relied on splice() user passing the exact number of bytes present in the pipe, or a smaller one. But some programs pass an arbitrary high value, and the test fails. The effect of this bug is a lack of tcp_push() at the end of a splice(pipe -> socket) call, and possibly very slow or erratic TCP sessions. We should both test sd->total_len and fact that another fragment is in the pipe (pipe->nrbufs > 1) Many thanks to Willy for providing very clear bug report, bisection and test programs. Reported-by: NWilly Tarreau <w@1wt.eu> Bisected-by: NWilly Tarreau <w@1wt.eu> Tested-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guo Chao 提交于
This fixes a buffer cache leak when creating a directory, introduced in commit a774f9c2. Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NTao Ma <boyu.mt@taobao.com>
-