- 10 4月, 2013 1 次提交
-
-
由 Lukas Czerner 提交于
Currently in ENOSPC condition when writing into unwritten space, or punching a hole, we might need to split the extent and grow extent tree. However since we can not allocate any new metadata blocks we'll have to zero out unwritten part of extent or punched out part of extent, or in the worst case return ENOSPC even though use actually does not allocate any space. Also in delalloc path we do reserve metadata and data blocks for the time we're going to write out, however metadata block reservation is very tricky especially since we expect that logical connectivity implies physical connectivity, however that might not be the case and hence we might end up allocating more metadata blocks than previously reserved. So in future, metadata reservation checks should be removed since we can not assure that we do not under reserve. And this is where reserved space comes into the picture. When mounting the file system we slice off a little bit of the file system space (2% or 4096 clusters, whichever is smaller) which can be then used for the cases mentioned above to prevent costly zeroout, or unexpected ENOSPC. The number of reserved clusters can be set via sysfs, however it can never be bigger than number of free clusters in the file system. Note that this patch fixes the failure of xfstest 274 as expected. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
- 09 4月, 2013 2 次提交
-
-
由 Eric Whitney 提交于
Values stored in s_freeclusters_counter and s_dirtyclusters_counter are both in cluster units. Remove the cluster to block conversion applied to s_freeclusters_counter causing an inflated estimate of free space because s_dirtyclusters_counter is not similarly converted. Rename free_blocks and dirty_blocks to better reflect the units these variables contain to avoid future confusion. This fix corrects ENOSPC failures for xfstests 127 and 231 on bigalloc file systems. Signed-off-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dr. Tilmann Bubeck 提交于
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and associated attributes (like i_blocks, i_size, i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is typically used to store a boot loader in a secure part of the filesystem, where it can't be changed by a normal user by accident. The data blocks of the previous boot loader will be associated with the given inode. This usercode program is a simple example of the usage: int main(int argc, char *argv[]) { int fd; int err; if ( argc != 2 ) { printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n"); exit(1); } fd = open(argv[1], O_WRONLY); if ( fd < 0 ) { perror("open"); exit(1); } err = ioctl(fd, EXT4_IOC_SWAP_BOOT); if ( err < 0 ) { perror("ioctl"); exit(1); } close(fd); exit(0); } [ Modified by Theodore Ts'o to fix a number of bugs in the original code.] Signed-off-by: NDr. Tilmann Bubeck <t.bubeck@reinform.de> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 4月, 2013 7 次提交
-
-
由 Lukas Czerner 提交于
Additionally print i_allocated_meta_blocks information as well. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
由 Theodore Ts'o 提交于
In the case where an inode has a very stale transaction id (tid) in i_datasync_tid or i_sync_tid, it's possible that after a very large (2**31) number of transactions, that the tid number space might wrap, causing tid_geq()'s calculations to fail. Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily", attempted to fix this problem, but it only avoided kjournald spinning forever by fixing the logic in jbd2_log_start_commit(). Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c that might call jbd2_log_start_commit() with a stale tid, those functions will subsequently call jbd2_log_wait_commit() with the same stale tid, and then wait for a very long time. To fix this, we replace the calls to jbd2_log_start_commit() and jbd2_log_wait_commit() with a call to a new function, jbd2_complete_transaction(), which will correctly handle stale tid's. As a bonus, jbd2_complete_transaction() will avoid locking j_state_lock for writing unless a commit needs to be started. This should have a small (but probably not measurable) improvement for ext4's scalability. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reported-by: NBen Hutchings <ben@decadent.org.uk> Reported-by: NGeorge Barnett <gbarnett@atlassian.com> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
[ Added fixup from Lukáš Czerner which only checks the assertion when the inode is not new and is not being freed. ] Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move common code in ext4_ind_truncate() and ext4_ext_truncate() into ext4_truncate(). This saves over 60 lines of code. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole() into ext4_punch_hole(). This saves over 150 lines of code. This also fixes a potential bug when the punch_hole() code is racing against indirect-to-extents or extents-to-indirect migation. We are currently using i_mutex to protect against changes to the inode flag; specifically, the append-only, immutable, and extents inode flags. So we need to take i_mutex before deciding whether to use the extents-specific or indirect-specific punch_hole code. Also, there was a missing call to ext4_inode_block_unlocked_dio() in the indirect punch codepath. This was added in commit 02d262df to block DIO readers racing against the punch operation in the codepath for extent-mapped inodes, but it was missing for indirect-block mapped inodes. One of the advantages of refactoring the code is that it makes such oversights much less likely. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
After collapsing the handling of data ordered and data writeback codepath, ext4_generic_write_end() has only one caller, ext4_write_end(). So we fold it into ext4_write_end(). Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-
由 Theodore Ts'o 提交于
The only difference between how we handle data=ordered and data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate code duplication by factoring out redundant the code paths. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-
- 20 3月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
In data=journal mode, if we unmount the file system before a transaction has a chance to complete, when the journal inode is being evicted, we can end up calling into jbd2_log_wait_commit() for the last transaction, after the journalling machinery has been shut down. Arguably we should adjust ext4_should_journal_data() to return FALSE for the journal inode, but the only place it matters is ext4_evict_inode(), and so to save a bit of CPU time, and to make the patch much more obviously correct by inspection(tm), we'll fix it by explicitly not trying to waiting for a journal commit when we are evicting the journal inode, since it's guaranteed to never succeed in this case. This can be easily replicated via: mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb ------------[ cut here ]------------ WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd() Hardware name: Bochs JBD2: bad log_start_commit: 3005630206 3005630206 0 0 Modules linked in: Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020 Call Trace: [<c015c0ef>] warn_slowpath_common+0x68/0x7d [<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd [<c015c177>] warn_slowpath_fmt+0x2b/0x2f [<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd [<c02b8075>] jbd2_log_start_commit+0x24/0x34 [<c0279ed5>] ext4_evict_inode+0x71/0x2e3 [<c021f0ec>] evict+0x94/0x135 [<c021f9aa>] iput+0x10a/0x110 [<c02b7836>] jbd2_journal_destroy+0x190/0x1ce [<c0175284>] ? bit_waitqueue+0x50/0x50 [<c028d23f>] ext4_put_super+0x52/0x294 [<c020efe3>] generic_shutdown_super+0x48/0xb4 [<c020f071>] kill_block_super+0x22/0x60 [<c020f3e0>] deactivate_locked_super+0x22/0x49 [<c020f5d6>] deactivate_super+0x30/0x33 [<c0222795>] mntput_no_expire+0x107/0x10c [<c02233a7>] sys_umount+0x2cf/0x2e0 [<c02233ca>] sys_oldumount+0x12/0x14 [<c08096b8>] syscall_call+0x7/0xb ---[ end trace 6a954cc790501c1f ]--- jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
Commit 84c17543 (ext4: move work from io_end to inode) triggered a regression when running xfstest #270 when the file system is mounted with dioread_nolock. The problem is that after ext4_evict_inode() calls ext4_ioend_wait(), this guarantees that last io_end structure has been freed, but it does not guarantee that the workqueue structure, which was moved into the inode by commit 84c17543, is actually finished. Once ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this will allow ext4_ioend_wait() to return on CPU #2, at which point the evict_inode() codepath can race against the workqueue code on CPU #1 accessing EXT4_I(inode)->i_unwritten_work to find the next item of work to do. Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which will be renamed ext4_ioend_shutdown(), since it is only used by ext4_evict_inode(). Also, move the call to ext4_ioend_shutdown() until after truncate_inode_pages() and filemap_write_and_wait() are called, to make sure all dirty pages have been written back and flushed from the page cache first. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e *pdpt = 0000000030bc3001 *pde = 0000000000000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c17543-dirty #91 Bochs Bochs EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0 EIP is at cwq_activate_delayed_work+0x3b/0x7e EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000 ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000) Stack: f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000 Call Trace: [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb [<c01def1d>] process_one_work+0x5d8/0x637 [<c040a10b>] ? ext4_end_bio+0x300/0x300 [<c01e3105>] worker_thread+0x249/0x3ef [<c01ea317>] kthread+0xd8/0xeb [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb [<c023a370>] ? trace_hardirqs_on+0x27/0x37 [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28 [<c01ea23f>] ? __init_kthread_worker+0x71/0x71 Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1 6c c1 00 31 c9 83 EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84 CR2: 0000000000000000 ---[ end trace a1923229da53d8a4 ]--- Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz>
-
- 11 3月, 2013 5 次提交
-
-
由 Lukas Czerner 提交于
Currently we only reserve space (data+metadata) in delayed allocation if we're allocating from new cluster (which is always in non-bigalloc file system) which is ok for data blocks, because we reserve the whole cluster. However we have to reserve metadata for every delayed block we're going to write because every block could potentially require metedata block when we need to grow the extent tree. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
由 Lukas Czerner 提交于
Using yield() is strongly discouraged (see sched/core.c) especially since we can just use cond_resched(). Replace all use of yield() with cond_resched(). Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
ext4_releasepage() warns when it is passed a page with PageChecked set. However this can correctly happen when invalidate_inode_pages2_range() invalidates pages - and we should fail the release in that case. Since the page was dirty anyway, it won't be discarded and no harm has happened but it's good to be safe. Also remove bogus page_has_buffers() check - we are guaranteed page has buffers in this function. Reported-by: NZheng Liu <gnehzuil.liu@gmail.com> Tested-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Zheng Liu 提交于
When we try to split an extent, this extent could be zeroed out and mark as initialized. But we don't know this in ext4_map_blocks because it only returns a length of allocated extent. Meanwhile we will mark this extent as uninitialized because we only check m_flags. This commit update extent status tree when we try to split an unwritten extent. We don't need to worry about the status of this extent because we always mark it as initialized. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
-
由 Dmitry Monakhov 提交于
This commit adds a self-testing infrastructure like extent tree does to do a sanity check for extent status tree. After status tree is as a extent cache, we'd better to make sure that it caches right result. After applied this commit, we will get a lot of messages when we run xfstests as below. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... kernel: ES cache assertation failed for inode: 230 es_cached ex [974/2/4781/20] != found ex [974/1/4781/1000] ... kernel: ES insert assertation failed for inode: 635 ex_status [0/45/21388/w] != es_status [44/1/21432/u] ... Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 23 2月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 2月, 2013 1 次提交
-
-
由 Darrick J. Wong 提交于
Create a helper function to check if a backing device requires stable page writes and, if so, performs the necessary wait. Then, make it so that all points in the memory manager that handle making pages writable use the helper function. This should provide stable page write support to most filesystems, while eliminating unnecessary waiting for devices that don't require the feature. Before this patchset, all filesystems would block, regardless of whether or not it was necessary. ext3 would wait, but still generate occasional checksum errors. The network filesystems were left to do their own thing, so they'd wait too. After this patchset, all the disk filesystems except ext3 and btrfs will wait only if the hardware requires it. ext3 (if necessary) snapshots pages instead of blocking, and btrfs provides its own bdi so the mm will never wait. Network filesystems haven't been touched, so either they provide their own stable page guarantees or they don't block at all. The blocking behavior is back to what it was before 3.0 if you don't have a disk requiring stable page writes. Here's the result of using dbench to test latency on ext2: 3.8.0-rc3: Operation Count AvgLat MaxLat ---------------------------------------- WriteX 109347 0.028 59.817 ReadX 347180 0.004 3.391 Flush 15514 29.828 287.283 Throughput 57.429 MB/sec 4 clients 4 procs max_latency=287.290 ms 3.8.0-rc3 + patches: WriteX 105556 0.029 4.273 ReadX 335004 0.005 4.112 Flush 14982 30.540 298.634 Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms As you can see, the maximum write latency drops considerably with this patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave similarly, but see the cover letter for those results. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NSteven Whitehouse <swhiteho@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 2月, 2013 4 次提交
-
-
由 Zheng Liu 提交于
After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
By recording the phycisal block and status, extent status tree is able to track the status of every extents. When we call _map_blocks functions to lookup an extent or create a new written/unwritten/delayed extent, this extent will be inserted into extent status tree. We don't load all extents from disk in alloc_inode() because it costs too much memory, and if a file is opened and closed frequently it will takes too much time to load all extent information. So currently when we create/lookup an extent, this extent will be inserted into extent status tree. Hence, the extent status tree may not comprehensively contain all of the extents found in the file. Here a condition we need to take care is that an extent might contains unwritten and delayed status simultaneously because an extent is delayed allocated and could be allocated by fallocate. At this time we need to keep delayed status because later we need to update delayed reservation space using it. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag because in later commit ext4_map_blocks needs to use this flag to determine the extent status. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit adds two members in extent_status structure to let it record physical block and extent status. Here es_pblk is used to record both of them because physical block only has 48 bits. So extent status could be stashed into it so that we can save some memory. Now written, unwritten, delayed and hole are defined as status. Due to new member is added into extent status tree, all interfaces need to be adjusted. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 15 2月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
Use ERR_PTR()/IS_ERR() abstraction instead of passing in a separate pointer to an integer for the error code, as a code cleanup. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Print some additional debugging context to hopefully help to debug a warning which is getting triggered by xfstests #74. Also remove extraneous newlines from when printk's were converted to ext4_warning() and ext4_msg(). Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Some messages printed related to a WARN_ON(1) were printed using KERN_NOTICE. Use KERN_WARNING or ext4_warning() instead so that context related to the WARN_ON() is printed at the same printk warning level (and log files, etc.) Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 2月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
The grab_cache_page_write_begin() function can potentially sleep for a long time, since it may need to do memory allocation which can block if the system is under significant memory pressure, and because it may be blocked on page writeback. If it does take a long time to grab the page, it's better that we not hold an active jbd2 handle. So grab a handle on the page first, and _then_ start the transaction handle. This commit fixes the following long transaction handle hold time: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
So we can better understand what bits of ext4 are responsible for long-running jbd2 handles, use jbd2__journal_start() so we can pass context information for logging purposes. The recommended way for finding the longer-running handles is: T=/sys/kernel/debug/tracing EVENT=$T/events/jbd2/jbd2_handle_stats echo "interval > 5" > $EVENT/filter echo 1 > $EVENT/enable ./run-my-fs-benchmark cat $T/trace > /tmp/problem-handles This will list handles that were active for longer than 20ms. Having longer-running handles is bad, because a commit started at the wrong time could stall for those 20+ milliseconds, which could delay an fsync() or an O_SYNC operation. Here is an example line from the trace file describing a handle which lived on for 311 jiffies, or over 1.2 seconds: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 30 1月, 2013 1 次提交
-
-
由 Jan Kara 提交于
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. CC: stable@vger.kernel.org Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 29 1月, 2013 3 次提交
-
-
由 Jan Kara 提交于
So far ext4_writepage() skipped writing pages that had any delayed or unwritten buffers attached. When blocksize < pagesize this breaks data=ordered mode guarantees as we can have a page with one freshly allocated buffer whose allocation is part of the committing transaction and another buffer in the page which is delayed or unwritten. So fix this problem by calling ext4_bio_writepage() anyway. It will submit mapped buffers and leave others alone. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
The argument b_size of mpage_add_bh_to_extent() was bogus since it was always == blocksize (which we can easily derive from inode->i_blkbits). Also second branch of condition: if (nrblocks >= EXT4_MAX_TRANS_DATA) { } else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) > EXT4_MAX_TRANS_DATA) { } was never taken because (b_size >> mpd->inode->i_blkbits) == 1. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
ext4_writepage(), write_cache_pages_da(), and mpage_da_submit_io() doesn't have to deal with the case when page doesn't have buffers. We attach buffers to a page in ->write_begin() and ->page_mkwrite() which covers all places where a page can become dirty. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 28 1月, 2013 3 次提交
-
-
由 Jan Kara 提交于
We don't support delayed allocation in data=journal mode. So checking for it in mpage_da_submit_io() doesn't make really sence. If we ever decide to extend delayed allocation support to data=journal mode, adding __ext4_journalled_writepage() call will be the least of problems we have to solve. Most likely we'd have to implement separate writepages call anyways because we don't have transaction credits for writing more than a single page so mapping of page buffers would have to be done differently. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently we sometimes used block_write_full_page() and sometimes ext4_bio_write_page() for writeback (depending on mount options and call path). Let's always use ext4_bio_write_page() to simplify things a bit. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
This patch add supports for indirect file support punching hole. It is almost the same as ext4_ext_punch_hole. First, we invalidate all pages between this hole, and then we try to deallocate all blocks of this hole. A recursive function is used to handle deallocation of blocks. In this function, it iterates over the entries in inode's i_blocks or indirect blocks, and try to free the block for each one of them. After applying this patch, xfstest #255 will not pass w/o extent because indirect-based file doesn't support unwritten extents. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 1月, 2013 1 次提交
-
-
由 Zheng Liu 提交于
This patch adds a tracepoint in ext4_punch_hole. CC: Lukas Czerner <lczerner@redhat.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 1月, 2013 2 次提交
-
-
由 Wang Shilong 提交于
Because the function 'sb_getblk' seldomly fails to return NULL value,it will be better to use 'unlikely' to optimize it. Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So ENOMEM is more appropriate than EIO. In addition, make sure that the file system is marked as being inconsistent if sb_getblk() fails. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 12 1月, 2013 1 次提交
-
-
由 Miao Xie 提交于
writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read() with down_read_trylock() because - If ->s_umount is write locked, then the sb is not idle. That is writeback_inodes_sb(_nr)_if_idle() needn't wait for the lock. - writeback_inodes_sb(_nr)_if_idle() grabs s_umount lock when it want to start writeback, it may bring us deadlock problem when doing umount. In order to fix the problem, ext4 and btrfs implemented their own writeback functions instead of writeback_inodes_sb(_nr)_if_idle(), but it introduced the redundant code, it is better to implement a new writeback_inodes_sb(_nr)_if_idle(). The name of these two functions is cumbersome, so rename them to try_to_writeback_inodes_sb(_nr). This idea came from Christoph Hellwig. Some code is from the patch of Kamal Mostafa. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
-
- 26 12月, 2012 1 次提交
-
-
由 Jan Kara 提交于
We cannot wait for transaction commit in journal_unmap_buffer() because we hold page lock which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible for waiting for transaction commit to finish and try invalidation again. Since the issue can happen only for page stradding i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-