- 09 3月, 2014 5 次提交
-
-
由 Theodore Ts'o 提交于
jh->b_transaction is adequately protected for reading by the jbd_lock_bh_state(bh), so we don't need to take j_list_lock in __journal_try_to_free_buffer(). Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We don't otherwise need j_list_lock during the rest of commit phase #7, so add the transaction to the checkpoint list at the very end of commit phase #6. This allows us to drop j_list_lock earlier, which is a good thing since it is a super hot lock. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The two hottest locks, and thus the biggest scalability bottlenecks, in the jbd2 layer, are the j_list_lock and j_state_lock. This has inspired some people to do some truly unnatural things[1]. [1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf We don't need to be holding both j_state_lock and j_list_lock while calculating the journal statistics, so move those calculations to the very end of jbd2_journal_commit_transaction. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The j_state_lock is one of the hottest locks in the jbd2 layer and thus one of its scalability bottlenecks. We don't need to be holding the j_state_lock while we are calling wake_up(&journal->j_wait_commit), so release the lock a little bit earlier. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
During commit process, keep the block device plugged after we are done writing the revoke records, until we are finished writing the rest of the commit records in the journal. This will allow most of the journal blocks to be written in a single I/O operation, instead of separating the the revoke blocks from the rest of the journal blocks. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 18 2月, 2014 2 次提交
-
-
由 Rashika Kheria 提交于
Mark functions as static in jbd2/journal.c because they are not used outside this file. This eliminates the following warning in jbd2/journal.c: fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes] fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes] fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes] Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dan Carpenter 提交于
If start_this_handle() fails then it leads to a use after free of "handle". Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 09 12月, 2013 3 次提交
-
-
由 Dmitry Monakhov 提交于
Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
由 Jan Kara 提交于
Some of KERN_EMERG printk messages do not really deserve this log level and the one in log_wait_commit() is even rather useless (the journal has been previously aborted and *that* is where we should have been complaining). So make some messages just KERN_ERR and remove the useless message. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
If a handle runs out of space, we currently stop the kernel with a BUG in jbd2_journal_dirty_metadata(). This makes it hard to figure out what might be going on. So return an error of ENOSPC, so we can let the file system layer figure out what is going on, to make it more likely we can get useful debugging information). This should make it easier to debug problems such as the one which was reported by: https://bugzilla.kernel.org/show_bug.cgi?id=44731 The only two callers of this function are ext4_handle_dirty_metadata() and ocfs2_journal_dirty(). The ocfs2 function will trigger a BUG_ON(), which means there will be no change in behavior. The ext4 function will call ext4_error_inode() which will print the useful debugging information and then handle the situation using ext4's error handling mechanisms (i.e., which might mean halting the kernel or remounting the file system read-only). Also, since both file systems already call WARN_ON(), drop the WARN_ON from jbd2_journal_dirty_metadata() to avoid two stack traces from being displayed. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: ocfs2-devel@oss.oracle.com Acked-by: NJoel Becker <jlbec@evilplan.org>
-
- 29 8月, 2013 1 次提交
-
-
由 Darrick J. Wong 提交于
In the jbd2 checksumming code, explicitly declare separate variables with endianness information so that we don't get confused and screw things up again. Also fixes sparse warnings. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 01 7月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
If jbd2_journal_restart() fails the handle will have been disconnected from the current transaction. In this situation, the handle must not be used for for any jbd2 function other than jbd2_journal_stop(). Enforce this with by treating a handle which has a NULL transaction pointer as an aborted handle, and issue a kernel warning if jbd2_journal_extent(), jbd2_journal_get_write_access(), jbd2_journal_dirty_metadata(), etc. is called with an invalid handle. This commit also fixes a bug where jbd2_journal_stop() would trip over a kernel jbd2 assertion check when trying to free an invalid handle. Also move the responsibility of setting current->journal_info to start_this_handle(), simplifying the three users of this function. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reported-by: NYounger Liu <younger.liu@huawei.com> Cc: Jan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
Some of the functions which modify the jbd2 superblock were not updating the checksum before calling jbd2_write_superblock(). Move the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so that the checksum is calculated consistently. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
-
- 13 6月, 2013 6 次提交
-
-
由 Paul Gortmaker 提交于
Commit b6e96d00 ("jbd2: use module parameters instead of debugfs for jbd_debug") removed any need for a dependency on DEBUG_FS. It also moved the /sys variables out from underneath the typical debugfs mount point. Delete the dependency and update the /sys path to where the debug settings are currently. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Paul Gortmaker 提交于
Since the jbd_debug() is implemented with two separate printk() calls, it can lead to corrupted and misleading debug output like the following (see lines marked with "*"): [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104 [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103 [* 290.339382] JBD2: starting commit of transaction 42104 [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079 i.e. the debug output from log_wait_commit and journal_commit_transaction have become interleaved. The output should have been: (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103 (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104 It is expected that this is not easy to replicate -- I was only able to cause it on preempt-rt kernels, and even then only under heavy I/O load. Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Suggested-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Paul Gortmaker 提交于
Currently we see this output: $git grep phase fs/jbd2 fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n"); fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n"); [...] There is clearly a duplicate label for phase 2, and they are both active (i.e. not in #if ... #else block). Rename them to be "2a" and "2b" so the debug output is unambiguous. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Paul Gortmaker 提交于
While trying to debug an an issue under extreme I/O loading on preempt-rt kernels, the following backtrace was observed via SysRQ output: rm D ffff8802203afbc0 4600 4878 4748 0x00000000 ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80 ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8 ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000 Call Trace: [<ffffffff8172dc34>] schedule+0x24/0x70 [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520 [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0 [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110 [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230 [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10 [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160 [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230 [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140 [<ffffffff8114370f>] do_rmdir+0xdf/0x120 [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80 [<ffffffff81002889>] ? do_notify_resume+0x89/0x100 [<ffffffff817361ae>] ? int_signal+0x12/0x17 [<ffffffff81145d85>] sys_unlinkat+0x25/0x40 [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b What is interesting here, is that we call log_wait_commit, from within wait_for_space, but we are still holding the checkpoint_mutex as it surrounds mostly the whole of wait_for_space. And then, as we are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED bit is set, then we will also try to take the same checkpoint_mutex. It seems that we need to drop the checkpoint_mutex while sitting in jbd2_log_wait_commit, if we want to guarantee that progress can be made by jbd2_journal_commit_transaction(). There does not seem to be anything preempt-rt specific about this, other then perhaps increasing the odds of it happening. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Paul Gortmaker 提交于
The state lock is taken after we are doing an assert on the state value, not before. So we might in fact be doing an assert on a transient value. Ensure the state check is within the scope of the state lock being taken. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Current implementation of jbd2_journal_force_commit() is suboptimal because result in empty and useless commits. But callers just want to force and wait any unfinished commits. We already have jbd2_journal_force_commit_nested() which does exactly what we want, except we are guaranteed that we do not hold journal transaction open. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 6月, 2013 8 次提交
-
-
由 Jan Kara 提交于
In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
j_wait_logspace and j_wait_checkpoint are unused. Remove them. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
jbd2_journal_extend() first checked whether transaction can accept extending handle with more credits and then added credits to t_outstanding_credits. This can race with start_this_handle() adding another handle to a transaction and thus overbooking a transaction. Make jbd2_journal_extend() use atomic_add_return() to close the race. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
__jbd2_log_space_left() and jbd_space_needed() were kind of odd. jbd_space_needed() accounted also credits needed for currently committing transaction while it didn't account for credits needed for control blocks. __jbd2_log_space_left() then accounted for control blocks as a fraction of free space. Since results of these two functions are always only compared against each other, this works correct but is somewhat strange. Move the estimates so that jbd_space_needed() returns number of blocks needed for a transaction including control blocks and __jbd2_log_space_left() returns free space in the journal (with the committing transaction already subtracted). Rename functions to jbd2_log_space_left() and jbd2_space_needed() while we are changing them. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
The comment about credit estimates isn't true anymore. We do what the comment describes now. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently when we add a buffer to a transaction, we wait until the buffer is removed from BJ_Shadow list (so that we prevent any changes to the buffer that is just written to the journal). This can take unnecessarily long as a lot happens between the time the buffer is submitted to the journal and the time when we remove the buffer from BJ_Shadow list. (e.g. We wait for all data buffers in the transaction, we issue a cache flush, etc.) Also this creates a dependency of do_get_write_access() on transaction commit (namely waiting for data IO to complete) which we want to avoid when implementing transaction reservation. So we modify commit code to set new BH_Shadow flag when temporary shadowing buffer is created and we clear that flag once IO on that buffer is complete. This allows do_get_write_access() to wait only for BH_Shadow bit and thus removes the dependency on data IO completion. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
Similarly as for metadata buffers, also log descriptor buffers don't really need the journal head. So strip it and remove BJ_LogCtl list. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
When writing metadata to the journal, we create temporary buffer heads for that task. We also attach journal heads to these buffer heads but the only purpose of the journal heads is to keep buffers linked in transaction's BJ_IO list. We remove the need for journal heads by reusing buffer_head's b_assoc_buffers list for that purpose. Also since BJ_IO list is just a temporary list for transaction commit, we use a private list in jbd2_journal_commit_transaction() for that thus removing BJ_IO list from transaction completely. Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 28 5月, 2013 2 次提交
-
-
由 Darrick J. Wong 提交于
Al Viro complained of a ton of bogosity with regards to the jbd2 block tag header checksum. This one checksum is 16 bits, so cut off the upper 16 bits and treat it as a 16-bit value and don't mess around with be32* conversions. Fortunately metadata checksumming is still "experimental" and not in a shipping e2fsprogs, so there should be few users affected by this. Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Zheng Liu 提交于
This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/ memset when a new journal head is alloctated. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Cc: "Theodore Ts'o" <tytso@mit.edu>
-
- 22 5月, 2013 1 次提交
-
-
由 Lukas Czerner 提交于
invalidatepage now accepts range to invalidate and there are two file system using jbd2 also implementing punch hole feature which can benefit from this. We need to implement the same thing for jbd2 layer in order to allow those file system take benefit of this functionality. This commit adds length argument to the jbd2_journal_invalidatepage() and updates all instances in ext4 and ocfs2. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 30 4月, 2013 1 次提交
-
-
由 majianpeng 提交于
bh allocation uses kmem_cache_zalloc() so we needn't call 'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations. Signed-off-by: NJianpeng Ma <majianpeng@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 4月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
While investigating interactivity problems it was clear that processes sometimes stall for long periods of times if an attempt is made to lock a buffer which is undergoing writeback. It would stall in a trace looking something like [<ffffffff811a39de>] __lock_buffer+0x2e/0x30 [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0 [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50 [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80 [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0 [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220 [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60 [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0 [<ffffffff8118b9b9>] update_time+0x79/0xc0 [<ffffffff8118ba98>] file_update_time+0x98/0x100 [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0 [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0 [<ffffffff811ea853>] ext4_file_write+0x83/0xd0 [<ffffffff81172b23>] do_sync_write+0xa3/0xe0 [<ffffffff811731ae>] vfs_write+0xae/0x180 [<ffffffff8117361d>] sys_write+0x4d/0x90 [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: NMel Gorman <mgorman@suse.de> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 20 4月, 2013 1 次提交
-
-
由 Zheng Liu 提交于
The jbd2_alloc_handle() function is only called by new_handle(). So this commit uses kmem_cache_zalloc() instead of kmem_cache_alloc()/memset(). Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 4月, 2013 1 次提交
-
-
由 Al Viro 提交于
The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 4月, 2013 2 次提交
-
-
由 Dmitry Monakhov 提交于
The following race is possible: [kjournald2] other_task jbd2_journal_commit_transaction() j_state = T_FINISHED; spin_unlock(&journal->j_list_lock); ->jbd2_journal_remove_checkpoint() ->jbd2_journal_free_transaction(); ->kmem_cache_free(transaction) ->j_commit_callback(journal, transaction); -> USE_AFTER_FREE WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250() Hardware name: list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107 Call Trace: [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0 [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0 [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250 [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0 [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570 [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0 [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0 [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0 [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40 [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80 [<ffffffff810ac6be>] kthread+0x10e/0x120 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 In order to demonstrace this issue one should mount ext4 with mount -o discard option on SSD disk. This makes callback longer and race window becomes wider. In order to fix this we should mark transaction as finished only after callbacks have completed Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
In the case where an inode has a very stale transaction id (tid) in i_datasync_tid or i_sync_tid, it's possible that after a very large (2**31) number of transactions, that the tid number space might wrap, causing tid_geq()'s calculations to fail. Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily", attempted to fix this problem, but it only avoided kjournald spinning forever by fixing the logic in jbd2_log_start_commit(). Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c that might call jbd2_log_start_commit() with a stale tid, those functions will subsequently call jbd2_log_wait_commit() with the same stale tid, and then wait for a very long time. To fix this, we replace the calls to jbd2_log_start_commit() and jbd2_log_wait_commit() with a call to a new function, jbd2_complete_transaction(), which will correctly handle stale tid's. As a bonus, jbd2_complete_transaction() will avoid locking j_state_lock for writing unless a commit needs to be started. This should have a small (but probably not measurable) improvement for ext4's scalability. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reported-by: NBen Hutchings <ben@decadent.org.uk> Reported-by: NGeorge Barnett <gbarnett@atlassian.com> Cc: stable@vger.kernel.org
-
- 12 3月, 2013 1 次提交
-
-
由 Jan Kara 提交于
jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: NZheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 03 3月, 2013 1 次提交
-
-
由 Dmitry Monakhov 提交于
If start_this_handle() failed handle will be initialized to ERR_PTR() and can not be dereferenced. paging request at fffffffffffffff6 IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 PGD 200e067 PUD 200f067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod CPU 0 journal commit I/O error Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW RIP: 0010:[<ffffffff813c073f>] [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292 RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48 RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0 Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 2月, 2013 1 次提交
-
-
由 Theodore Ts'o 提交于
There are multiple reasons to move away from debugfs. First of all, we are only using it for a single parameter, and it is much more complicated to set up (some 30 lines of code compared to 3), and one more thing that might fail while loading the jbd2 module. Secondly, as a module paramter it can be specified as a boot option if jbd2 is built into the kernel, or as a parameter when the module is loaded, and it can also be manipulated dynamically under /sys/module/jbd2/parameters/jbd2_debug. So it is more flexible. Ultimately we want to move away from using jbd_debug() towards tracepoints, but for now this is still a useful simplification of the code base. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-