提交 · 3e3bd3582dca2305ed2d1bd562efc042c0cc77e7 · openanolis / cloud-kernel

02 1月, 2020 1 次提交

alinux: jbd2: create jbd2-ckpt thread for journal checkpoint · 3e3bd358

由 Joseph Qi 提交于 3月 07, 2018

This is trying to do jbd2 checkpoint in a specific kernel thread, then
checkpoint won't be under io throttle control.
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
Reviewed by: Baoyou Xie <baoyou.xie@linux.alibaba.com>
Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

3e3bd358

27 12月, 2019 1 次提交

jbd2: fix deadlock while checkpoint thread waits commit thread to finish · 1b483bfe

由 Xiaoguang Wang 提交于 1月 31, 2019

commit 53cf978457325d8fb2cdecd7981b31a8229e446e upstream.

This issue was found when I tried to put checkpoint work in a separate thread,
the deadlock below happened:
         Thread1                                |   Thread2
__jbd2_log_wait_for_space                       |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
  if (jh->b_transaction != NULL)                |
    ...                                         |
    jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
                                                |  will lock j_checkpoint_mutex,
                                                |  but will be blocked here.
                                                |
    jbd2_log_wait_commit(journal, tid);         |
    wait_event(journal->j_wait_done_commit,     |
     !tid_gt(tid, journal->j_commit_sequence)); |
     ...                                        |wake_up(j_wait_done_commit)
  }                                             |

then deadlock occurs, Thread1 will never be waken up.

To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
when we are going to wait for transaction commit.
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

1b483bfe

14 11月, 2018 1 次提交

jbd2: fix use after free in jbd2_log_do_checkpoint() · 25881163

由 Jan Kara 提交于 10月 05, 2018

commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream.

The code cleaning transaction's lists of checkpoint buffers has a bug
where it increases bh refcount only after releasing
journal->j_list_lock. Thus the following race is possible:

CPU0					CPU1
jbd2_log_do_checkpoint()
					jbd2_journal_try_to_free_buffers()
					  __journal_try_to_free_buffer(bh)
  ...
  while (transaction->t_checkpoint_io_list)
  ...
    if (buffer_locked(bh)) {

<-- IO completes now, buffer gets unlocked -->

      spin_unlock(&journal->j_list_lock);
					    spin_lock(&journal->j_list_lock);
					    __jbd2_journal_remove_checkpoint(jh);
					    spin_unlock(&journal->j_list_lock);
					  try_to_free_buffers(page);
      get_bh(bh) <-- accesses freed bh

Fix the problem by grabbing bh reference before unlocking
journal->j_list_lock.

Fixes: dc6e8d66 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
Fixes: be1158cc ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

25881163

18 12月, 2017 1 次提交

ext4: fix up remaining files with SPDX cleanups · f5166768

由 Theodore Ts'o 提交于 12月 17, 2017

A number of ext4 source files were skipped due because their copyright
permission statements didn't match the expected text used by the
automated conversion utilities.  I've added SPDX tags for the rest.

While looking at some of these files, I've noticed that we have quite
a bit of variation on the licenses that were used --- in particular
some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
and we have some files that have a LGPL-2.1 license (which was quite
surprising).

I've not attempted to do any license changes.  Even if it is perfectly
legal to relicense to GPL 2.0-only for consistency's sake, that should
be done with ext4 developer community discussion.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

f5166768

01 11月, 2016 1 次提交

block,fs: use REQ_* flags directly · 70fd7614

由 Christoph Hellwig 提交于 11月 01, 2016

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

70fd7614

18 10月, 2015 1 次提交

jbd2: fix checkpoint list cleanup · 33d14975

由 Jan Kara 提交于 10月 17, 2015

Unlike comments and expectation of callers journal_clean_one_cp_list()
returned 1 not only if it freed the transaction but also if it freed
some buffers in the transaction. That could make
__jbd2_journal_clean_checkpoint_list() skip processing
t_checkpoint_io_list and continue with processing the next transaction.
This is mostly a cosmetic issue since the only result is we can
sometimes free less memory than we could. But it's still worth fixing.
Fix journal_clean_one_cp_list() to return 1 only if the transaction was
really freed.

Fixes: 50849db3Signed-off-by: NJan Kara <jack@suse.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

33d14975

29 7月, 2015 1 次提交

jbd2: avoid infinite loop when destroying aborted journal · 841df7df

由 Jan Kara 提交于 7月 28, 2015

Commit 6f6a6fda "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.

Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.
Reported-by: NEryu Guan <guaneryu@gmail.com>
Tested-by: NEryu Guan <guaneryu@gmail.com>
Fixes: 6f6a6fdaSigned-off-by: NJan Kara <jack@suse.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

841df7df

16 6月, 2015 1 次提交

jbd2: fix ocfs2 corrupt when updating journal superblock fails · 6f6a6fda

由 Joseph Qi 提交于 6月 15, 2015

If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case.  In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts.  So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first.  And only after recovering
journal it can do the new updates.

The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

[ Fixed bug in patch which allowed a non-negative error return from
  jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
  was causing xfstests ext4/306 to fail. -- Ted ]
Reported-by: NYiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Tested-by: NYiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org

6f6a6fda

15 6月, 2015 1 次提交

jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() · b4f1afcd

由 Dmitry Monakhov 提交于 6月 15, 2015

jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start()
So allocations should be done with GFP_NOFS

[Full stack trace snipped from 3.10-rh7]
[<ffffffff815c4bd4>] dump_stack+0x19/0x1b
[<ffffffff8105dba1>] warn_slowpath_common+0x61/0x80
[<ffffffff8105dcca>] warn_slowpath_null+0x1a/0x20
[<ffffffff815c2142>] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17
[<ffffffff8119c045>] kmem_cache_alloc+0x55/0x210
[<ffffffff811477f5>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff811477f5>] mempool_alloc_slab+0x15/0x20
[<ffffffff81147939>] mempool_alloc+0x69/0x170
[<ffffffff815cb69e>] ? _raw_spin_unlock_irq+0xe/0x20
[<ffffffff8109160d>] ? finish_task_switch+0x5d/0x150
[<ffffffff811f1a8e>] bio_alloc_bioset+0x1be/0x2e0
[<ffffffff8127ee49>] blkdev_issue_flush+0x99/0x120
[<ffffffffa019a733>] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL
[<ffffffffa019aca1>] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2]
[<ffffffffa019afc7>] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2]
[<ffffffffa01952d8>] start_this_handle+0x2d8/0x550 [jbd2]
[<ffffffff811b02a9>] ? __memcg_kmem_put_cache+0x29/0x30
[<ffffffff8119c120>] ? kmem_cache_alloc+0x130/0x210
[<ffffffffa019573a>] jbd2__journal_start+0xba/0x190 [jbd2]
[<ffffffff811532ce>] ? lru_cache_add+0xe/0x10
[<ffffffffa01c9549>] ? ext4_da_write_begin+0xf9/0x330 [ext4]
[<ffffffffa01f2c77>] __ext4_journal_start_sb+0x77/0x160 [ext4]
[<ffffffffa01c9549>] ext4_da_write_begin+0xf9/0x330 [ext4]
[<ffffffff811446ec>] generic_file_buffered_write_iter+0x10c/0x270
[<ffffffff81146918>] __generic_file_write_iter+0x178/0x390
[<ffffffff81146c6b>] __generic_file_aio_write+0x8b/0xb0
[<ffffffff81146ced>] generic_file_aio_write+0x5d/0xc0
[<ffffffffa01bf289>] ext4_file_write+0xa9/0x450 [ext4]
[<ffffffff811c31d9>] ? pipe_read+0x379/0x4f0
[<ffffffff811b93f0>] do_sync_write+0x90/0xe0
[<ffffffff811b9b6d>] vfs_write+0xbd/0x1e0
[<ffffffff811ba5b8>] SyS_write+0x58/0xb0
[<ffffffff815d4799>] system_call_fastpath+0x16/0x1b
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

b4f1afcd

18 9月, 2014 2 次提交

jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list · 50849db3

由 Jan Kara 提交于 9月 18, 2014

__jbd2_journal_clean_checkpoint_list() returns number of buffers it
freed but noone was using the value so just stop doing that. This
also allows for simplifying the calling convention for
journal_clean_once_cp_list().
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

50849db3

jbd2: avoid pointless scanning of checkpoint lists · cc97f1a7

由 Jan Kara 提交于 9月 18, 2014

Yuanhan has reported that when he is running fsync(2) heavy workload
creating new files over ramdisk, significant amount of time is spent in
__jbd2_journal_clean_checkpoint_list() trying to clean old transactions
(but they cannot be cleaned up because flusher hasn't yet checkpointed
those buffers). The workload can be generated by:
  fs_mark -d /fs/ram0/1 -D 2 -N 2560 -n 1000000 -L 1 -S 1 -s 4096

Reduce the amount of scanning by stopping to scan the transaction list
once we find a transaction that cannot be checkpointed. Note that this
way of cleaning is still enough to keep freeing space in the journal
after fully checkpointed transactions.
Reported-and-tested-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cc97f1a7

17 9月, 2014 1 次提交

jbd2: jbd2_log_wait_for_space improve error detetcion · 1245799f

由 Dmitry Monakhov 提交于 9月 16, 2014

If EIO happens after we have dropped j_state_lock, we won't notice
that the journal has been aborted.  So it is reasonable to move this
check after we have grabbed the j_checkpoint_mutex and re-grabbed the
j_state_lock.  This patch helps to prevent false positive complain
after EIO.

#DMESG:
__jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
__jbd2_log_wait_for_space: no way to get more journal space in ram1-8
------------[ cut here ]------------
WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de574 #139
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
 00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
 0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
 ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
Call Trace:
 [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
 [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
 [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
 [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
 [<ffffffff810990e5>] ? local_clock+0x25/0x30
 [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
 [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
 [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
 [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
 [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
 [<ffffffff812025bb>] ext4_create+0x8b/0x150
 [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
 [<ffffffff8118097b>] do_last+0x7db/0xcf0
 [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
 [<ffffffff811845d2>] path_openat+0x242/0x590
 [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
 [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
 [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
 [<ffffffff81172f20>] do_sys_open+0x170/0x220
 [<ffffffff8117300e>] SyS_open+0x1e/0x20
 [<ffffffff811715d6>] SyS_creat+0x16/0x20
 [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
---[ end trace cd71c831f82059db ]---
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1245799f

05 9月, 2014 2 次提交

jbd2: optimize jbd2_log_do_checkpoint() a bit · 0e5ecf0a

由 Jan Kara 提交于 9月 04, 2014

When we discover written out buffer in transaction checkpoint list we
don't have to recheck validity of a transaction. Either this is the
last buffer in a transaction - and then we are done - or this isn't
and then we can just take another buffer from the checkpoint list
without dropping j_list_lock.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

0e5ecf0a

jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint() · dc6e8d66

由 Theodore Ts'o 提交于 9月 04, 2014

The __jbd2_journal_remove_checkpoint() doesn't require an elevated
b_count; indeed, until the jh structure gets released by the call to
jbd2_journal_put_journal_head(), the bh's b_count is elevated by
virtue of the existence of the jh structure.
Suggested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

dc6e8d66

02 9月, 2014 2 次提交

jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() · 88fe1acb

由 Theodore Ts'o 提交于 9月 01, 2014

__wait_cp_io() is only called by jbd2_log_do_checkpoint().  Fold it in
to make it a bit easier to understand.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

88fe1acb

jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() · be1158cc

由 Theodore Ts'o 提交于 9月 01, 2014

__process_buffer() is only called by jbd2_log_do_checkpoint(), and it
had a very complex locking protocol where it would be called with the
j_list_lock, and sometimes exit with the lock held (if the return code
was 0), or release the lock.

This was confusing both to humans and to smatch (which erronously
complained that the lock was taken twice).

Folding __process_buffer() to the caller allows us to simplify the
control flow, making the resulting function easier to read and reason
about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
bytes (over 4% of the text size).
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

be1158cc

13 6月, 2013 1 次提交

jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space() · 0ef54180

由 Paul Gortmaker 提交于 6月 12, 2013

While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
 ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
 ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
 ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
 [<ffffffff8172dc34>] schedule+0x24/0x70
 [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
 [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
 [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
 [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
 [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
 [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
 [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
 [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
 [<ffffffff8114370f>] do_rmdir+0xdf/0x120
 [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
 [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
 [<ffffffff817361ae>] ? int_signal+0x12/0x17
 [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
 [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space.  And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction().  There does not seem to be
anything preempt-rt specific about this, other then perhaps increasing
the odds of it happening.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0ef54180

05 6月, 2013 4 次提交

jbd2: remove unused waitqueues · f29fad72

由 Jan Kara 提交于 6月 04, 2013

j_wait_logspace and j_wait_checkpoint are unused.  Remove them.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f29fad72

jbd2: cleanup needed free block estimates when starting a transaction · 76c39904

由 Jan Kara 提交于 6月 04, 2013

__jbd2_log_space_left() and jbd_space_needed() were kind of odd.
jbd_space_needed() accounted also credits needed for currently
committing transaction while it didn't account for credits needed for
control blocks.  __jbd2_log_space_left() then accounted for control
blocks as a fraction of free space.  Since results of these two
functions are always only compared against each other, this works
correct but is somewhat strange.  Move the estimates so that
jbd_space_needed() returns number of blocks needed for a transaction
including control blocks and __jbd2_log_space_left() returns free
space in the journal (with the committing transaction already
subtracted).  Rename functions to jbd2_log_space_left() and
jbd2_space_needed() while we are changing them.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

76c39904

jbd2: remove journal_head from descriptor buffers · e5a120ae

由 Jan Kara 提交于 6月 04, 2013

Similarly as for metadata buffers, also log descriptor buffers don't
really need the journal head. So strip it and remove BJ_LogCtl list.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e5a120ae

jbd2: don't create journal_head for temporary journal buffers · f5113eff

由 Jan Kara 提交于 6月 04, 2013

When writing metadata to the journal, we create temporary buffer heads
for that task.  We also attach journal heads to these buffer heads but
the only purpose of the journal heads is to keep buffers linked in
transaction's BJ_IO list.  We remove the need for journal heads by
reusing buffer_head's b_assoc_buffers list for that purpose.  Also
since BJ_IO list is just a temporary list for transaction commit, we
use a private list in jbd2_journal_commit_transaction() for that thus
removing BJ_IO list from transaction completely.
Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f5113eff

14 3月, 2012 4 次提交

jbd2: remove bh_state lock from checkpointing code · 932bb305

由 Jan Kara 提交于 3月 13, 2012

All accesses to checkpointing entries in journal_head are protected
by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
need bh_state lock.

Also the only part of journal head that the rest of checkpointing code
needs to check is jh->b_transaction which is safe to read under
j_list_lock.

So we can safely remove bh_state lock from all of checkpointing code which
makes it considerably prettier.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

932bb305

jbd2: fix BH_JWrite setting in checkpointing code · 96c86678

由 Jan Kara 提交于 3月 13, 2012

BH_JWrite bit should be set when buffer is written to the journal. So
checkpointing shouldn't set this bit when writing out buffer. This didn't
cause any observable bug since BH_JWrite bit is used only for debugging
purposes but it's good to have this consistent.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

96c86678

jbd2: issue cache flush after checkpointing even with internal journal · 79feb521

由 Jan Kara 提交于 3月 13, 2012

When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash. Thus we must unconditionally issue a cache flush before we
update journal superblock in these cases.

A similar problem can also occur if journal superblock is written only in
disk's caches, other transaction starts reusing space of the transaction
cleaned from the log and power failure happens. Subsequent journal replay would
still try to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

79feb521

jbd2: split updating of journal superblock and marking journal empty · 24bcc89c

由 Jan Kara 提交于 3月 13, 2012

There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

24bcc89c

21 2月, 2012 2 次提交

jbd2: allocate transaction from separate slab cache · 0c2022ec

由 Yongqiang Yang 提交于 2月 20, 2012

There is normally only a handful of these active at any one time, but
putting them in a separate slab cache makes debugging memory
corruption problems easier.  Manish Katiyar also wanted this make it
easier to test memory failure scenarios in the jbd2 layer.

Cc: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0c2022ec

jbd2: add drop_transaction/update_superblock_end tracepoints · 2201c590

由 Seiji Aguchi 提交于 2月 20, 2012

This patch adds trace_jbd2_drop_transaction and
trace_jbd2_update_superblock_end because there are similar tracepoints
in jbd and they are needed in jbd2 as well.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2201c590

06 12月, 2011 1 次提交
- P
  treewide: Fix comment and string typo 'bufer' · 90802ed9
  由 Paul Bolle 提交于 12月 05, 2011
```
Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
```
  90802ed9
28 6月, 2011 1 次提交

jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434

由 Tao Ma 提交于 6月 27, 2011

In journal checkpoint, we write the buffer and wait for its finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request will be delayed for quite a long time and
all the tasks which are waiting for journal space will end with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
 [<ffffffff8103caad>] ? need_resched+0x23/0x2d
 [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
 [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
 [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
 [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
 [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
 [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
 [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
 [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
 [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
 [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
 [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
 [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
 [<ffffffff81146c88>] setxattr+0xb5/0xe8
 [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
 [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
 [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b

So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
be moved into sync queue and handled by cfq timely. We also use the new plug,
sot that all the WRITE_SYNC requests can be given as a whole when we unplug it.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Reported-by: NRobin Dong <sanbai@taobao.com>

d3ad8434

14 6月, 2011 1 次提交

jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941

由 Jan Kara 提交于 6月 13, 2011

jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh->b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head() which adds even larger opportunity for
oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitely via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jdb2_journal_remove_checkpoint() can free journal_head which needs
modification of a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

de1b7941

28 10月, 2010 1 次提交

jbd2: Add sanity check for attempts to start handle during umount · 5c2178e7

由 Theodore Ts'o 提交于 10月 27, 2010

An attempt to modify the file system during the call to
jbd2_destroy_journal() can lead to a system lockup.  So add some
checking to make it much more obvious when this happens to and to
determine where the offending code is located.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5c2178e7

17 9月, 2010 1 次提交

block: remove BLKDEV_IFL_WAIT · dd3932ed

由 Christoph Hellwig 提交于 9月 16, 2010

All the blkdev_issue_* helpers can only sanely be used for synchronous
caller. To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout. For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

dd3932ed

18 8月, 2010 1 次提交

remove SWRITE* I/O types · 9cb569d6

由 Christoph Hellwig 提交于 8月 11, 2010

These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9cb569d6

04 8月, 2010 1 次提交

jbd2: Change j_state_lock to be a rwlock_t · a931da6a

由 Theodore Ts'o 提交于 8月 03, 2010

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a931da6a

02 8月, 2010 1 次提交

jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop · a51dca9c

由 Theodore Ts'o 提交于 8月 02, 2010

By using an atomic_t for t_updates and t_outstanding credits, this
should allow us to not need to take transaction t_handle_lock in
jbd2_journal_stop().
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

a51dca9c

29 4月, 2010 1 次提交

blkdev: generalize flags for blkdev_issue_fn functions · fbd9b09a

由 Dmitry Monakhov 提交于 4月 28, 2010

The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

fbd9b09a

23 12月, 2009 2 次提交

ext4, jbd2: Add barriers for file systems with exernal journals · cc3e1bea

由 Theodore Ts'o 提交于 12月 23, 2009

This is a bit complicated because we are trying to optimize when we
send barriers to the fs data disk. We could just throw in an extra
barrier to the data disk whenever we send a barrier to the journal
disk, but that's not always strictly necessary.

We only need to send a barrier during a commit when there are data
blocks which are must be written out due to an inode written in
ordered mode, or if fsync() depends on the commit to force data blocks
to disk. Finally, before we drop transactions from the beginning of
the journal during a checkpoint operation, we need to guarantee that
any blocks that were flushed out to the data disk are firmly on the
rust platter before we drop the transaction from the journal.

Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

cc3e1bea

T
ext4: Add new tracepoint for jbd2_cleanup_journal_tail · 71f2be21
由 Theodore Ts'o 提交于 12月 23, 2009
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
71f2be21

30 9月, 2009 1 次提交

jbd2: Use tracepoints for history file · bf699327

由 Theodore Ts'o 提交于 9月 30, 2009

The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure.  We
save memory as a bonus.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bf699327

17 6月, 2009 1 次提交
- T
  jbd2: convert instrumentation from markers to tracepoints · 879c5e6b
  由 Theodore Ts'o 提交于 6月 17, 2009
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
  879c5e6b

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功