    ext4: flush s_error_work before journal destroy in ext4_fill_super · bb39867f
    Committed by yangerkun
    mainline inclusion
    from mainline-v5.15-rc4
    commit bb9464e0
    category: bugfix
    bugzilla: 176737 https://gitee.com/openeuler/kernel/issues/I4DDEL
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bb9464e08309f6befe80866f5be51778ca355ee9
    
    ---------------------------
    
    The error path in ext4_fill_super forgets to flush s_error_work before
    destroying the journal. This can trigger the following bug, because
    flush_stashed_error_work can run concurrently with the journal destroy
    without any protection for sbi->s_journal:
    
    [32031.740193] EXT4-fs (loop66): get root inode failed
    [32031.740484] EXT4-fs (loop66): mount failed
    [32031.759805] ------------[ cut here ]------------
    [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
    [32031.760075] invalid opcode: 0000 [#1] SMP PTI
    [32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded 4.18.0
    [32031.765112] Call Trace:
    [32031.765375]  ? __switch_to_asm+0x35/0x70
    [32031.765635]  ? __switch_to_asm+0x41/0x70
    [32031.765893]  ? __switch_to_asm+0x35/0x70
    [32031.766148]  ? __switch_to_asm+0x41/0x70
    [32031.766405]  ? _cond_resched+0x15/0x40
    [32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
    [32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
    [32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
    [32031.767487]  process_one_work+0x195/0x390
    [32031.767747]  worker_thread+0x30/0x390
    [32031.768007]  ? process_one_work+0x390/0x390
    [32031.768265]  kthread+0x10d/0x130
    [32031.768521]  ? kthread_flush_work_fn+0x10/0x10
    [32031.768778]  ret_from_fork+0x35/0x40
    
    static int start_this_handle(...)
        BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- triggered here
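    The ordering problem can be sketched as a small userspace model. This is
    an illustration under stated assumptions, not ext4 or jbd2 code: every
    name below (model_journal, model_sb, MODEL_JOURNAL_UNMOUNT and the
    model_* functions) is hypothetical, the kworker is replaced by an explicit
    call, and a boolean records where the kernel would hit the BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_JOURNAL_UNMOUNT 0x1u  /* stand-in for JBD2_UNMOUNT */

struct model_journal {
	unsigned int flags;
};

struct model_sb {
	struct model_journal *journal; /* stand-in for sbi->s_journal */
	bool error_work_pending;       /* stand-in for a queued s_error_work */
	bool bug_hit;                  /* set where the kernel would BUG_ON */
};

/* Analogue of start_this_handle(): starting a handle on an
 * unmounting journal is the fatal condition. */
static void model_start_handle(struct model_sb *sb)
{
	if (sb->journal->flags & MODEL_JOURNAL_UNMOUNT)
		sb->bug_hit = true;
}

/* Analogue of flush_stashed_error_work(): writes error info
 * through the journal if the work is still pending. */
static void model_run_error_work(struct model_sb *sb)
{
	if (!sb->error_work_pending)
		return;
	sb->error_work_pending = false;
	model_start_handle(sb);
}

/* Analogue of flush_work(): run any pending work now and wait for it. */
static void model_flush_error_work(struct model_sb *sb)
{
	model_run_error_work(sb);
}

/* Analogue of jbd2_journal_destroy(): marks the journal as unmounting. */
static void model_journal_destroy(struct model_sb *sb)
{
	sb->journal->flags |= MODEL_JOURNAL_UNMOUNT;
}

/* Failed-mount path with error work already queued. flush_first == false
 * is the buggy order; true drains the work before the destroy. */
static bool model_failed_mount(bool flush_first)
{
	struct model_journal j = { 0 };
	struct model_sb sb = { &j, true, false };

	if (flush_first)
		model_flush_error_work(&sb);
	model_journal_destroy(&sb);
	model_run_error_work(&sb); /* the kworker gets scheduled late */
	return sb.bug_hit;
}
```

    In this model, model_failed_mount(false) reports the BUG condition and
    model_failed_mount(true) does not, because the flush leaves nothing for
    the late worker to run.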
    
    Besides, with fast commit enabled, ext4_fc_replay can queue
    s_error_work and still return success, so the later journal destroy in
    ext4_load_journal can trigger this problem too.
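    A success return from replay does not mean no error work is pending. The
    sketch below is a hypothetical userspace model, not the ext4 code:
    replay_state, report_error, model_fc_replay and destroy_would_race are
    illustrative names, and "defer" models error reporting before this patch
    (queue work) versus after it (commit inline).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are ext4 APIs. */
struct replay_state {
	bool error_work_pending;  /* s_error_work queued, not yet run */
	bool sb_error_committed;  /* error info already written to the sb */
};

/* Stand-in for ext4_handle_error() as reached from replay:
 * defer == true only queues work, defer == false commits at once. */
static void report_error(struct replay_state *st, bool defer)
{
	if (defer)
		st->error_work_pending = true;
	else
		st->sb_error_committed = true;
}

/* Stand-in for ext4_fc_replay(): reports an error but still
 * returns success to its caller. */
static int model_fc_replay(struct replay_state *st, bool defer)
{
	report_error(st, defer);
	return 0;
}

/* True if a caller that sees success from replay and later destroys
 * the journal would still have error work racing with the destroy. */
static bool destroy_would_race(bool defer)
{
	struct replay_state st = { false, false };

	if (model_fc_replay(&st, defer) != 0)
		return false; /* not reached in this model */
	return st.error_work_pending;
}
```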
    
    Fix this problem in two steps:
    1. Call ext4_commit_super directly in ext4_handle_error when it is
       called from ext4_fc_replay.
    2. Since it is hard to pair the init and flush of s_error_work, add
       an extra flush_work before journal destroy in ext4_fill_super.
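    Step 2 relies on flushing being harmless when the work is idle. The model
    below is a hypothetical sketch: model_work and the model_* functions are
    illustrative names, written to mirror the documented contract of the
    kernel's flush_work(), which returns true if it waited for the work to
    finish and false if the work was already idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a work item; not the kernel implementation. */
struct model_work {
	bool pending;
	int runs; /* how many times the handler executed */
};

/* Queuing an already-pending work changes nothing, as with schedule_work(). */
static void model_schedule_work(struct model_work *w)
{
	w->pending = true;
}

/* Runs the pending handler once; returns false if there was nothing to wait for. */
static bool model_flush_work(struct model_work *w)
{
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}

/* Error-path shape after the fix: flush unconditionally, then the
 * journal destroy would follow. Returns 0 if no work is left pending. */
static int model_error_path(struct model_work *w)
{
	model_flush_work(w); /* extra flush: a no-op when idle */
	return w->pending ? -1 : 0;
}
```

    Because the flush is idempotent, it can be placed on a shared error label
    without tracking which earlier step queued the work.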
    
    Besides, this patch also calls ext4_commit_super in ext4_handle_error
    for every nojournal case. This seems safe: the reason for calling
    schedule_work was to save the error info to the sb through the journal
    when one is available. Without a journal, there is no point in deferring
    the superblock commit to s_error_work.
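    The resulting rule can be summarized as a decision function. This is a
    hypothetical sketch of the policy described above, not the actual
    ext4_handle_error logic: the enum and choose_commit_mode are illustrative
    names, and the two inputs simply encode the cases named in this message
    (whether a journal is attached, and whether we are in fast-commit replay).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; not part of ext4. */
enum sb_commit_mode {
	COMMIT_DEFERRED, /* schedule s_error_work, write through the journal */
	COMMIT_DIRECT,   /* call ext4_commit_super() right away */
};

static enum sb_commit_mode choose_commit_mode(bool journal_attached,
					      bool in_fc_replay)
{
	/* Deferring only helps when the error info can go through a
	 * journal outside replay; otherwise commit the sb inline. */
	if (journal_attached && !in_fc_replay)
		return COMMIT_DEFERRED;
	return COMMIT_DIRECT;
}
```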
    
    Fixes: c92dc856 ("ext4: defer saving error info from atomic context")
    Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available")
    Cc: stable@kernel.org
    Signed-off-by: yangerkun <yangerkun@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com
    Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
    Signed-off-by: Chen Jun <chenjun102@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>