    Revert "ocfs2: mount shared volume without ha stack" · 332e2e3c
    Committed by Junxiao Bi
    stable inclusion
    from stable-v5.10.135
    commit 5528990512a22c66a62affaa1a81e5a496e88053
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5528990512a22c66a62affaa1a81e5a496e88053
    
    --------------------------------
    
    commit c80af0c2 upstream.
    
    This reverts commit 912f655d.
    
    This commit introduced a regression that can cause mount to hang.  The
    changes in __ocfs2_find_empty_slot allow any node with a non-zero node
    number to grab the slot that was already taken by node 0, so node 1
    will access the same journal as node 0.  When it tries to grab the journal
    cluster lock, it will hang because the lock was already acquired by node 0.
    It's very easy to reproduce this: in one cluster, mount node 0 first, then
    node 1, and you will see the following call trace from node 1.
    
    [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
    [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
    [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
    [13148.749354] Call Trace:
    [13148.750718]  <TASK>
    [13148.752019]  ? usleep_range+0x90/0x89
    [13148.753882]  __schedule+0x210/0x567
    [13148.755684]  schedule+0x44/0xa8
    [13148.757270]  schedule_timeout+0x106/0x13c
    [13148.759273]  ? __prepare_to_swait+0x53/0x78
    [13148.761218]  __wait_for_common+0xae/0x163
    [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
    [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
    [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
    [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
    [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
    [13148.775401]  ? iput+0x69/0xba
    [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
    [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
    [13148.781756]  mount_bdev+0x190/0x1b7
    [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
    [13148.785634]  legacy_get_tree+0x27/0x48
    [13148.787466]  vfs_get_tree+0x25/0xd0
    [13148.789270]  do_new_mount+0x18c/0x2d9
    [13148.791046]  __x64_sys_mount+0x10e/0x142
    [13148.792911]  do_syscall_64+0x3b/0x89
    [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
    [13148.797051] RIP: 0033:0x7f2309f6e26e
    [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
    [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
    [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
    [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
    [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
    [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
    [13148.816564]  </TASK>
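
The slot collision behind this hang can be modeled in user space. The sketch below is not the kernel code: `struct slot`, `find_empty_slot_regressed`, `find_empty_slot_strict` and `slot_collision_demo` are hypothetical names, and it assumes the regressed check treated a slot whose recorded node number is 0 as free, which is one reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one slot map entry. */
struct slot {
    bool valid;            /* slot is claimed by a mounted node */
    unsigned int node_num; /* owner's node number, meaningful only if valid */
};

/*
 * Assumed shape of the regressed check: a slot is treated as free when it
 * is not valid OR when its recorded node number is 0. A valid slot owned
 * by node 0 therefore looks free to every other node.
 */
static int find_empty_slot_regressed(const struct slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!slots[i].valid || slots[i].node_num == 0)
            return (int)i;
    return -1; /* no free slot */
}

/* Strict check: only a slot that is not valid counts as free. */
static int find_empty_slot_strict(const struct slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!slots[i].valid)
            return (int)i;
    return -1; /* no free slot */
}

/* Node 0 has mounted and holds slot 0; node 1 now looks for a slot. */
static void slot_collision_demo(void)
{
    struct slot slots[2] = {
        { .valid = true,  .node_num = 0 }, /* taken by node 0 */
        { .valid = false, .node_num = 0 }, /* genuinely free */
    };

    /* Regressed check hands node 1 the slot that node 0 already owns. */
    assert(find_empty_slot_regressed(slots, 2) == 0);
    /* Strict check skips the taken slot and returns the free one. */
    assert(find_empty_slot_strict(slots, 2) == 1);
}
```

In this model, sharing slot 0 means sharing node 0's journal, which matches the trace above where node 1 blocks in ocfs2_journal_init while waiting for the cluster lock.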
    
    To fix it, we could just fix __ocfs2_find_empty_slot.  But the original
    commit introduced the feature to mount ocfs2 locally even when it is
    cluster based, which is very dangerous: it can easily cause serious data
    corruption, because there is no way to stop other nodes from mounting the
    fs and corrupting it.  Setting up ha or another cluster-aware stack is just
    the cost that we have to pay to avoid corruption; otherwise we would have
    to do it in the kernel.
    
    Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
    Fixes: 912f655d ("ocfs2: mount shared volume without ha stack")
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: <heming.zhao@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    Reviewed-by: Wei Li <liwei391@huawei.com>