Commit 52132ff5, author: Changwei Ge, committer: Greg Kroah-Hartman

ocfs2: wait for recovering done after direct unlock request

[ Upstream commit 0a3775e4f883912944481cf2ef36eb6383a9cc74 ]

ocfs2 umount can hang in the following scenario when multiple hosts
reboot at the same time.

NODE1                           NODE2               NODE3
send unlock request to NODE2
                                dies
                                                    become recovery master
                                                    recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING
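
The NODE1 column above can be replayed with a small userspace model. This is only a sketch under stated assumptions: `model_res`, `RES_RECOVERING`, `model_calc_usage` and `model_unlock_after_owner_death` are hypothetical names invented for illustration, not kernel symbols, and the purge rule is reduced to "no granted locks and not recovering".

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a lock resource; not the kernel struct. */
#define RES_RECOVERING 0x1u

struct model_res {
	unsigned int state;   /* RES_RECOVERING while recovery is in progress */
	int granted;          /* number of locks on the granted list */
	bool on_purge_list;   /* set once the resource is queued for purging */
};

/* Queue the resource for purging only if it is unused and not recovering. */
static void model_calc_usage(struct model_res *res)
{
	if (res->granted == 0 && !(res->state & RES_RECOVERING))
		res->on_purge_list = true;
}

/*
 * Replay NODE1's steps from the timeline. Without waiting, the usage
 * calculation runs while RECOVERING is still set and is never repeated.
 * With waiting, it runs only after recovery has cleared the flag.
 * Returns whether the resource ended up on the purge list.
 */
static bool model_unlock_after_owner_death(bool wait_for_recovery)
{
	struct model_res res = { .state = 0, .granted = 1, .on_purge_list = false };

	res.state |= RES_RECOVERING;      /* mark resource RECOVERING */
	res.granted--;                    /* directly remove lock from grant list */

	if (!wait_for_recovery)
		model_calc_usage(&res);   /* no effect: RECOVERING still set */

	res.state &= ~RES_RECOVERING;     /* recovery finishes, flag cleared */

	if (wait_for_recovery)
		model_calc_usage(&res);   /* resource is now seen as unused */

	return res.on_purge_list;
}
```

In this model, `model_unlock_after_owner_death(false)` leaves the resource off the purge list, which corresponds to the missed purge window, while `model_unlock_after_owner_death(true)` queues it.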

To reproduce this issue, crash a host and then umount ocfs2
from another node.

To solve this, make the unlock path wait until recovery is done.
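
The idea of waiting for a state flag to clear while holding the resource lock can be sketched in userspace with a POSIX mutex and condition variable. This is a loose analogy, not the kernel implementation: `wres`, `WRES_RECOVERING` and the `wres_*` functions are hypothetical names, and the kernel uses a spinlock and a wait queue rather than pthreads. Build with `-pthread`.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for a lock resource. */
#define WRES_RECOVERING 0x1u

struct wres {
	pthread_mutex_t lock;   /* plays the role of the resource spinlock */
	pthread_cond_t wq;      /* plays the role of the resource wait queue */
	unsigned int state;
};

/* Sleep until none of the bits in flags are set. Caller holds res->lock. */
static void wres_wait_on_flags(struct wres *res, unsigned int flags)
{
	while (res->state & flags)
		pthread_cond_wait(&res->wq, &res->lock);
}

/* Recovery side: clear RECOVERING and wake every waiter. */
static void wres_finish_recovery(struct wres *res)
{
	pthread_mutex_lock(&res->lock);
	res->state &= ~WRES_RECOVERING;
	pthread_cond_broadcast(&res->wq);
	pthread_mutex_unlock(&res->lock);
}

/* Unlock side: wait for recovery, then return the state that was observed. */
static unsigned int wres_unlock_wait(struct wres *res)
{
	unsigned int seen;

	pthread_mutex_lock(&res->lock);
	wres_wait_on_flags(res, WRES_RECOVERING);
	seen = res->state;
	pthread_mutex_unlock(&res->lock);
	return seen;
}

/* Thread entry point that simulates the recovery master finishing. */
static void *wres_recovery_thread(void *arg)
{
	wres_finish_recovery(arg);
	return NULL;
}
```

The `while` loop rechecks the flag after every wakeup, so the waiter returns only once `WRES_RECOVERING` is actually clear, regardless of whether recovery finished before or after the wait began.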

Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn
Signed-off-by: Changwei Ge <gechangwei@live.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Parent d4a54645
@@ -106,6 +106,7 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 	int actions = 0;
 	int in_use;
 	u8 owner;
+	int recovery_wait = 0;
 
 	mlog(0, "master_node = %d, valblk = %d\n", master_node,
 	     flags & LKM_VALBLK);
@@ -208,9 +209,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 		}
 		if (flags & LKM_CANCEL)
 			lock->cancel_pending = 0;
-		else
-			lock->unlock_pending = 0;
-
+		else {
+			if (!lock->unlock_pending)
+				recovery_wait = 1;
+			else
+				lock->unlock_pending = 0;
+		}
 	}
 
 	/* get an extra ref on lock.  if we are just switching
@@ -244,6 +248,17 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 
+	if (recovery_wait) {
+		spin_lock(&res->spinlock);
+		/* Unlock request will directly succeed after owner dies,
+		 * and the lock is already removed from grant list. We have to
+		 * wait for RECOVERING done or we miss the chance to purge it
+		 * since the removement is much faster than RECOVERING proc.
+		 */
+		__dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING);
+		spin_unlock(&res->spinlock);
+	}
+
 	/* let the caller's final dlm_lock_put handle the actual kfree */
 	if (actions & DLM_UNLOCK_FREE_LOCK) {
 		/* this should always be coupled with list removal */