- 03 Aug 2016, 3 commits
-
-
Submitted by piaojun

We found a dlm-blocked situation caused by a continuous breakdown of recovery masters, described below. To solve this problem, we should purge the recovery lock once we detect that the recovery master has gone down.

N2: goes down.
N1 (reco master): picks up the recovery lock and begins recovering for N2.
N1: goes down.
N3: picking up the recovery lock fails, so it purges the lock:
    dlm_purge_lockres
      ->DROPPING_REF is set
N3: sending deref to N1 fails, so the recovery lock is not purged.
N3: finds that N1 has gone down and begins recovering for N1, but is blocked in dlm_do_recovery because DROPPING_REF is set:
    dlm_do_recovery
      ->dlm_pick_recovery_master
        ->dlmlock
          ->dlm_get_lock_resource
            ->__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_DROPPING_REF);

Fixes: 8c034396 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by piaojun

We found a BUG triggered when a lockres is migrated while a deref is in progress, as described below. To solve the BUG, we can purge the lockres directly when the other node says we did not hold a ref. Additionally, we had better purge the lockres if the master goes down, as no one will respond with deref done.

Node 1: dlm_purge_lockres, sends deref to Node 2.
Node 2 (old master): leaves the domain; migrates the lockres to Node 3.
Node 3 (new master): finishes the migration and sends "do assert master" to Node 1.
Node 1: receives the assert master message from Node 3, but cannot find the lockres because DROPPING_REF is set, so the owner is still Node 2.
Node 2: receives the deref from Node 1 and responds -EINVAL because the lockres has been migrated.
Node 1: BUG when receiving -EINVAL in dlm_drop_lockres_ref.

Fixes: 842b90b6 ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")
Link: http://lkml.kernel.org/r/57845103.3070406@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by piaojun

ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler

We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared unexpectedly, as described below. To solve the bug, we disable the BUG_ON and purge the lockres in dlm_do_local_recovery_cleanup.

Node 1: dlm_purge_lockres.
Node 2 (master): dlm_deref_lockres_handler - DLM_LOCK_RES_SETREF_INPROG is set; responds DLM_DEREF_RESPONSE_INPROG.
Node 1: receives DLM_DEREF_RESPONSE_INPROG, stops purging in dlm_purge_lockres and waits for DLM_DEREF_RESPONSE_DONE.
Node 2: dispatches dlm_deref_lockres_worker and responds DLM_DEREF_RESPONSE_DONE.
Node 1: receives DLM_DEREF_RESPONSE_DONE and prepares to purge the lockres.
Node 2: goes down.
Node 1: finds Node 2 down and does local cleanup for Node 2:
    dlm_do_local_recovery_cleanup -> clears DLM_LOCK_RES_DROPPING_REF
Node 1: when purging the lockres, the BUG_ON fires because DLM_LOCK_RES_DROPPING_REF is clear:
    dlm_deref_lockres_done_handler
      ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

[akpm@linux-foundation.org: fix duplicated write to `ret']
Fixes: 60d663cb ("ocfs2/dlm: add DEREF_DONE message")
Link: http://lkml.kernel.org/r/57845055.9080702@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 Apr 2016, 1 commit
-
-
Submitted by xuejiufei

dlm_deref_lockres_done_handler() should return zero if the message is successfully handled.

Fixes: 60d663cb ("ocfs2/dlm: add DEREF_DONE message")
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
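The pitfall behind the entry above is general enough to show in a few lines. The snippet below is purely illustrative (the helper name lookup_state() is made up, and this is not the ocfs2 handler): an internal status value leaks out as the handler's return code, so a successfully handled message is reported as a failure.

    #include <stdio.h>

    /* Hypothetical stand-in for the handler's internal lookup; a non-zero
     * value here is normal and does not mean the message failed. */
    static int lookup_state(void)
    {
    	return 1;
    }

    /* Buggy shape: the internal status is reused as the handler's result. */
    static int handler_buggy(void)
    {
    	int ret = lookup_state();

    	/* ... handle the message, possibly using ret ... */
    	return ret;		/* caller wrongly sees a failure */
    }

    /* Fixed shape: once the message has been handled, report success. */
    static int handler_fixed(void)
    {
    	int ret = lookup_state();

    	/* ... handle the message, possibly using ret ... */
    	(void)ret;
    	return 0;
    }

    int main(void)
    {
    	printf("buggy=%d fixed=%d\n", handler_buggy(), handler_fixed());
    	return 0;
    }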
-
- 16 Mar 2016, 3 commits
-
-
Submitted by Jiufei Xue

When o2hb detects a node down, it first sets the dead node in the recovery map and creates ocfs2rec, which will replay the journal for the dead node. The o2hb thread then calls dlm_do_local_recovery_cleanup() to delete the locks of the dead node. After the locks of the dead node are gone, locks for other nodes can be granted and may modify the metadata without replaying the journal of the dead node. The details are described below.

N1: modifies the extent tree of an inode, commits the dirty metadata to its journal, then goes down.
N3 (master): the o2hb thread detects that N1 has gone down, sets the recovery map and deletes the lock of N1; dlm_thread then flushes the AST for the lock of N2.
N2: has not yet detected the death of N1, so its recovery map is empty; it reads the inode from disk without replaying the journal of N1 and modifies the extent tree of the inode that N1 had modified.
ocfs2rec then recovers the journal of N1, and the modification of N2 is lost.

The modifications of N1 and N2 are not serialized, and this will lead to a read-only file system. We can set a recovery_waiting flag on the lock resource after deleting the lock for the dead node, to prevent other nodes from getting the lock before dlm recovery. After dlm recovery, the recovery map on N2 is not empty, and ocfs2_inode_lock_full_nested() will wait for ocfs2 recovery.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by xuejiufei

The master returns "in progress" to a non-master node when it cannot clear the refmap bit right away, and the non-master node will not purge the lock resource until it receives the deref done message.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by xuejiufei

This series of patches fixes the disordered setting/clearing of the refmap bit described below.

Node 1: dlmlock -> dlm_do_master_request
Node 2 (master): dlm_master_request_handler -> dlm_lockres_set_refmap_bit
Node 1: dlmlock succeeds; dlmunlock succeeds; dlm_purge_lockres
Node 2: dlm_deref_handler -> finds the lock resource in DLM_LOCK_RES_SETREF_INPROG state, so dispatches a deref work
Node 1: dlm_purge_lockres succeeds; calls dlmlock again -> dlm_do_master_request
Node 2: dlm_master_request_handler -> dlm_lockres_set_refmap_bit
Node 2: the deref work triggers and calls dlm_lockres_clear_refmap_bit to clear Node 1 from the refmap
Node 2: dlm_purge_lockres succeeds
Node 1: dlm_send_remote_lock_request returns DLM_IVLOCKID because the lockres does not exist; BUG if the lockres is $RECOVERY

This series adds a new message to keep the order of set and clear: other nodes can purge the lock resource only after the refmap bit on the master has been cleared. This patch adds the DEREF_DONE message and the corresponding handler; a node can purge the lock resource after receiving this message. As a new message is added, the minor number of the dlm protocol version is increased.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
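As a rough illustration of what "add a message type and bump the minor protocol version" usually amounts to (all names and numbers below are invented for the example; they are not the ocfs2 definitions):

    /* Illustrative only -- not the real o2dlm headers. */
    enum example_dlm_msg_type {
    	EXAMPLE_MSG_DEREF_LOCKRES      = 1,
    	EXAMPLE_MSG_DEREF_LOCKRES_DONE = 2,	/* the newly added message */
    };

    struct example_protocol_version {
    	unsigned char pv_major;
    	unsigned char pv_minor;
    };

    /* Nodes compare this at join time; bumping the minor number keeps an
     * old node from silently ignoring the new DEREF_DONE message. */
    static const struct example_protocol_version example_dlm_protocol = {
    	.pv_major = 1,
    	.pv_minor = 2,
    };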
-
- 15 Jan 2016, 4 commits
-
-
Submitted by xuejiufei

When two processes are migrating the same lockres, dlm_add_migration_mle() returns -EEXIST but still inserts a new mle into the hash list. dlm_migrate_lockres() will then detach the old mle and free the new one, which is already in the hash list, and that destroys the list.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by xuejiufei

We have found that the migration source triggers a BUG because the refcount of the mle is already zero before the put, when the target goes down during migration. The situation is as follows:

dlm_migrate_lockres
  dlm_add_migration_mle
  dlm_mark_lockres_migrating
  dlm_get_mle_inuse
  <<<<<< now the refcount of the mle is 2
  dlm_send_one_lockres and wait for the target to become the new master
  <<<<<< o2hb detects the target down and cleans the migration mle; now the refcount is 1

dlm_migrate_lockres is woken, and puts the mle twice when it finds that the target has gone down, which triggers the BUG with the following message: "ERROR: bad mle: ".

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Xue jiufei

dlm_grab() may return NULL when the node is doing unmount. During code review, we found that some dlm handlers may return an error to the caller when dlm_grab() returns NULL, which makes the caller BUG or causes other problems. Here is an example:

Node 1: receives a migration message from node 3 and sends a migrate request to the other nodes.
Node 2: starts unmounting.
Node 2: receives the migrate request from node 1 and calls dlm_migrate_request_handler().
Node 2: the unmount thread unregisters the domain handlers and removes the dlm_context from dlm_domains.
Node 2: dlm_migrate_request_handler() returns -EINVAL to node 1.
Node 1: exits the migration without clearing the migration state and without sending the assert master message to node 3, which causes node 3 to hang.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by jiangyiwen

Commit f3f85464 ("ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres") still has a race which cannot guarantee that the ordering is exactly correct. With three nodes involved:

umount, migrate the lockres to Node 2
migration finished, send a migrate request to Node 3
Node 3: receives the migrate request, creates a migration_mle, responds to Node 2
set DLM_LOCK_RES_SETREF_INPROG and send assert master to Node 3
the migration_mle is deleted in assert_master_handler, and Node 3 umounts without a response
dlm_thread purges this lockres and sends a drop-deref message to Node 2
Node 2: finds the DLM_LOCK_RES_SETREF_INPROG flag set and dispatches dlm_deref_lockres_worker to clear the refmap; but dlm_deref_lockres_worker only waits for DLM_LOCK_RES_SETREF_INPROG to be cleared if the node is in the refmap, so the worker completes "successfully"
the lockres is purged, the assert master response is sent to Node 1, and the umount finishes
Node 3 is set in the refmap and will never be cleared, which leads to the umount hanging

So wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
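A minimal sketch of the shape of the change described in the entry above (surrounding context is assumed, and this is not the literal diff): the deref worker waits for the SETREF flag unconditionally before it touches the refmap.

    	/* Sketch only: wait for any in-progress assert/set-ref to finish,
    	 * whether or not this node is currently recorded in the refmap. */
    	spin_lock(&res->spinlock);
    	__dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_SETREF_INPROG);
    	/* ... then clear the refmap bit and finish the deref as before ... */
    	spin_unlock(&res->spinlock);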
-
- 30 Dec 2015, 1 commit
-
-
Submitted by xuejiufei

We have found a BUG on res->migration_pending when migrating lock resources. The situation is as follows:

dlm_mark_lockres_migrating
  res->migration_pending = 1;
  __dlm_lockres_reserve_ast
  dlm_lockres_release_ast returns with res->migration_pending still set, because other threads reserve asts
  wait for dlm_migration_can_proceed to return 1
  >>>>>>> o2hb finds that the target goes down and removes the target from the domain_map
  dlm_migration_can_proceed returns 1
  dlm_mark_lockres_migrating returns -ESHUTDOWN with res->migration_pending still set

When dlm_mark_lockres_migrating() is entered again, it triggers the BUG_ON on res->migration_pending. So clear migration_pending when the target is down.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Oct 2015, 1 commit
-
-
Submitted by Joseph Qi

dlm_lockres_put() will call dlm_lockres_release() if it drops the last reference, and that may call dlm_print_one_lock_resource(), which takes the lockres spinlock. So unlock the lockres spinlock before dlm_lockres_put() to avoid a deadlock.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
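A minimal sketch of the ordering described in the entry above (the surrounding context is assumed, not copied from the kernel source):

    	/*
    	 * Buggy order: if this is the last reference, dlm_lockres_put()
    	 * can reach dlm_lockres_release() -> dlm_print_one_lock_resource(),
    	 * which takes res->spinlock that we still hold -- self-deadlock.
    	 *
    	 *	dlm_lockres_put(res);
    	 *	spin_unlock(&res->spinlock);
    	 */

    	/* Fixed order: drop the spinlock first, then drop the reference. */
    	spin_unlock(&res->spinlock);
    	dlm_lockres_put(res);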
-
- 23 Sep 2015, 1 commit
-
-
Submitted by Joseph Qi

The order of the following three spinlocks should be: dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock.

But dlm_dispatch_assert_master() is called while holding dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and it then calls dlm_grab(), which takes dlm_domain_lock. Once another thread (for example, dlm_query_join_handler) has already taken dlm_domain_lock and tries to take dlm_ctxt->spinlock, a deadlock happens.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
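The rule being violated in the entry above is the usual "single global lock order" rule. A small userspace model of it (pthread mutexes purely for illustration; the kernel code uses spinlocks):

    #include <pthread.h>

    /* Lock order, outermost to innermost, standing in for
     * dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock. */
    static pthread_mutex_t domain_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t ctxt_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t res_lock    = PTHREAD_MUTEX_INITIALIZER;

    /* Correct: every path acquires in the same global order. */
    static void good_path(void)
    {
    	pthread_mutex_lock(&domain_lock);
    	pthread_mutex_lock(&ctxt_lock);
    	pthread_mutex_lock(&res_lock);
    	/* ... work ... */
    	pthread_mutex_unlock(&res_lock);
    	pthread_mutex_unlock(&ctxt_lock);
    	pthread_mutex_unlock(&domain_lock);
    }

    /* The deadlocking shape described above: hold the inner locks and then
     * reach for the outer one, while another thread holds the outer lock
     * and reaches for an inner one. */
    static void bad_path(void)
    {
    	pthread_mutex_lock(&ctxt_lock);
    	pthread_mutex_lock(&res_lock);
    	pthread_mutex_lock(&domain_lock);	/* ABBA against good_path() */
    	pthread_mutex_unlock(&domain_lock);
    	pthread_mutex_unlock(&res_lock);
    	pthread_mutex_unlock(&ctxt_lock);
    }

    int main(void)
    {
    	good_path();
    	bad_path();	/* fine single-threaded; deadlocks under contention */
    	return 0;
    }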
-
- 05 Sep 2015, 1 commit
-
-
Submitted by Yiwen Jiang

The following case leads to a lockres being freed while it is still in use:

cat /sys/kernel/debug/o2dlm/locking_state:
    lockres_seq_start
      -> lock dlm->track_lock
      -> get resA
dlm_thread:
    resA->refs decreases to 0, dlm_lockres_release is called, and it waits for "cat" to unlock.
cat:
    although resA->refs is already 0, it increases resA->refs and then unlocks.
dlm_thread:
    lock dlm->track_lock -> list_del_init() -> unlock -> free resA

In such a race an invalid address access may occur. So we should delete res->tracking from the tracking list before resA->refs decreases to 0.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
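A sketch of the idea in the entry above (context assumed, helper names taken from the message; not the actual dlm_lockres_release() change): take the resource off the tracking list while a reference is still held, so the debugfs walker can never pick up an entry whose refcount has already reached zero.

    	/* Sketch only. */
    	spin_lock(&dlm->track_lock);
    	if (!list_empty(&res->tracking))
    		list_del_init(&res->tracking);
    	spin_unlock(&dlm->track_lock);

    	/* Only once the resource is invisible to the tracking walker
    	 * may the final reference be dropped. */
    	dlm_lockres_put(res);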
-
- 06 May 2015, 1 commit
-
-
Submitted by Junxiao Bi

There is a race window in dlm_get_lock_resource() which may return a lock resource that has already been purged. This causes the process to hang forever in dlmlock(), as the ast message cannot be handled because its lock resource no longer exists.

dlm_get_lock_resource {
    ...
    spin_lock(&dlm->spinlock);
    tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
    if (tmpres) {
        spin_unlock(&dlm->spinlock);
        >>>>>>>> race window: dlm_run_purge_list() may run and purge the lock resource
        spin_lock(&tmpres->spinlock);
        ...
        spin_unlock(&tmpres->spinlock);
    }
}

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
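One way to close a window of this shape is to take the resource's own spinlock before dropping dlm->spinlock, so the purge list cannot slip in between the lookup and the use. A sketch under that assumption (not necessarily the exact upstream change):

    	spin_lock(&dlm->spinlock);
    	tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
    	if (tmpres) {
    		/* pin the state before dlm_run_purge_list() can touch it */
    		spin_lock(&tmpres->spinlock);
    		spin_unlock(&dlm->spinlock);
    		/* ... examine tmpres under its spinlock ... */
    		spin_unlock(&tmpres->spinlock);
    	} else {
    		spin_unlock(&dlm->spinlock);
    	}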
-
- 19 Dec 2014, 1 commit
-
-
Submitted by Joseph Qi

Commit ac4fef4d ("ocfs2/dlm: do not purge lockres that is queued for assert master") may have the following possible race:

dlm_dispatch_assert_master:
    queue_work(dlm->dlm_worker, &dlm->dispatched_work);
dlm_wq:
    dispatch the work
    dlm_lockres_drop_inflight_worker
    *BUG_ON(res->inflight_assert_workers == 0)*
dlm_dispatch_assert_master:
    dlm_lockres_grab_inflight_worker
    inflight_assert_workers++

So ensure inflight_assert_workers is increased first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Xue jiufei <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
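A small sketch of the reordering described above (surrounding context assumed): take the in-flight worker reference before the work is queued, so the worker's drop can never run before the corresponding grab.

    	/* Sketch only. */
    	dlm_lockres_grab_inflight_worker(dlm, res);	/* inflight_assert_workers++ */
    	queue_work(dlm->dlm_worker, &dlm->dispatched_work);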
-
- 11 Dec 2014, 1 commit
-
-
Submitted by Srinivas Eeda

Node A sends a master query request to node B, which is the master. At this time the lockres happens to be on the purge list. dlm_master_request_handler takes the dlm spinlock, finds the resource, and releases the dlm spinlock. Right then, dlm_thread on node B could purge the lockres. dlm_master_request_handler can then acquire the lockres spinlock and reply to node A that node B is the master, even though the lockres on node B has been purged.

The above scenario makes node A falsely think node B is the master, which is inconsistent. Further, if another node C tries to master the same resource, every node will respond that it is not the master. Node C then masters the resource and sends assert master to all nodes. This makes node A crash with the following message:

    dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current owner is 10!

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Tested-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Oct 2014, 1 commit
-
-
Submitted by Xue jiufei

Remove the branch that frees res->lockname.name, because the condition is never satisfied when jumping to the error label.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Oct 2014, 1 commit
-
-
Submitted by alex chen

In dlm_assert_master_handler, the mle is obtained in dlm_find_mle and should be put when jumping to "kill"; otherwise this mle will never be released.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
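A sketch of what that pairing looks like (simplified; the surrounding handler code is assumed rather than reproduced): every successful dlm_find_mle() must be matched by dlm_put_mle() on every exit path, including the error path.

    	/* Sketch only: error path of the assert-master handler. */
    kill:
    	if (mle)
    		dlm_put_mle(mle);	/* drop the ref taken by dlm_find_mle() */
    	/* ... then tear down / respond as the handler already does ... */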
-
- 26 Sep 2014, 1 commit
-
-
Submitted by Joseph Qi

There is a deadlock case reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

This case is caused by &res->spinlock and &dlm->master_lock being taken in different orders by different threads. It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap helpers"). Since the lockres is new, it does not require &res->spinlock, so remove it.

Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Aug 2014, 1 commit
-
-
Submitted by Tariq Saeed

Orabug: 19074140

When umount is issued during recovery on the new master that has not finished remastering locks, it triggers a BUG() in dlm_send_mig_lockres_msg(). Here is the situation:

1) Node A has a lock on resource X mastered by node B.
2) Node B dies -> node A sets the recovering flag for res X.
3) Node C becomes the new master for resources owned by the dead node and is remastering the locks of the dead node, but has not finished the remastering process yet.
4) umount is issued on node C.
5) During processing of the umount, ignoring the unfinished recovery, node C attempts to migrate resource X to node A.
6) Node A finds res X in the DLM_LOCK_RES_RECOVERING state, considers it a logic error and sends back -EFAULT.
7) Node C asserts BUG() upon seeing the -EFAULT response from node A.

The fix is to delay migrating res X until remastering is finished, at which point the recovering flag will be cleared on both A and C.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 Jun 2014, 2 commits
-
-
Submitted by Xue jiufei

When the workqueue is delayed, it may happen that a lockres is purged while it is still queued for a master assert. This may trigger a BUG() as follows:

N1: dlm_get_lockres() -> dlm_do_master_requery
N2: is the master of the lockres, so it queues assert_master work
N2: dlm_thread() starts running and purges the lockres
N2: dlm_assert_master_worker() sends the assert master message to the other nodes
N1: on receiving the assert_master message, sets the master to N2
N1: dlmlock_remote() sends a create_lock message to N2 but receives DLM_IVLOCKID; if it is the RECOVERY lockres, this triggers the BUG().

Another BUG() is triggered when N3 becomes the new master and sends assert_master to N1; N1 triggers the BUG() because the owner does not match. So we should not purge a lockres while it is queued for assert master.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by jiangyiwen

The following case may lead to an endless loop during umount (four nodes: A, B, C, D):

node A: umounts the volume and migrates lockres1 to node B
lockres1 is wanted by another node: MASTER_REQUEST_MSG is sent to node C
node C: inits a block mle
MIGRATE_REQUEST_MSG is sent to node C
node C: finds a block mle and then returns DLM_MIGRATE_RESPONSE_MASTERY_REF to node B
node C is set in the refmap
node A: umounts successfully
node B: tries to umount; an endless loop occurs when migrating lockres1, since C is in the refmap

So we can fix this endless-loop case by returning DLM_MIGRATE_RESPONSE_MASTERY_REF only if the node has a mastery mle when receiving MIGRATE_REQUEST_MSG.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Xue jiufei <xuejiufei@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Jun 2014, 1 commit
-
-
Submitted by Fabian Frederick

Static values are automatically initialized to NULL.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
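For illustration (the variable names are made up), the kind of change this describes:

    /* Before: the explicit initializer is redundant. */
    static struct kmem_cache *example_cache_before = NULL;

    /* After: objects with static storage duration are zero-initialized by
     * the C language itself, so this means exactly the same thing. */
    static struct kmem_cache *example_cache_after;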
-
- 24 May 2014, 1 commit
-
-
Submitted by Joseph Qi

In dlm_init, if creating dlm_lockname_cache fails in dlm_init_master_caches, dlm_lockres_cache, which was created before, gets destroyed twice. This causes the system to die when loading modules.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
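A simplified sketch of an error path that cannot double-destroy (kernel context assumed; an illustration of the shape, not the actual dlm_init_master_caches()):

    static struct kmem_cache *dlm_lockres_cache;
    static struct kmem_cache *dlm_lockname_cache;

    static int example_init_master_caches(void)
    {
    	dlm_lockres_cache = kmem_cache_create("example_lockres", 512, 0, 0, NULL);
    	if (!dlm_lockres_cache)
    		return -ENOMEM;

    	dlm_lockname_cache = kmem_cache_create("example_lockname", 64, 0, 0, NULL);
    	if (!dlm_lockname_cache) {
    		/* destroy only what this function created, exactly once;
    		 * the caller must not destroy it again on failure */
    		kmem_cache_destroy(dlm_lockres_cache);
    		dlm_lockres_cache = NULL;
    		return -ENOMEM;
    	}

    	return 0;
    }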
-
- 13 Nov 2013, 2 commits
-
-
Submitted by Junxiao Bi

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Xue jiufei

We trigger a bug in __dlm_lockres_reserve_ast() when we umount 4 nodes in parallel. The situation is as follows:

1) Node A migrates all the lockres it owns (e.g. lockres A) to other nodes, say node B, when it umounts.
2) On receiving the MIG_LOCKRES message from A, node B masters lockres A with the DLM_LOCK_RES_MIGRATING state set.
3) Then we umount ocfs2 on node B. It should also migrate lockres A to another node, say node C. But now the DLM_LOCK_RES_MIGRATING state of lockres A is not cleared, and node B triggers the BUG on a lockres with state DLM_LOCK_RES_MIGRATING.

Signed-off-by: Xuejiufei <xuejiufei@huawei.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Sep 2013, 1 commit
-
-
Submitted by Dong Fang

[dan.carpenter@oracle.com: fix up some NULL dereference bugs]
Signed-off-by: Dong Fang <yp.fangdong@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jeff Liu <jeff.liu@oracle.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 Feb 2013, 1 commit
-
-
Submitted by Dan Carpenter

My static checker complains that this is called with a spin_lock held, from dlm_master_requery_handler() in dlmrecovery.c. Probably the reason we have not received any bug reports about this is that recovery is not a common operation.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 25 Jul 2011, 3 commits
-
-
Submitted by Sunil Mushran

The inflight reference count in the lock resource is taken to pin the resource in memory. We take it when a new resource is created and release it after a lock is attached to it. We do this to prevent the resource from being purged prematurely.

Earlier this reference count was taken only for locally mastered resources. This patch extends the same functionality to remotely mastered ones. We do this because the same premature purging could occur for remotely mastered resources if the remote node were to die before the create lock completes.

Fix for Oracle bug#12405575.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
-
Submitted by Sunil Mushran

This patch cleans up the helpers that set/clear refmap bits and grab/drop inflight lock ref counts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
-
Submitted by Sunil Mushran

The o2dlm messages needed a facelift.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
-
- 26 May 2011, 1 commit
-
-
Submitted by Sunil Mushran

During dlm domain shutdown, o2dlm has to free all the lock resources. Ones that have no locks and no references are freed. Ones that have locks and/or references are migrated to another node.

The first task in migration is finding a target. Currently we scan the lock resource and find one node that either has a lock or a reference. This is not very efficient in a parallel umount case, as we might end up migrating the lock resource to a node which itself may have to migrate it to a third node.

The patch scans dlm->exit_domain_map to ensure the target node is not leaving the domain. If no valid target node is found, o2dlm does not migrate the resource but instead waits for the unlock and deref messages that will allow it to free the resource.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
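A rough sketch of that target selection (the bitmap fields are the ones named in the message, but the loop itself is illustrative, not the upstream function):

    	/* Sketch only: pick a node that still references the resource and
    	 * is not itself on its way out of the domain. */
    	int target = O2NM_MAX_NODES;
    	int node;

    	for (node = 0; node < O2NM_MAX_NODES; node++) {
    		if (node == dlm->node_num)
    			continue;
    		if (!test_bit(node, dlm->domain_map))
    			continue;			/* not a member */
    		if (test_bit(node, dlm->exit_domain_map))
    			continue;			/* leaving: skip it */
    		if (test_bit(node, res->refmap)) {
    			target = node;			/* usable target */
    			break;
    		}
    	}
    	/* if target is still O2NM_MAX_NODES, do not migrate; wait for the
    	 * remaining unlock/deref messages and free the resource locally */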
-
- 24 May 2011, 1 commit
-
-
Submitted by Sunil Mushran

This patch cleans up the gunk added by commit 388c4bcb. dlm_is_lockres_migrateable() now returns 1 if the lock resource is deemed migrateable and 0 if not.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
-
- 14 May 2011, 1 commit
-
-
Submitted by Sunil Mushran

During resource migration, if the target node were to die, the thread doing the migration spins until the target node is removed from the domain map. This patch slows the spin by making the thread wait for recovery to kick in.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
-
- 31 Mar 2011, 1 commit
-
-
Submitted by Lucas De Marchi

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 21 Feb 2011, 1 commit
-
-
Submitted by Tao Ma

ENTRY is used to record the entry of a function. But because it is added to so many functions, enabling it fills up the system logs quickly and causes too much I/O, so in practice no one can turn it on for a production system, or even for a test.

So for mlog_entry_void we simply remove it, and mlog_entry(...) is replaced with mlog(0, ...); these will be replaced by trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
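Concretely, the conversion described above reads roughly like this (illustrative lines, not a hunk from the actual patch):

    	/* before */
    	mlog_entry_void();			/* simply deleted */
    	mlog_entry("(0x%p)\n", dlm);		/* converted ... */

    	/* after */
    	mlog(0, "(0x%p)\n", dlm);		/* ... into a plain level-0 mlog */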
-
- 10 Dec 2010, 1 commit
-
-
Submitted by Sunil Mushran

o2dlm was not migrating resources with zero locks because it assumed that such a resource would get purged by dlm_thread. However, some usage patterns involve creating and dropping locks at a high rate, leading to the migrate thread seeing zero locks but the purge thread seeing an active reference. When this happens, dlm_thread cannot purge the resource and the migrate thread sees no reason to migrate it. The spell is broken when the migrate thread catches the resource with a lock.

The fix is to make the migrate thread also consider the reference map. This usage pattern can be triggered by userspace via userdlm locks and flocks.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 24 Sep 2010, 1 commit
-
-
Submitted by Srinivas Eeda

While umounting, a block mle does not get freed if the dlm is shut down after the master request is received but before the assert master. This results in an unclean shutdown of the dlm domain. This patch frees all mles that are left around after other nodes have been notified about exiting the dlm and the dlm state has been marked as leaving. Only block mles are expected to be around, so we log an ERROR for other mles but still free them.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-