1. 15 Jan 2016 (3 commits)
    • ocfs2/dlm: ignore cleaning the migration mle that is inuse · bef5502d
      xuejiufei authored
      We found that the migration source can trigger a BUG where the refcount
      of the mle is already zero before the put, when the target goes down
      during migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detects that the target is down and cleans the migration
        mle. Now the refcount is 1.
      
      dlm_migrate_lockres is then woken and puts the mle twice when it finds
      that the target has gone down, which triggers the BUG with the
      following message:

        "ERROR: bad mle: ".
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bef5502d
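
      The following is a minimal userspace sketch of the refcount rule behind
      the entry above, assuming the essence of the fix is that the heartbeat
      cleanup path ignores an mle the migration path still holds inuse, so the
      migration path's own two puts stay balanced.  The names (toy_mle,
      mle_put) are invented; this is not the actual ocfs2 code.

        /* Toy model of the mle refcount; compile with: cc -o mle mle.c */
        #include <assert.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct toy_mle {
            int  refs;    /* models the kref on the mle */
            bool inuse;   /* models dlm_get_mle_inuse() having pinned it */
        };

        static void mle_put(struct toy_mle *m)
        {
            assert(m->refs > 0);       /* hitting 0 early is the "bad mle" BUG */
            if (--m->refs == 0)
                printf("mle freed\n");
        }

        int main(void)
        {
            struct toy_mle m = { .refs = 1, .inuse = false };

            m.refs++;                  /* dlm_get_mle_inuse(): refs = 2 */
            m.inuse = true;

            /* o2hb notices the migration target is down and cleans stale mles.
             * Fixed behaviour: leave the inuse mle alone.  (The buggy flow put
             * it here, leaving only one ref for the two puts below.)          */
            if (!m.inuse)
                mle_put(&m);

            /* The migration path wakes up, sees the target is gone, cleans the
             * mle itself and then drops its inuse reference.                  */
            mle_put(&m);               /* cleanup put */
            mle_put(&m);               /* inuse put: refs reach 0, mle freed   */

            printf("final refs: %d\n", m.refs);
            return 0;
        }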
    • ocfs2/dlm: return appropriate value when dlm_grab() returns NULL · c372f219
      Xue jiufei authored
      dlm_grab() may return NULL when the node is unmounting.  During code
      review, we found that some dlm handlers return an error to the caller
      when dlm_grab() returns NULL, which can make the caller BUG or cause
      other problems.  Here is an example:
      
      Node 1                                 Node 2
      receives migration message
      from node 3, and send
      migrate request to others
                                           start unmounting
      
                                           receives migrate request
                                           from node 1 and call
                                           dlm_migrate_request_handler()
      
                                           unmount thread unregisters
                                           domain handlers and removes
                                           dlm_context from dlm_domains
      
                                           dlm_migrate_request_handler()
                                           returns -EINVAL to node 1
      Node 1 then exits migration without clearing the migration state or
      sending an assert master message to node 3, which leaves node 3 hung.
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c372f219
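
      A hedged sketch of the handler pattern the entry above describes: when
      the local domain is already going away (the stand-in for dlm_grab()
      returning NULL), report "nothing to do" rather than an error so the
      sender does not abort its migration.  toy_dlm, toy_grab and
      handle_migrate_request are invented names, and using 0 as the
      "appropriate value" is an assumption for illustration only.

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdio.h>

        struct toy_dlm { bool unmounting; };

        /* Models dlm_grab(): yields NULL once the domain is shutting down. */
        static struct toy_dlm *toy_grab(struct toy_dlm *dlm)
        {
            return dlm->unmounting ? NULL : dlm;
        }

        static int handle_migrate_request(struct toy_dlm *dlm)
        {
            if (!toy_grab(dlm)) {
                /* The buggy variant returned -EINVAL here, which the sender
                 * treated as fatal.  Return a value the sender interprets as
                 * "carry on" instead (0 in this toy model). */
                return 0;
            }
            /* ... normal handling of the migrate request would go here ... */
            return 0;
        }

        int main(void)
        {
            struct toy_dlm dlm = { .unmounting = true };
            printf("handler returned %d\n", handle_migrate_request(&dlm));
            return 0;
        }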
    • ocfs2/dlm: wait until DLM_LOCK_RES_SETREF_INPROG is cleared in dlm_deref_lockres_worker · b5560143
      jiangyiwen authored
      Commit f3f85464 ("ocfs2_dlm: Ensure correct ordering of set/clear
      refmap bit on lockres") still leaves a race that cannot guarantee the
      ordering is exactly correct.
      
      Node1               Node2                    Node3
      umount, migrate
      lockres to Node2
                          migrate finished,
                          send migrate request
                          to Node3
                                                    received migrate request,
                                                    create a migration_mle,
                                                    respond to Node2.
                          set DLM_LOCK_RES_SETREF_INPROG
                          and send assert master to
                          Node3
                                                    delete migration_mle in
                                                    assert_master_handler,
                                                    Node3 umount without response
                                                    dlm_thread purge
                                                    this lockres, send drop
                                                    deref message to Node2
                          found the flag of
                          DLM_LOCK_RES_SETREF_INPROG
                          is set, dispatch
                          dlm_deref_lockres_worker to
                           clear the refmap. But
                           dlm_deref_lockres_worker waits
                           for DLM_LOCK_RES_SETREF_INPROG
                           to be cleared only when the node
                           is already in the refmap, so the
                           worker completes "successfully"
      
                                                    purge lockres, send
                                                    assert master response
                                                    to Node1, and finish umount
                           sets Node3 in the refmap, which
                           will never be cleared, leaving
                           the umount hung
      
      So wait until DLM_LOCK_RES_SETREF_INPROG is cleared in
      dlm_deref_lockres_worker.
      Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5560143
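
      The fix above boils down to "always wait for the in-progress flag before
      clearing the refmap bit".  Below is an illustrative userspace model of
      that pattern using pthreads instead of the kernel's spinlock/wait
      primitives; toy_res, deref_worker and the field names are invented
      stand-ins, not the ocfs2 code (build with -lpthread).

        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        struct toy_res {
            pthread_mutex_t lock;
            pthread_cond_t  cond;
            int             setref_inprog; /* models DLM_LOCK_RES_SETREF_INPROG */
            unsigned long   refmap;        /* one bit per node */
        };

        /* The fix in a nutshell: wait for the flag unconditionally, instead of
         * waiting only when the dereferencing node is already in the refmap. */
        static void *deref_worker(void *arg)
        {
            struct toy_res *res = arg;
            int node = 3;

            pthread_mutex_lock(&res->lock);
            while (res->setref_inprog)
                pthread_cond_wait(&res->cond, &res->lock);
            res->refmap &= ~(1UL << node);   /* now safe to clear the bit */
            pthread_mutex_unlock(&res->lock);
            printf("deref done, refmap=%#lx\n", res->refmap);
            return NULL;
        }

        int main(void)
        {
            struct toy_res res = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .cond = PTHREAD_COND_INITIALIZER,
                .setref_inprog = 1,
                .refmap = 0,
            };
            pthread_t t;

            pthread_create(&t, NULL, deref_worker, &res);

            /* The assert-master path finishes later: it sets the refmap bit
             * and only then clears the in-progress flag; the worker may clear
             * the bit only after that. */
            sleep(1);
            pthread_mutex_lock(&res.lock);
            res.refmap |= 1UL << 3;
            res.setref_inprog = 0;
            pthread_cond_broadcast(&res.cond);
            pthread_mutex_unlock(&res.lock);

            pthread_join(t, NULL);
            return 0;
        }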
  2. 30 Dec 2015 (1 commit)
    • ocfs2/dlm: clear migration_pending when migration target goes down · cc28d6d8
      xuejiufei authored
      We have found a BUG on res->migration_pending when migrating lock
      resources.  The situation is as follows.
      
      dlm_mark_lockres_migrating
        res->migration_pending = 1;
        __dlm_lockres_reserve_ast
        dlm_lockres_release_ast returns with res->migration_pending still
            set, because other threads have reserved asts
        wait until dlm_migration_can_proceed returns 1
        >>>>>>> o2hb finds that the target has gone down and removes it
                from the domain_map
        dlm_migration_can_proceed returns 1
        dlm_mark_lockres_migrating returns -ESHUTDOWN with
            res->migration_pending still set.
      
      When dlm_mark_lockres_migrating() is re-entered, it triggers the BUG_ON
      on res->migration_pending.  So clear migration_pending when the target
      is down.
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc28d6d8
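
      A small illustrative sketch of the error-path rule the entry above adds,
      assuming its essence is "clear the pending flag before returning once the
      target has died" so that a retry does not trip the sanity check.  toy_res
      and mark_migrating are invented names, not the ocfs2 functions.

        #include <assert.h>
        #include <errno.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct toy_res { bool migration_pending; };

        static int mark_migrating(struct toy_res *res, bool target_alive)
        {
            assert(!res->migration_pending);    /* models the BUG_ON on re-entry */
            res->migration_pending = true;

            if (!target_alive) {
                res->migration_pending = false; /* the fix: clear before bailing */
                return -ESHUTDOWN;
            }
            /* ... the migration would proceed here ... */
            res->migration_pending = false;
            return 0;
        }

        int main(void)
        {
            struct toy_res res = { .migration_pending = false };

            /* First attempt fails because the target died ... */
            printf("first try: %d\n", mark_migrating(&res, false));
            /* ... and the retry no longer hits the BUG_ON. */
            printf("second try: %d\n", mark_migrating(&res, true));
            return 0;
        }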
  3. 23 Oct 2015 (1 commit)
  4. 23 Sep 2015 (1 commit)
  5. 05 Sep 2015 (1 commit)
  6. 06 May 2015 (1 commit)
    • ocfs2: dlm: fix race between purge and get lock resource · b1432a2a
      Junxiao Bi authored
      There is a race window in dlm_get_lock_resource() in which it may return
      a lock resource that has already been purged.  This causes the process
      to hang forever in dlmlock(), as the ast message cannot be handled
      because its lock resource no longer exists.
      
          dlm_get_lock_resource {
              ...
              spin_lock(&dlm->spinlock);
              tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
              if (tmpres) {
                   spin_unlock(&dlm->spinlock);
                   >>>>>>>> race window, dlm_run_purge_list() may run and purge
                                    the lock resource
                   spin_lock(&tmpres->spinlock);
                   ...
                   spin_unlock(&tmpres->spinlock);
              }
          }
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1432a2a
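
      One common way to close this kind of window, shown below purely as a
      hedged userspace sketch, is to re-validate the resource after re-taking
      its own lock and retry the lookup if it is being torn down.  The
      structures and the dropping flag are invented and may differ from what
      the actual patch does.

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct toy_res {
            pthread_mutex_t lock;
            bool dropping;              /* models a "being purged" state bit */
        };

        struct toy_table {
            pthread_mutex_t lock;
            struct toy_res *res;        /* one-entry stand-in for the hash table */
        };

        static struct toy_res *lookup_valid(struct toy_table *tbl)
        {
            struct toy_res *res;

            for (;;) {
                pthread_mutex_lock(&tbl->lock);
                res = tbl->res;
                pthread_mutex_unlock(&tbl->lock);
                /* <-- the race window: a purge could run right here */
                if (!res)
                    return NULL;

                pthread_mutex_lock(&res->lock);
                if (!res->dropping) {          /* not being purged: safe to use */
                    pthread_mutex_unlock(&res->lock);
                    return res;
                }
                pthread_mutex_unlock(&res->lock);
                /* resource is being torn down; look it up again */
            }
        }

        int main(void)
        {
            struct toy_res res = {
                .lock = PTHREAD_MUTEX_INITIALIZER, .dropping = false,
            };
            struct toy_table tbl = {
                .lock = PTHREAD_MUTEX_INITIALIZER, .res = &res,
            };

            printf("lookup %s\n", lookup_valid(&tbl) ? "succeeded" : "failed");
            return 0;
        }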
  7. 19 Dec 2014 (1 commit)
  8. 11 Dec 2014 (1 commit)
    • ocfs2: o2dlm: fix a race between purge and master query · cb79662b
      Srinivas Eeda authored
      Node A sends a master query request to node B, which is the master.  At
      this time the lockres happens to be on the purge list.
      dlm_master_request_handler takes the dlm spinlock, finds the resource
      and releases the dlm spinlock.  Right at this point, dlm_thread on this
      node could purge the lockres.  dlm_master_request_handler can then
      acquire the lockres spinlock and reply to node A that node B is the
      master, even though the lockres on node B has been purged.
      
      This scenario makes node A falsely think node B is the master, which is
      inconsistent.  Further, if another node C tries to master the same
      resource, every node will respond that it is not the master.  Node C
      then masters the resource and sends assert master to all nodes.  This
      makes node A crash with the following message:
      
      dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current
      owner is 10!
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Tested-by: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb79662b
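
      A hedged sketch of the idea behind the entry above: before answering a
      master query, re-check under the resource lock that the resource is not
      on its way out.  The enum values and the dropping flag are invented, and
      the real patch may implement the check differently.

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        enum master_resp { RESP_YES, RESP_NO };

        struct toy_res {
            pthread_mutex_t lock;
            int  owner;                 /* node currently mastering the resource */
            bool dropping;              /* queued for purge */
        };

        static enum master_resp handle_master_query(struct toy_res *res, int self)
        {
            enum master_resp resp = RESP_NO;

            pthread_mutex_lock(&res->lock);
            /* Only claim mastership if we own it AND it is not about to be
             * purged; answering "yes" for a purged lockres is what left node A
             * with a stale idea of the master. */
            if (res->owner == self && !res->dropping)
                resp = RESP_YES;
            pthread_mutex_unlock(&res->lock);
            return resp;
        }

        int main(void)
        {
            struct toy_res res = {
                .lock = PTHREAD_MUTEX_INITIALIZER, .owner = 10, .dropping = true,
            };
            printf("response: %s\n",
                   handle_master_query(&res, 10) == RESP_YES ? "yes" : "no");
            return 0;
        }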
  9. 10 Oct 2014 (1 commit)
  10. 03 Oct 2014 (1 commit)
  11. 26 Sep 2014 (1 commit)
  12. 07 Aug 2014 (1 commit)
    • ocfs2: race between umount and unfinished remastering during recovery · bba1cb17
      Tariq Saeed authored
      Orabug: 19074140
      
      When umount is issued during recovery on the new master that has not
      finished remastering locks, it triggers BUG() in
      dlm_send_mig_lockres_msg().  Here is the situation:
      
       1) node A has a lock on resource X mastered by node B.
      
       2) node B dies ->  node A sets recovering flag for res X
      
       3) Node C becomes the new master for resources owned by the
          dead node and is remastering locks of the dead node but
          has not finished the remastering process yet.
      
       4) umount is issued on node C.
      
       5) During processing of the umount, ignoring the unfinished recovery,
          node C attempts to migrate resource X to node A.
      
       6) node A finds res X in DLM_LOCK_RES_RECOVERING state, considers
          it a logic error and sends back -EFAULT.
      
       7) node C asserts BUG() upon seeing the EFAULT response from node A.
      
      The fix is to delay migrating res X until remastering is finished, at
      which point the recovering flag will be cleared on both A and C.
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bba1cb17
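
      An illustrative sketch of the rule "do not migrate a resource whose
      recovery is unfinished": the shutdown loop defers any resource still
      flagged as recovering and comes back to it on a later pass.  The flag
      and structure names below are invented stand-ins for the kernel's
      DLM_LOCK_RES_RECOVERING handling.

        #include <stdio.h>

        #define RES_RECOVERING 0x1   /* stand-in for DLM_LOCK_RES_RECOVERING */

        struct toy_res { const char *name; unsigned flags; };

        /* Returns how many resources still need another pass. */
        static int migrate_all(struct toy_res *res, int n)
        {
            int deferred = 0;

            for (int i = 0; i < n; i++) {
                if (res[i].flags & RES_RECOVERING) {
                    /* remastering not finished: do not migrate yet */
                    deferred++;
                    continue;
                }
                printf("migrating %s\n", res[i].name);
            }
            return deferred;
        }

        int main(void)
        {
            struct toy_res all[] = {
                { "resA", 0 },
                { "resX", RES_RECOVERING }, /* would have hit the BUG() before */
            };
            printf("deferred: %d\n", migrate_all(all, 2));
            return 0;
        }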
  13. 24 Jun 2014 (2 commits)
  14. 05 Jun 2014 (1 commit)
  15. 24 May 2014 (1 commit)
  16. 13 Nov 2013 (2 commits)
  17. 12 Sep 2013 (1 commit)
  18. 26 Feb 2013 (1 commit)
  19. 25 Jul 2011 (3 commits)
  20. 26 May 2011 (1 commit)
    • ocfs2/dlm: Do not migrate resource to a node that is leaving the domain · 66effd3c
      Sunil Mushran authored
      During dlm domain shutdown, o2dlm has to free all the lock resources.
      Those with no locks and no references are freed.  Those that still have
      locks and/or references are migrated to another node.
      
      The first task in migration is finding a target. Currently we scan the lock
      resource and find one node that either has a lock or a reference. This is not
      very efficient in a parallel umount case as we might end up migrating the
      lock resource to a node which itself may have to migrate it to a third node.
      
      The patch scans the dlm->exit_domain_map to ensure the target node is not
      leaving the domain. If no valid target node is found, o2dlm does not migrate
      the resource but instead waits for the unlock and deref messages that will
      allow it to free the resource.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
      66effd3c
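
      A minimal sketch of the target-selection idea above, assuming the
      candidates and the exit map can be treated as plain bitmaps: pick a node
      that has a lock or a reference but is not in the exit-domain map, and
      report "no target" when nobody qualifies.  The helper name and bitmap
      layout are invented.

        #include <stdio.h>

        #define NO_TARGET (-1)

        /* candidates: nodes holding a lock or a refmap bit on the resource
         * exiting:    nodes currently leaving the domain (the exit-domain map)
         * self:       our own node number, never a valid target              */
        static int pick_migration_target(unsigned long candidates,
                                         unsigned long exiting, int self)
        {
            unsigned long usable = candidates & ~exiting & ~(1UL << self);

            for (int node = 0; node < 64; node++)
                if (usable & (1UL << node))
                    return node;
            return NO_TARGET;  /* nobody suitable: wait for unlock/deref instead */
        }

        int main(void)
        {
            /* nodes 2 and 5 are interested, but node 2 is unmounting */
            int target = pick_migration_target((1UL << 2) | (1UL << 5),
                                               (1UL << 2) /* exiting */, 0);
            printf("migration target: %d\n", target);
            return 0;
        }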
  21. 24 May 2011 (1 commit)
  22. 14 May 2011 (1 commit)
  23. 31 Mar 2011 (1 commit)
  24. 21 Feb 2011 (1 commit)
    • ocfs2: Remove ENTRY from masklog. · ef6b689b
      Tao Ma authored
      ENTRY is used to record the entry of a function.  But because it is
      added to so many functions, enabling it fills up the system logs very
      quickly and causes too much I/O.  So in practice no one can enable it
      on a production system, or even for a test.

      So for mlog_entry_void, we just remove it.  For mlog_entry(...), we
      replace it with mlog(0, ...); these will be replaced by trace events
      later.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      ef6b689b
  25. 10 Dec 2010 (1 commit)
    • ocfs2/dlm: Migrate lockres with no locks if it has a reference · 388c4bcb
      Sunil Mushran authored
      o2dlm was not migrating resources with zero locks because it assumed
      that the resource would get purged by dlm_thread.  However, some usage
      patterns involve creating and dropping locks at a high rate, leading to
      the migrate thread seeing zero locks while the purge thread sees an
      active reference.  When this happens, dlm_thread cannot purge the
      resource and the migrate thread sees no reason to migrate it.  The
      spell is broken only when the migrate thread catches the resource with
      a lock.
      
      The fix is to make the migrate thread also consider the reference map.
      
      This usage pattern can be triggered by userspace on userdlm locks and flocks.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      388c4bcb
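
      A hedged sketch of the changed decision described above: a resource
      needs migration at shutdown not only when it still has locks but also
      when its refmap shows a remote reference, because the purge path cannot
      free it in that case.  The field and function names below are invented.

        #include <stdbool.h>
        #include <stdio.h>

        struct toy_res {
            int           num_locks; /* locks granted or queued on this resource */
            unsigned long refmap;    /* one bit per node holding a reference     */
        };

        static bool needs_migration(const struct toy_res *res)
        {
            /* Old rule: only "num_locks != 0".  New rule: a live reference is
             * also enough, because dlm_thread cannot purge such a resource.   */
            return res->num_locks != 0 || res->refmap != 0;
        }

        int main(void)
        {
            struct toy_res res = { .num_locks = 0, .refmap = 1UL << 4 };
            printf("migrate? %s\n", needs_migration(&res) ? "yes" : "no");
            return 0;
        }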
  26. 24 Sep 2010 (1 commit)
  27. 08 Aug 2010 (2 commits)
    • ocfs2/dlm: remove potential deadlock -V3 · b11f1f1a
      Wengang Wang authored
      When we need to take both dlm_domain_lock and dlm->spinlock, we should
      take them in this order: dlm_domain_lock first, then dlm->spinlock.

      There are paths that disobey this order, namely calling
      dlm_lockres_put() with dlm->spinlock held in dlm_run_purge_list().
      dlm_lockres_put() calls dlm_put() when the last reference is dropped,
      and dlm_put() takes dlm_domain_lock.

      Fix:
      Don't grab/put the dlm when initialising/releasing a lockres.
      That grab is not required because we don't call dlm_unregister_domain()
      based on the refcount.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      b11f1f1a
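
      The deadlock shape can be modelled with two ordinary mutexes, as in the
      hedged sketch below.  One generic remedy, shown here only for
      illustration, is to perform the put after dropping the inner lock; the
      actual patch instead removes the domain grab/put from the lockres
      init/release path entirely.  All names in the sketch are invented.

        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t domain_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t dlm_lock    = PTHREAD_MUTEX_INITIALIZER;

        static void put_needs_domain_lock(void)
        {
            /* If this ran with dlm_lock held, the documented order
             * (domain_lock first, then dlm_lock) would be inverted. */
            pthread_mutex_lock(&domain_lock);
            printf("dropped domain reference\n");
            pthread_mutex_unlock(&domain_lock);
        }

        static void purge_list_safely(void)
        {
            int deferred_put = 0;

            pthread_mutex_lock(&dlm_lock);
            /* ... walk the purge list and decide what to drop ... */
            deferred_put = 1;            /* remember the put instead of doing it */
            pthread_mutex_unlock(&dlm_lock);

            if (deferred_put)
                put_needs_domain_lock(); /* domain_lock is now taken on its own */
        }

        int main(void)
        {
            purge_list_safely();
            return 0;
        }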
    • ocfs2/dlm: fix a dead lock · 6d98c3cc
      Wengang Wang authored
      When we have to take both dlm->master_lock and lockres->spinlock, take
      them in this order:

      lockres->spinlock first, and then dlm->master_lock.

      The patch fixes a violation of this rule.  We can simply move the taking
      of dlm->master_lock to after we have dropped res->spinlock, since we
      don't need master_lock's protection when we access res->state and free
      the mle memory.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      6d98c3cc
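
      A short sketch of that rearrangement, with mutexes standing in for the
      spinlocks and invented state fields: finish the work that needs
      res->spinlock, drop it, and only then take master_lock, so the two locks
      are never nested against the documented order.

        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t res_lock    = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;

        static int res_state;   /* touched only under res_lock    */
        static int mle_slot;    /* touched only under master_lock */

        static void fixed_path(void)
        {
            pthread_mutex_lock(&res_lock);
            res_state = 1;                    /* work that needs res_lock only */
            pthread_mutex_unlock(&res_lock);

            pthread_mutex_lock(&master_lock); /* taken after res_lock is gone  */
            mle_slot = 0;                     /* e.g. detach and free the mle  */
            pthread_mutex_unlock(&master_lock);
        }

        int main(void)
        {
            fixed_path();
            printf("res_state=%d mle_slot=%d\n", res_state, mle_slot);
            return 0;
        }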
  28. 16 Jul 2010 (1 commit)
  29. 19 May 2010 (1 commit)
  30. 06 May 2010 (1 commit)
  31. 24 Mar 2010 (1 commit)
  32. 26 Jan 2010 (1 commit)
  33. 04 Dec 2009 (1 commit)