1. 30 December 2015, 1 commit
    • ocfs2/dlm: clear migration_pending when migration target goes down · cc28d6d8
      Authored by xuejiufei
      We hit a BUG_ON() on res->migration_pending when migrating lock
      resources.  The situation is as follows.

      dlm_mark_lockres_migrating
        res->migration_pending = 1;
        __dlm_lockres_reserve_ast
        dlm_lockres_release_ast returns with res->migration_pending
            still set because other threads have reserved asts
        wait until dlm_migration_can_proceed returns 1
        >>>>>>> o2hb finds that the target has gone down and removes
                it from domain_map
        dlm_migration_can_proceed returns 1
        dlm_mark_lockres_migrating returns -ESHUTDOWN with
            res->migration_pending still set.

      When dlm_mark_lockres_migrating() is reentered, the stale flag trips
      BUG_ON(res->migration_pending).  So clear migration_pending when the
      target goes down.
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc28d6d8
  2. 06 November 2015, 1 commit
  3. 23 October 2015, 1 commit
  4. 23 September 2015, 1 commit
  5. 12 September 2015, 1 commit
  6. 05 September 2015, 5 commits
  7. 25 June 2015, 1 commit
  8. 06 May 2015, 1 commit
    • ocfs2: dlm: fix race between purge and get lock resource · b1432a2a
      Authored by Junxiao Bi
      There is a race window in dlm_get_lock_resource() that may return a
      lock resource which has already been purged.  This causes the
      process to hang forever in dlmlock(), because the AST message can
      never be handled: its lock resource no longer exists.
      
          dlm_get_lock_resource {
              ...
              spin_lock(&dlm->spinlock);
              tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
              if (tmpres) {
                   spin_unlock(&dlm->spinlock);
                   >>>>>>>> race window, dlm_run_purge_list() may run and purge
                                    the lock resource
                   spin_lock(&tmpres->spinlock);
                   ...
                   spin_unlock(&tmpres->spinlock);
              }
          }
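
      One way to close the window, as a hedged sketch (the lookup_again
      label is illustrative, not the actual patch): take tmpres->spinlock
      while still holding dlm->spinlock, and redo the lookup if the
      resource is being purged:

          spin_lock(&dlm->spinlock);
          tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
          if (tmpres) {
                  /* taken before dlm->spinlock is dropped, so
                   * dlm_run_purge_list() cannot slip in between */
                  spin_lock(&tmpres->spinlock);
                  if (tmpres->state & DLM_LOCK_RES_DROPPING_REF) {
                          /* being purged: back off and look it up again */
                          spin_unlock(&tmpres->spinlock);
                          spin_unlock(&dlm->spinlock);
                          goto lookup_again;
                  }
                  spin_unlock(&dlm->spinlock);
          }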
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1432a2a
  9. 11 February 2015, 4 commits
  10. 09 January 2015, 1 commit
  11. 19 December 2014, 1 commit
  12. 11 December 2014, 3 commits
  13. 10 October 2014, 5 commits
  14. 03 October 2014, 1 commit
  15. 26 September 2014, 1 commit
  16. 07 August 2014, 2 commits
  17. 24 June 2014, 4 commits
    • ocfs2/dlm: do not purge lockres that is queued for assert master · ac4fef4d
      Authored by Xue jiufei
      When the workqueue is delayed, a lockres can be purged while it is
      still queued for a master assert, which may trigger a BUG() as
      follows.
      
      N1                                         N2
      dlm_get_lockres()
      ->dlm_do_master_requery
                                        is the master of lockres,
                                        so queue assert_master work
      
                                  dlm_thread() starts running
                                  and purges the lockres
      
                                        dlm_assert_master_worker()
                                        send assert master message
                                        to other nodes
      receiving the assert_master
      message, set master to N2
      
      dlmlock_remote() sends a create_lock message to N2 but receives
      DLM_IVLOCKID; if it is the RECOVERY lockres, this triggers the BUG().

      Another BUG() is triggered when N3 becomes the new master and sends
      assert_master to N1: N1 hits the BUG() because the owner does not
      match.  So we should not purge a lockres while it is queued for
      assert master.
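
      One way to express the fix, as a hedged sketch (the counter
      inflight_assert_workers is an assumed name, and the checks in the
      real __dlm_lockres_unused() are condensed):

          static int __dlm_lockres_unused(struct dlm_lock_resource *res)
          {
                  if (!list_empty(&res->granted) ||
                      !list_empty(&res->converting) ||
                      !list_empty(&res->blocked))
                          return 0;
                  /* new: a lockres with assert_master work still queued
                   * must not be purged */
                  if (res->inflight_assert_workers)
                          return 0;
                  return 1;
          }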
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac4fef4d
    • ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless loop during umount · b9aaac5a
      Authored by jiangyiwen
      The following case may lead to an endless loop during umount.
      
      node A         node B               node C       node D
      umount volume,
      migrate lockres1
      to B
                                                       want to lock lockres1,
                                                       send
                                                       MASTER_REQUEST_MSG
                                                       to C
                                          init block mle
                     send
                     MIGRATE_REQUEST_MSG
                     to C
                                          find a block
                                          mle, and then
                                          return
                                          DLM_MIGRATE_RESPONSE_MASTERY_REF
                                          to B
                     set C in refmap
                                          umount successfully
               try to umount; endless
               loop occurs when migrating
               lockres1 since C is in
               refmap
      
      Fix this endless loop by returning DLM_MIGRATE_RESPONSE_MASTERY_REF
      on a MIGRATE_REQUEST_MSG only when a mastery mle exists.
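
      A hedged sketch of the check (the mle type names come from
      dlmcommon.h; the surrounding handler logic is condensed):

          /* In the MIGRATE_REQUEST_MSG path: only a mastery mle means
           * the requesting node must stay in our refmap.  A block mle,
           * as in the case above, must not put it there. */
          if (oldmle->type == DLM_MLE_MASTER)
                  ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;
          else
                  ret = 0;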
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Xue jiufei <xuejiufei@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9aaac5a
    • ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list() · a270c6d3
      Authored by Xue jiufei
      When a lockres on the purge list is still in use, it should be
      moved to the tail of the purge list so that dlm_thread can go on to
      check the next lockres.  However, the call
      list_move_tail(&dlm->purge_list, &lockres->purge) performs *no*
      movement at all, so dlm_thread keeps hitting the same lockres on
      every pass of this loop.  If that lockres stays in use for a long
      time, no other lockres gets processed.
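
      list_move_tail(list, head) moves the entry list onto the tail of
      head, so the two arguments were simply swapped.  The fix:

          -       list_move_tail(&dlm->purge_list, &lockres->purge);
          +       list_move_tail(&lockres->purge, &dlm->purge_list);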
      Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a270c6d3
    • ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn · 27bf6305
      Authored by Tariq Saeed
      
      Orabug: 18639535
      
      In a two-node cluster, both nodes hold a lock at PR level and both
      want to convert to EX at the same time.  Master node 1 has sent a
      BAST and then closes the connection due to idle timeout.  Node 0
      receives the BAST and sends an unlock request with the cancel flag,
      but gets -ENOTCONN.  The problem is that this error is ignored in
      dlm_send_remote_unlock_request() on the **incorrect** assumption
      that the master is dead (see the NOTE in the comment explaining why
      it returns DLM_NORMAL).  Upon getting DLM_NORMAL, node 0 proceeds to
      send the convert (without the cancel flag), which fails with
      -ENOTCONN; it waits 5 seconds and resends.

      This time it gets DLM_IVLOCKID from the master, since the lock is
      not found on the grant queue: it had been moved to the converting
      queue in response to the PR->EX convert request.  No way out.
      
      Node 1 (master)				Node 0
      ==============				======
      
        lock mode PR				PR
      
        convert PR -> EX
        mv grant -> convert and queue BAST
        ...
                           <-------- convert PR -> EX
        convert queue looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
        ...
                              BAST (want PR -> NL)
                           ------------------>
        ...
        idle timeout, conn closed
                                      ...
                                      In response to BAST,
                                      sends unlock with cancel convert flag
                                gets -ENOTCONN; ignores it and
                                sends the remote convert request;
                                gets -ENOTCONN, waits 5 sec, retries
        ...
        reconnects
                         <----------------- convert req goes through on next try
        does not find lock on grant queue
                         status DLM_IVLOCKID
                         ------------------>
        ...
      
      No way out.  The fix is to keep retrying the unlock with the cancel
      flag until it succeeds or the master dies.
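
      A hedged sketch of the retry (dlm_is_node_dead() and the
      DLM_NOLOCKMGR status exist in the dlm; their exact use in the patch
      is assumed):

          /* Do not map -ENOTCONN to DLM_NORMAL while cancelling; treat
           * "connection gone, master still alive" as retriable. */
          do {
                  status = dlm_send_remote_unlock_request(dlm, res, lock,
                                  lksb, flags | LKM_CANCEL, owner);
          } while (status == DLM_NOLOCKMGR &&
                   !dlm_is_node_dead(dlm, owner));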
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27bf6305
  18. 05 June 2014, 4 commits
    • ocfs2: remove some unused code · e72db989
      Authored by Xue jiufei
      dlm_recovery_ctxt.received is unused.
      
      ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error
      handling code in ocfs2_super_lock() is unneeded.
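
      The shape of the cleanup in ocfs2_super_lock(), as a hedged
      before/after sketch:

          -       status = ocfs2_should_refresh_lock_res(lockres);
          -       if (status < 0) {       /* dead code: returns only 0 or 1 */
          -               mlog_errno(status);
          -               goto bail;
          -       }
          -       if (status) {
          +       if (ocfs2_should_refresh_lock_res(lockres)) {
                          /* ... refresh ... */
                  }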
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e72db989
    • ocfs2/dlm: disallow node joining when recovery is ongoing · 01c6222f
      Authored by Xue jiufei
      We found a race when dlm recovery and node joining occur
      simultaneously and the network is in a bad state.
      
      N1                                      N4
      
      start joining dlm and send
      query join to all live nodes
                                  set joining node to N1, return OK
      send query join to other
      live nodes and it may take
      a while
      
      calls dlm_send_join_asserts()
      to send assert join messages;
      N2 is down, so it keeps trying
      to send to N2 until it finds
      that N2 is down
      
      send assert join message to
      N3, but connection is down
      with N3, so it may take a
      while
                                  become the recovery master for N2
                                  and send begin reco message to other
                                  nodes in domain map but no N1
      connection with N3 is rebuilt,
      then send assert join to N4
                                  call dlm_assert_joined_handler(),
                                  add N1 to domain_map
      
                                  dlm recovery done, send finalize message
                                  to nodes in domain map, including N1
      on receiving the finalize message,
      N1 triggers the BUG() because the
      recovery master does not match.
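
      A hedged sketch of the idea in the query-join path
      (DLM_RECO_STATE_ACTIVE and JOIN_DISALLOW exist in the dlm; the
      surrounding handler logic is condensed):

          /* In dlm_query_join_handler(): refuse a joiner while recovery
           * is active, so a node cannot enter the domain map after the
           * begin-reco round has already passed it by. */
          spin_lock(&dlm->spinlock);
          if (dlm->reco.state & DLM_RECO_STATE_ACTIVE)
                  response = JOIN_DISALLOW;   /* joiner backs off, retries */
          else
                  response = JOIN_OK;         /* normal join handling */
          spin_unlock(&dlm->spinlock);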
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01c6222f
    • ocfs2/dlm: fix possible conversion deadlock · 6718cb5e
      Authored by Xue jiufei
      We found a conversion deadlock when the owner of a lockres crashes
      before sending DLM_PROXY_AST_MSG for a downconverting lock.  The
      situation is as follows:
      
      Node1                            Node2                  Node3
                                 the owner of lockresA
      lock_1 granted at EX mode
      and calls ocfs2_cluster_unlock
      to decrease ex_holders.
                                                       converting lock_3 from
                                                       NL to EX
                                 send DLM_PROXY_AST_MSG
                                 to Node1, asking Node 1
                                 to downconvert.
      receiving DLM_PROXY_AST_MSG,
      thread ocfs2dc sends
      DLM_CONVERT_LOCK_MSG
      to Node2 to downconvert
      lock_1(EX->NL).
                                 lock_1 can be granted and
                                 put it into pending_asts
                                 list, return DLM_NORMAL.
                                 Then something happened
                                 and Node2 crashed.
      received DLM_NORMAL, waiting
      for DLM_PROXY_AST_MSG.
                                                     selected as the recovery
                                                     master, receiving migrate
                                                     lock from Node1, queue
                                                     lock_1 to the tail of
                                                     converting list.
      
      After dlm recovery, the converting list on the master of lockresA
      (Node3) will be: converting list head <-> lock_3(NL->EX) <->
      lock_1(EX->NL).  The requested mode of lock_3 is not compatible
      with the granted mode of lock_1, so lock_3 cannot be granted, and
      lock_1 cannot downconvert because the converting queue is strictly
      FIFO.  So a deadlock is created.  We think dlm_process_recovery_data()
      should queue_ast for lock_1 or alter the order of lock_1 and lock_3
      so that dlm_thread can process lock_1 first.  And if there are
      multiple downconverting locks, they must all be converting from PR
      to NL, so there is no need to sort among them.
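
      A hedged sketch of the reordering in dlm_process_recovery_data()
      (field semantics assumed: ml.type is the granted mode,
      ml.convert_type the requested one, smaller meaning weaker):

          /* A downconvert can always be granted, so queue it at the
           * head of the converting list instead of behind waiting
           * upconverts in the FIFO. */
          if (ml->list == DLM_CONVERTING_LIST &&
              newlock->ml.convert_type < newlock->ml.type)
                  list_add(&newlock->list, queue);
          else
                  list_add_tail(&newlock->list, queue);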
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6718cb5e
    • ocfs2: remove NULL assignments on static · 1a5c4e2a
      Authored by Fabian Frederick
      Static pointers are automatically initialized to NULL, so the
      explicit = NULL assignments are redundant.
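
      A representative hunk (the same change is applied across ocfs2; the
      two statics shown are assumed to be the ones in dlmmaster.c):

          -static struct kmem_cache *dlm_lockres_cache = NULL;
          -static struct kmem_cache *dlm_lockname_cache = NULL;
          +static struct kmem_cache *dlm_lockres_cache;
          +static struct kmem_cache *dlm_lockname_cache;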
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a5c4e2a
  19. 24 May 2014, 1 commit
  20. 04 April 2014, 1 commit