  1. 24 Dec, 2008: 3 commits
  2. 15 Jul, 2008: 2 commits
  3. 22 Apr, 2008: 2 commits
  4. 07 Feb, 2008: 1 commit
  5. 06 Feb, 2008: 1 commit
  6. 04 Feb, 2008: 7 commits
  7. 31 Jan, 2008: 9 commits
    • dlm: keep cached master rsbs during recovery · 85f0379a
      David Teigland committed
      To prevent the master of an rsb from changing rapidly, an unused rsb is kept
      on the "toss list" for a period of time so that it can be reused.  The toss
      list was being cleared completely on each recovery, which is unnecessary.
      Much of the benefit of the toss list can be preserved if nodes keep the
      rsbs that they master on their toss lists.  These rsbs need to be included
      when the resource directory is rebuilt during recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
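The toss-list change can be sketched in userspace as follows. This is a minimal illustration under assumptions, not the kernel code: `struct rsb` and `prune_toss_list()` are hypothetical, and a plain array stands in for the toss list. Instead of discarding every tossed rsb when recovery starts, the node keeps only the ones it masters.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the DLM's rsb (resource) structure. */
struct rsb {
    const char *name;
    int master_nodeid;  /* nodeid of the node that masters this resource */
};

/*
 * Prune the toss list at the start of recovery: instead of clearing it,
 * keep only the rsbs that this node (our_nodeid) masters.  Kept entries
 * are compacted to the front of the array; the new length is returned.
 */
size_t prune_toss_list(struct rsb *toss, size_t len, int our_nodeid)
{
    size_t kept = 0;

    for (size_t i = 0; i < len; i++) {
        if (toss[i].master_nodeid == our_nodeid)
            toss[kept++] = toss[i];
    }
    return kept;
}
```

The entries that survive are exactly the ones this node would also have to report when the resource directory is rebuilt.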
    • dlm: change error message to debug · 594199eb
      David Teigland committed
      The invalid lockspace messages are normal and can appear relatively
      often.  They should be suppressed unless debugging is enabled.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: limit dir lookup loop · 755b5eb8
      David Teigland committed
      In a rare case we may need to repeat a local resource directory lookup
      due to a race between removing the rsb and removing its resdir record.
      We'll never need to do more than a single additional lookup, though,
      so the infinite loop around the lookup can be removed.  Besides being
      unnecessary, the infinite loop is dangerous, since some other, unknown
      condition could arise that keeps the loop from ever terminating.
      Signed-off-by: David Teigland <teigland@redhat.com>
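A sketch of the bounded lookup, assuming a hypothetical lookup callback that reports the rare race by returning -EAGAIN (the real function names and error codes may differ). `lookup_master()` tries once more and then gives up, and `fake_lookup()` is a test double used only to exercise the retry limit.

```c
#include <errno.h>

/*
 * Hypothetical lookup callback: returns 0 and sets *master on success, or
 * -EAGAIN when it raced with removal of the rsb and its resdir record.
 */
typedef int (*dir_lookup_fn)(void *ctx, const char *name, int *master);

/* Look up the master, retrying at most once instead of looping forever. */
int lookup_master(dir_lookup_fn lookup, void *ctx, const char *name, int *master)
{
    int rv = lookup(ctx, name, master);

    if (rv == -EAGAIN)  /* the rare race: one extra attempt is enough */
        rv = lookup(ctx, name, master);
    return rv;          /* any further failure is reported, not retried */
}

/* Test double: fails with -EAGAIN `failures` times, then succeeds. */
struct fake_dir {
    int failures;
    int master;
    int calls;
};

int fake_lookup(void *ctx, const char *name, int *master)
{
    struct fake_dir *d = ctx;

    (void)name;
    d->calls++;
    if (d->failures > 0) {
        d->failures--;
        return -EAGAIN;
    }
    *master = d->master;
    return 0;
}
```

An unexpected condition that keeps failing now surfaces as an error after two calls rather than hanging the caller.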
    • dlm: reject normal unlock when lock is waiting for lookup · 42dc1601
      David Teigland committed
      Non-forced unlocks should be rejected if the lock is waiting on the
      rsb_lookup list for another lock to establish the master node.
      Signed-off-by: David Teigland <teigland@redhat.com>
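The unlock check might look roughly like this. `struct lkb`, its `on_rsb_lookup` field, the `UNLOCK_FORCE` flag (standing in for the DLM's force-unlock flag, whose real value is not assumed here) and the `-EBUSY` return value are all assumptions made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the DLM's force-unlock request flag. */
#define UNLOCK_FORCE 0x1u

/* Hypothetical lkb reduced to the one field this check needs. */
struct lkb {
    bool on_rsb_lookup;  /* queued behind another lkb that is looking up the master */
};

/* Reject a normal (non-forced) unlock while the lkb waits for the master lookup. */
int validate_unlock(const struct lkb *lkb, unsigned int flags)
{
    if (lkb->on_rsb_lookup && !(flags & UNLOCK_FORCE))
        return -EBUSY;  /* assumed error code for this sketch */
    return 0;
}
```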
    • dlm: validate messages before processing · c54e04b0
      David Teigland committed
      There was some hit-and-miss validation of messages that has now been
      cleaned up and unified.  Before processing a message, the new
      validate_message() function checks that the lkb is the appropriate type,
      process-copy or master-copy, and that the message is from the correct
      nodeid for the given lkb.  Other checks and assertions on the
      lkb type and nodeid have been removed.  The assertions were particularly
      bad since they would panic the machine instead of just ignoring the bad
      message.

      Although other recent patches have made processing old messages unlikely,
      it may still be possible for an old message to be processed and caught
      by these checks.
      Signed-off-by: David Teigland <teigland@redhat.com>
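The two checks can be sketched as a single predicate. The message types below are simplified, hypothetical versions of the DLM's, and this `struct lkb` keeps only what the check needs: whether it is a master copy, and the nodeid it expects to hear from. A failed check just means "ignore the message", never an assertion.

```c
#include <stdbool.h>

/* Simplified, hypothetical versions of the DLM's message types. */
enum msg_type {
    MSG_CONVERT, MSG_UNLOCK, MSG_CANCEL,     /* handled on the master copy */
    MSG_REQUEST_REPLY, MSG_CONVERT_REPLY,    /* handled on the process copy */
    MSG_UNLOCK_REPLY, MSG_CANCEL_REPLY,
    MSG_GRANT, MSG_BAST,
};

struct lkb {
    bool master_copy;  /* true for a master-copy lkb, false for a process copy */
    int nodeid;        /* master copy: node of the process copy; process copy: master */
};

/* Requests from the lock holder target the master copy; the rest target the process copy. */
static bool targets_master_copy(enum msg_type type)
{
    return type == MSG_CONVERT || type == MSG_UNLOCK || type == MSG_CANCEL;
}

/* Return true if a message of this type from from_nodeid may be applied to lkb. */
bool validate_message(const struct lkb *lkb, enum msg_type type, int from_nodeid)
{
    if (lkb->master_copy != targets_master_copy(type))
        return false;                   /* wrong kind of lkb: ignore, don't panic */
    return lkb->nodeid == from_nodeid;  /* must come from the expected node */
}
```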
    • dlm: reject messages from non-members · 46b43eed
      David Teigland committed
      Messages from nodes that are no longer members of the lockspace should be
      ignored.  When nodes are removed from the lockspace, recovery can
      sometimes complete quickly enough that messages from a removed node arrive
      after recovery has completed.  When processed, these messages would often
      cause an error message, and in some cases could change state, causing
      problems.
      Signed-off-by: David Teigland <teigland@redhat.com>
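A minimal sketch of the membership filter, assuming a hypothetical `struct lockspace` that holds the member nodeids left after the last recovery. A message whose sender is not in that list is dropped before any lock state is touched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lockspace view: the member nodeids after the last recovery. */
struct lockspace {
    const int *members;
    size_t nr_members;
};

static bool is_member(const struct lockspace *ls, int nodeid)
{
    for (size_t i = 0; i < ls->nr_members; i++) {
        if (ls->members[i] == nodeid)
            return true;
    }
    return false;
}

/* Called before any message processing: false means "drop this message". */
bool accept_message(const struct lockspace *ls, int from_nodeid)
{
    return is_member(ls, from_nodeid);
}
```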
    • dlm: another call to confirm_master in receive_request_reply · aec64e1b
      David Teigland committed
      When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
      retried, there may be other lkbs waiting on the rsb_lookup list for it
      to complete.  A call to confirm_master() is needed to move on to the next
      waiting lkb, since the current one won't be retried.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: recover locks waiting for overlap replies · 601342ce
      David Teigland committed
      When recovery looks at locks waiting for replies, it fails to consider
      locks that have already received a reply for their first remote operation
      but have not yet received a reply for a second, overlapping unlock/cancel.
      The appropriate stub reply needs to be called for these waiters.

      This appears when we start doing recovery in the presence of many
      overlapping unlock/cancel ops.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: clear ast_type when removing from astqueue · 8a358ca8
      David Teigland committed
      The lkb_ast_type field indicates whether the lkb is on the astqueue list.
      When clearing locks for a process, lkbs were being removed from the astqueue
      list without clearing the field.  If release_lockspace then happened
      immediately afterward, it could try to remove the lkb from the list a second
      time.

      This appears when a process calls libdlm's dlm_release_lockspace(), which
      first closes the ls device, triggering clear_proc_locks, and then removes
      the ls (a write to the control device), causing release_lockspace().
      Signed-off-by: David Teigland <teigland@redhat.com>
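The double-removal hazard and its fix can be shown with a small intrusive list. The list helpers imitate the kernel's `list_head` style but are written out here, and `struct lkb`, `queue_ast()` and `dequeue_ast()` are hypothetical. Clearing `ast_type` on removal turns a second removal attempt into a no-op instead of a second unlink of an entry that is no longer on the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list in the style of the kernel's list_head. */
struct list_node {
    struct list_node *prev, *next;
};

void list_init(struct list_node *head) { head->prev = head->next = head; }

bool list_is_empty(const struct list_node *head) { return head->next == head; }

void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

void list_unlink(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = NULL;  /* a second unlink would now fault */
}

/* Hypothetical lkb: ast_type is nonzero exactly while it is on the astqueue. */
struct lkb {
    struct list_node astqueue;
    int ast_type;
};

void queue_ast(struct lkb *lkb, struct list_node *astqueue, int type)
{
    if (!lkb->ast_type)
        list_add_tail(&lkb->astqueue, astqueue);
    lkb->ast_type |= type;
}

/* Remove the lkb from the astqueue if ast_type says it is there, and clear the field. */
void dequeue_ast(struct lkb *lkb)
{
    if (lkb->ast_type) {
        list_unlink(&lkb->astqueue);
        lkb->ast_type = 0;  /* the missing step: keeps a later call harmless */
    }
}
```

In the scenario above, the first `dequeue_ast()` corresponds to clearing the process's locks and the second to releasing the lockspace.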
  8. 30 Jan, 2008: 3 commits
  9. 10 Oct, 2007: 2 commits
    • [DLM] block dlm_recv in recovery transition · c36258b5
      David Teigland committed
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.

      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking the write lock on the rwsem) before aborting
      the current recovery.

      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] don't overwrite castparam if it's NULL · b434eda6
      Patrick Caulfield committed
      If the castaddr passed to the userland API is NULL, then don't overwrite the
      existing castparam.  This allows a different thread to cancel a lock request
      and have the CANCEL AST delivered to the original thread.

      bz#306391 (for RHEL4) refers.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
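A sketch of the rule, assuming a hypothetical `struct user_lock` that holds the callback fields. Only a non-NULL castaddr replaces the stored values, so a cancel issued from another thread with a NULL castaddr leaves the original thread's castparam in place.

```c
#include <stddef.h>

/* Hypothetical per-lock record of the userland callback fields. */
struct user_lock {
    void *castaddr;   /* completion AST routine in the caller's process */
    void *castparam;  /* argument passed to it */
};

/*
 * Apply the callback fields from a new request (e.g. a cancel).  A NULL
 * castaddr means "keep the existing ones", so the CANCEL AST still reaches
 * the thread that made the original request.
 */
void update_cast_fields(struct user_lock *ul, void *castaddr, void *castparam)
{
    if (castaddr == NULL)
        return;
    ul->castaddr = castaddr;
    ul->castparam = castparam;
}
```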
  10. 14 Aug, 2007: 1 commit
    • [DLM] fix basts for granted PR waiting CW · 36509258
      David Teigland committed
      Fix a long-standing bug where a blocking callback would be missed
      when there's a granted lock in PR mode and waiting locks in both
      PR and CW modes (and the PR lock was added to the waiting queue
      before the CW lock).  The logic simply compared the numerical values
      of the modes to determine if a blocking callback was required, but in
      the one case of PR and CW, the lower-valued CW mode blocks the
      higher-valued PR mode.  We just need to add a special check for this
      PR/CW case in the tests that decide when a blocking callback is needed.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
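The bug and the extra check can be sketched with the standard six-mode compatibility matrix. `needs_bast()` is a hypothetical, simplified helper, not the kernel's code: `high` is assumed to be the highest mode requested by any waiting lock, `highbast` the highest mode for which the granted lock has already received a blocking callback, and `cw_waiting` whether any waiter asked for CW.

```c
#include <stdbool.h>

/* Lock modes in the usual numeric order; MODE_NONE means "none yet". */
enum mode { MODE_NONE = -1, NL, CR, CW, PR, PW, EX };

/* Standard compatibility matrix: compat[a][b] is true if a and b can coexist. */
static const bool compat[6][6] = {
    /*        NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

bool needs_bast(enum mode grmode, enum mode highbast, enum mode high, bool cw_waiting)
{
    if (high == MODE_NONE)
        return false;  /* nothing is waiting */

    /* Numeric test: only the highest waiting mode is considered. */
    if (highbast < high && !compat[grmode][high])
        return true;

    /*
     * Special case: a waiting CW blocks a granted PR even though CW < PR
     * numerically, so looking only at high == PR hides the conflict.
     */
    if (grmode == PR && cw_waiting && highbast < CW)
        return true;

    return false;
}
```

With a granted PR and waiters for PR and CW, `high` is PR, which is compatible with the granted PR, so the numeric test alone returns false; the CW check catches the conflict.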
  11. 09 Jul, 2007: 9 commits