1. 09 Jul 2007: 4 commits
  2. 03 May 2007: 1 commit
  3. 01 May 2007: 1 commit
    • [DLM] overlapping cancel and unlock · ef0c2bb0
      Committed by David Teigland
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an application can use, but also because both are used implicitly when
      a process exits while holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when a cancel/unlock-force
        occurred during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - COMP/BAST ast flags were not being set when the other was already set, so
        either a completion or a blocking ast could be lost (see the sketch after
        this entry)
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
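      The COMP/BAST bullet above is easiest to see in code.  The following is a
      minimal, self-contained C sketch of the idea, not the fs/dlm implementation;
      the struct, field, and flag names are assumptions.  The fix amounts to
      accumulating pending ast types in a mask instead of overwriting the mask, so
      a completion ast and a blocking ast queued back-to-back are both delivered.

          /* Hypothetical sketch, not the fs/dlm code: accumulate pending ast
           * types with OR so neither a completion nor a blocking ast is dropped
           * when both are queued. */
          #include <stdio.h>

          #define AST_COMP 0x1   /* completion ast pending (assumed name) */
          #define AST_BAST 0x2   /* blocking ast pending (assumed name) */

          struct lkb {
              int ast_type;      /* mask of pending ast types for this lock */
          };

          static void queue_ast(struct lkb *lkb, int type)
          {
              /* the buggy form overwrote the mask and lost the other ast:
               *     lkb->ast_type = type;                                  */
              lkb->ast_type |= type;
          }

          int main(void)
          {
              struct lkb lkb = { 0 };

              queue_ast(&lkb, AST_COMP);
              queue_ast(&lkb, AST_BAST);
              printf("pending asts: 0x%x\n", lkb.ast_type);  /* 0x3: both kept */
              return 0;
          }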
  4. 06 Feb 2007: 1 commit
  5. 30 Nov 2006: 2 commits
    • [DLM] don't accept replies to old recovery messages · 98f176fb
      Committed by David Teigland
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply for a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
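      A minimal userspace C sketch of the gating described above.  The flag name
      RCOM_WAIT comes from the commit message; the struct layout, field names, and
      helpers are assumed for illustration and are not the fs/dlm API.  A reply is
      accepted only while we are actually waiting for one and its sequence number
      matches, and the sequence counter starts at a random value when the
      lockspace is created.

          /* Hedged sketch; not the real fs/dlm recovery-message code. */
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <time.h>

          #define RCOM_WAIT 0x1

          struct lockspace {
              unsigned int flags;
              uint64_t rcom_seq;   /* recovery-message sequence number */
          };

          static void lockspace_create(struct lockspace *ls)
          {
              srand((unsigned int)time(NULL));
              ls->flags = 0;
              /* random start: replies from an old lockspace instance won't match */
              ls->rcom_seq = (uint64_t)rand();
          }

          static void send_status_request(struct lockspace *ls)
          {
              ls->rcom_seq++;            /* new message, new sequence number */
              ls->flags |= RCOM_WAIT;    /* now willing to accept one reply */
          }

          static void abort_recovery(struct lockspace *ls)
          {
              ls->flags &= ~RCOM_WAIT;   /* any reply still in flight is unwanted */
          }

          static void receive_status_reply(struct lockspace *ls, uint64_t reply_seq)
          {
              if (!(ls->flags & RCOM_WAIT) || reply_seq != ls->rcom_seq) {
                  printf("ignoring reply seq %llu\n", (unsigned long long)reply_seq);
                  return;
              }
              ls->flags &= ~RCOM_WAIT;   /* accept exactly one matching reply */
              printf("accepted reply seq %llu\n", (unsigned long long)reply_seq);
          }

          int main(void)
          {
              struct lockspace ls;

              lockspace_create(&ls);

              send_status_request(&ls);
              abort_recovery(&ls);                     /* aborted while waiting */
              receive_status_reply(&ls, ls.rcom_seq);  /* ignored: RCOM_WAIT is clear */

              send_status_request(&ls);
              receive_status_reply(&ls, ls.rcom_seq);  /* accepted: waiting, seq matches */
              return 0;
          }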
    • [DLM] fix add_requestqueue checking nodes list · 2896ee37
      Committed by David Teigland
      Requests that arrive after recovery has started are saved in the
      requestqueue and processed after recovery is done.  Some of these requests
      are purged during recovery if they are from nodes that have been removed.
      We move the purging of these requests (dlm_purge_requestqueue) to later in
      the recovery sequence, which lets the routine that saves requests
      (dlm_add_requestqueue) skip filtering by nodeid, since the purge performs
      the same filtering.  The current code has add_requestqueue
      filtering by nodeid but doesn't hold any locks when accessing the list of
      current nodes.  This also means that we need to call the purge routine
      when the lockspace is being shut down since the add routine will not be
      rejecting requests itself any more.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
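      A hedged C sketch of the resulting flow.  The function names echo
      dlm_add_requestqueue and dlm_purge_requestqueue from the message above, but
      the data structures, signatures, and locking are simplified assumptions, not
      the real implementation: the save path does no nodeid filtering (so it never
      touches the nodes list), while the purge, run later in recovery and again at
      shutdown, drops requests from nodes that have been removed.

          /* Hedged sketch of the save/purge split; struct saved_req and these
           * helpers are illustrative, not the real fs/dlm routines. */
          #include <stdbool.h>
          #include <stdlib.h>

          struct saved_req {
              int nodeid;                       /* node the request arrived from */
              struct saved_req *next;
          };

          struct lockspace {
              struct saved_req *requestqueue;   /* requests saved during recovery */
          };

          /* Save a request that arrived while recovery is running.  No filtering
           * by nodeid here, so this path never needs the current-nodes list. */
          static int add_requestqueue(struct lockspace *ls, int nodeid)
          {
              struct saved_req *r = malloc(sizeof(*r));

              if (!r)
                  return -1;
              r->nodeid = nodeid;
              r->next = ls->requestqueue;
              ls->requestqueue = r;
              return 0;
          }

          /* Run later in the recovery sequence, and again at lockspace shutdown:
           * drop saved requests from nodes that have been removed.  The caller is
           * assumed to hold whatever protects the nodes list. */
          static void purge_requestqueue(struct lockspace *ls,
                                         bool (*node_removed)(int nodeid))
          {
              struct saved_req **p = &ls->requestqueue;

              while (*p) {
                  if (node_removed((*p)->nodeid)) {
                      struct saved_req *dead = *p;

                      *p = dead->next;
                      free(dead);
                  } else {
                      p = &(*p)->next;
                  }
              }
          }

          static bool node2_removed(int nodeid)
          {
              return nodeid == 2;
          }

          int main(void)
          {
              struct lockspace ls = { 0 };

              add_requestqueue(&ls, 1);
              add_requestqueue(&ls, 2);                /* node 2 gets removed */
              purge_requestqueue(&ls, node2_removed);  /* only node 1's request survives */
              return 0;
          }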
  6. 06 Nov 2006: 2 commits
  7. 07 Sep 2006: 1 commit
  8. 25 Aug 2006: 1 commit
    • [DLM] add new lockspace to list earlier · 5f88f1ea
      Committed by David Teigland
      When a new lockspace was being created, the recoverd thread was being
      started for it before the lockspace was added to the global list of
      lockspaces.  The new thread was looking up the lockspace in the global
      list and sometimes not finding it due to the race with the original thread
      adding it to the list.  We need to add the lockspace to the global list
      before starting the thread instead of after, and if the new thread can't
      find the lockspace for some reason, it should return an error.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
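      A sketch of the ordering fix in plain C with pthreads rather than kernel
      threads; all names here (lslist, find_lockspace, recoverd, new_lockspace)
      are illustrative assumptions.  The lockspace is linked onto the global list
      before the recovery thread is started, so the thread's lookup can no longer
      race with the insertion, and the thread still exits with an error if the
      lookup fails for some other reason.

          /* Hedged sketch; not the actual fs/dlm lockspace code. */
          #include <pthread.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>

          struct lockspace {
              char *name;
              struct lockspace *next;
          };

          static struct lockspace *lslist;   /* global list of lockspaces */
          static pthread_mutex_t lslist_lock = PTHREAD_MUTEX_INITIALIZER;

          static struct lockspace *find_lockspace(const char *name)
          {
              struct lockspace *ls;

              pthread_mutex_lock(&lslist_lock);
              for (ls = lslist; ls; ls = ls->next)
                  if (!strcmp(ls->name, name))
                      break;
              pthread_mutex_unlock(&lslist_lock);
              return ls;
          }

          static void *recoverd(void *arg)
          {
              char *name = arg;
              struct lockspace *ls = find_lockspace(name);

              /* If the lookup still fails for some reason, report an error and
               * exit instead of running recovery on a missing lockspace. */
              if (!ls) {
                  fprintf(stderr, "recoverd: lockspace %s not found\n", name);
                  return (void *)(intptr_t)-1;
              }
              /* ... recovery work for ls would go here ... */
              return NULL;
          }

          static int new_lockspace(struct lockspace *ls, char *name, pthread_t *th)
          {
              ls->name = name;

              /* Add the lockspace to the global list first ... */
              pthread_mutex_lock(&lslist_lock);
              ls->next = lslist;
              lslist = ls;
              pthread_mutex_unlock(&lslist_lock);

              /* ... and only then start the recovery thread, so its lookup
               * cannot race with the insertion above. */
              return pthread_create(th, NULL, recoverd, name);
          }

          int main(void)
          {
              static struct lockspace ls;
              static char name[] = "example-ls";
              pthread_t th;

              if (new_lockspace(&ls, name, &th))
                  return 1;
              pthread_join(th, NULL);
              return 0;
          }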
  9. 09 Aug 2006: 1 commit
  10. 26 Jul 2006: 1 commit
  11. 13 Jul 2006: 1 commit
  12. 28 Apr 2006: 1 commit
  13. 20 Jan 2006: 1 commit
  14. 18 Jan 2006: 1 commit