1. 09 7月, 2007 7 次提交
    • P
      [DLM] variable allocation · 44f487a5
      Patrick Caulfield 提交于
      Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
      This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
      (This updated version of the patch uses gfp_t for ls_allocation.)
      Signed-Off-By: NPatrick Caulfield <pcaulfie@redhat.com>
      Signed-Off-By: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      44f487a5
    • D
      [DLM] dumping master locks · 9dd592d7
      David Teigland 提交于
      Add a new debugfs file that dumps a compact list of mastered locks.
      This will be used by a userland daemon to collect state for deadlock
      detection.
      
      Also, for the existing function that prints all lock state, lock the rsb
      before going through the lock lists since they can be changing in the
      course of normal dlm activity.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9dd592d7
    • D
      [DLM] canceling deadlocked lock · 8b4021fa
      David Teigland 提交于
      Add a function that can be used through libdlm by a system daemon to cancel
      another process's deadlocked lock.  A completion ast with EDEADLK is returned
      to the process waiting for the lock.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8b4021fa
    • D
      [DLM] timeout fixes · 84d8cd69
      David Teigland 提交于
      Various fixes related to the new timeout feature:
      - add_timeout() missed setting TIMEWARN flag on lkb's when the
        TIMEOUT flag was already set
      - clear_proc_locks should remove a dead process's locks from the
        timeout list
      - the end-of-life calculation for user locks needs to consider that
        ETIMEDOUT is equivalent to -DLM_ECANCEL
      - make initial default timewarn_cs config value visible in configfs
      - change bit position of TIMEOUT_CANCEL flag so it's not copied to
        a remote master node
      - set timestamp on remote lkb's so a lock dump will display the time
        they've been waiting
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      84d8cd69
    • D
      [DLM] wait for config check during join [6/6] · 8b0e7b2c
      David Teigland 提交于
      Joining the lockspace should wait for the initial round of inter-node
      config checks to complete before returning.  This way, if there's a
      configuration mismatch between the joining node and the existing nodes,
      the join can fail and return an error to the application.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8b0e7b2c
    • D
      [DLM] dlm_device interface changes [3/6] · d7db923e
      David Teigland 提交于
      Change the user/kernel device interface used by libdlm:
      - Add ability for userspace to check the version of the interface.  libdlm
        can now adapt to different versions of the kernel interface.
      - Increase the size of the flags passed in a lock request so all possible
        flags can be used from userspace.
      - Add an opaque "xid" value for each lock.  This "transaction id" will be
        used later to associate locks with each other during deadlock detection.
      - Add a "timeout" value for each lock.  This is used along with the
        DLM_LKF_TIMEOUT flag.
      
      Also, remove a fragment of unused code in device_read().
      
      This patch requires updating libdlm which is backward compatible with
      older kernels.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d7db923e
    • D
      [DLM] add lock timeouts and warnings [2/6] · 3ae1acf9
      David Teigland 提交于
      New features: lock timeouts and time warnings.  If the DLM_LKF_TIMEOUT
      flag is set, then the request/conversion will be canceled after waiting
      the specified number of centiseconds (specified per lock).  This feature
      is only available for locks requested through libdlm (can be enabled for
      kernel dlm users if there's a use for it.)
      
      If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
      a warning message will be sent to userspace (using genetlink) after a
      request/conversion has been waiting for a given number of centiseconds
      (configurable per node).  The time warnings will be used in the future
      to do deadlock detection in userspace.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3ae1acf9
  2. 01 5月, 2007 2 次提交
    • D
      [DLM] add orphan purging code (1/2) · 8499137d
      David Teigland 提交于
      Add code for purging orphan locks.  A process can also purge all of its
      own non-orphan locks by passing a pid of zero.  Code already exists for
      processes to create persistent locks that become orphans when the process
      exits, but the complimentary capability for another process to then purge
      these orphans has been missing.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8499137d
    • D
      [DLM] overlapping cancel and unlock · ef0c2bb0
      David Teigland 提交于
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an app can use, but both are used implicitly when a process exits while
      holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when cancel/unlock-force
        occured during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
        either a completion or blocking ast
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ef0c2bb0
  3. 06 2月, 2007 3 次提交
    • D
      [DLM] fix user unlocking · a1bc86e6
      David Teigland 提交于
      When a user process exits, we clear all the locks it holds.  There is a
      problem, though, with locks that the process had begun unlocking before it
      exited.  We couldn't find the lkb's that were in the process of being
      unlocked remotely, to flag that they are DEAD.  To solve this, we move
      lkb's being unlocked onto a new list in the per-process structure that
      tracks what locks the process is holding.  We can then go through this
      list to flag the necessary lkb's when clearing locks for a process when it
      exits.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a1bc86e6
    • D
      [DLM] add config entry to enable log_debug · 99fc6487
      David Teigland 提交于
      Add a new dlm_config_info field to enable log_debug output and change
      log_debug() to use it.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      99fc6487
    • D
      [DLM] fix old rcom messages · 38aa8b0c
      David Teigland 提交于
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      38aa8b0c
  4. 30 11月, 2006 1 次提交
    • D
      [DLM] don't accept replies to old recovery messages · 98f176fb
      David Teigland 提交于
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply as a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      98f176fb
  5. 24 8月, 2006 1 次提交
  6. 10 8月, 2006 1 次提交
  7. 09 8月, 2006 1 次提交
  8. 26 7月, 2006 1 次提交
  9. 13 7月, 2006 1 次提交
  10. 03 5月, 2006 1 次提交
  11. 23 2月, 2006 1 次提交
  12. 20 1月, 2006 1 次提交
  13. 18 1月, 2006 1 次提交