1. 09 Jul 2007 · 1 commit
    • [DLM] add lock timeouts and warnings [2/6] · 3ae1acf9
      David Teigland authored
      New features: lock timeouts and time warnings.  If the DLM_LKF_TIMEOUT
      flag is set, the request/conversion is canceled after waiting the
      specified number of centiseconds (specified per lock).  This feature is
      currently only available for locks requested through libdlm (it can be
      enabled for kernel dlm users if a need arises).
      
      If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
      a warning message will be sent to userspace (using genetlink) after a
      request/conversion has been waiting for a given number of centiseconds
      (configurable per node).  The time warnings will be used in the future
      to do deadlock detection in userspace.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
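      A minimal userspace sketch of the new timeout flag follows.  It assumes
      the dlm_ls_lockx() entry point added on the libdlm side of this series;
      the exact signature, the cancel status delivered in the lksb, and the
      resource name are assumptions, so read it as an illustration rather
      than the patch itself.

      /* Hedged sketch: request an EX lock that the dlm cancels if it
       * is still waiting after 5 seconds (timeouts are in centiseconds).
       * dlm_ls_lockx() and its argument order are assumptions based on
       * the libdlm side of this series; consult libdlm.h.  Event
       * dispatch (dlm_get_fd()/dlm_dispatch()) is elided. */
      #include <stdio.h>
      #include <string.h>
      #include <stdint.h>
      #include <libdlm.h>

      static struct dlm_lksb lksb;

      static void ast_cb(void *arg)
      {
          /* A timed-out request completes here with a nonzero
           * (cancel) status in sb_status. */
          if (lksb.sb_status != 0)
              fprintf(stderr, "lock canceled/failed: %d\n",
                      lksb.sb_status);
      }

      int request_with_timeout(dlm_lshandle_t ls)
      {
          uint64_t timeout_cs = 500;  /* 500 centiseconds = 5s */

          return dlm_ls_lockx(ls, LKM_EXMODE, &lksb, DLM_LKF_TIMEOUT,
                              "myres", strlen("myres"), 0,
                              ast_cb, NULL, NULL,
                              NULL /* xid */, &timeout_cs);
      }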
  2. 06 Feb 2007 · 1 commit
  3. 30 Nov 2006 · 5 commits
    • [DLM] fix format warnings in rcom.c and recoverd.c · 57adf7ee
      Ryusuke Konishi authored
      This fixes the following gcc warnings on architectures where
      uint64_t != unsigned long long (e.g. ppc64).
      
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
      fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
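      The portable idioms, shown as a minimal userspace illustration (the
      kernel patch itself silences the warnings by making the printk format
      and argument types agree, typically with a cast):

      #include <inttypes.h>
      #include <stdio.h>

      int main(void)
      {
          uint64_t seq = 0xdeadbeefULL;

          /* printf("%llx\n", seq); warns where uint64_t is
           * unsigned long rather than unsigned long long (ppc64) */

          /* Option 1: cast the argument to match the format. */
          printf("%llx\n", (unsigned long long)seq);

          /* Option 2: let <inttypes.h> supply the right format. */
          printf("%" PRIx64 "\n", seq);
          return 0;
      }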
    • [DLM] fix add_requestqueue checking nodes list · 2896ee37
      David Teigland authored
      Requests that arrive after recovery has started are saved in the
      requestqueue and processed after recovery is done.  Some of these requests
      are purged during recovery if they are from nodes that have been removed.
      We move the purging of the requests (dlm_purge_requestqueue) to later
      in the recovery sequence, which allows the routine saving requests
      (dlm_add_requestqueue) to stop filtering by nodeid, since the purge
      does the same filtering.  The current code has add_requestqueue
      filtering by nodeid without holding any locks while it accesses the
      list of current nodes.  This also means the purge routine must be
      called when the lockspace is shut down, since the add routine no
      longer rejects requests itself.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
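      A hedged kernel-style sketch of the resulting save-then-purge pattern.
      The list and mutex names follow fs/dlm and dlm_is_member() is assumed
      from member.c, but the entry layout is simplified, so read it as an
      outline of the patch rather than its code:

      #include <linux/list.h>
      #include <linux/mutex.h>
      #include <linux/slab.h>
      #include "dlm_internal.h"   /* struct dlm_ls, dlm_is_member() */

      struct rq_entry {
          struct list_head list;
          int nodeid;
          /* the saved message follows in the real structure */
      };

      /* Saving no longer filters by nodeid, so no locking against the
       * nodes list is needed here. */
      static void save_request(struct dlm_ls *ls, int nodeid,
                               struct rq_entry *e)
      {
          e->nodeid = nodeid;
          mutex_lock(&ls->ls_requestqueue_mutex);
          list_add_tail(&e->list, &ls->ls_requestqueue);
          mutex_unlock(&ls->ls_requestqueue_mutex);
      }

      /* Later in recovery (and now also at shutdown), drop requests
       * from nodes that are no longer members. */
      static void purge_requests(struct dlm_ls *ls)
      {
          struct rq_entry *e, *safe;

          mutex_lock(&ls->ls_requestqueue_mutex);
          list_for_each_entry_safe(e, safe, &ls->ls_requestqueue, list) {
              if (!dlm_is_member(ls, e->nodeid)) {
                  list_del(&e->list);
                  kfree(e);
              }
          }
          mutex_unlock(&ls->ls_requestqueue_mutex);
      }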
    • [DLM] do full recover_locks barrier · 4b77f2c9
      David Teigland authored
      Red Hat BZ 211914
      
      The previous patch "[DLM] fix aborted recovery during
      node removal" was incomplete as discovered with further testing.  It set
      the bit for the RS_LOCKS barrier but did not then wait for the barrier.
      This is often ok, but sometimes it will cause yet another recovery hang.
      If the node that skips the barrier wait is a new node that also has
      the lowest nodeid, it misses the important step of collecting and
      reporting the barrier status from the other nodes (the job of the low
      nodeid in the barrier wait routine).
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
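      A hedged sketch of the corrected step, with function and status names
      following fs/dlm/recover.c and error handling trimmed:

      /* Setting the status bit alone is not enough: every node must
       * also enter the barrier wait, because the low nodeid collects
       * and reports barrier status for all nodes from inside it. */
      static int recover_locks_barrier(struct dlm_ls *ls)
      {
          int error;

          dlm_set_recover_status(ls, DLM_RS_LOCKS);

          error = dlm_recover_locks_wait(ls);
          if (error)
              log_debug(ls, "recover_locks_wait failed %d", error);
          return error;
      }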
    • [DLM] fix stopping unstarted recovery · 2cdc98aa
      David Teigland authored
      Red Hat BZ 211914
      
      When many nodes are joining a lockspace simultaneously, the dlm gets a
      quick sequence of stop/start events, a pair for adding each node.
      dlm_controld in user space sends dlm_recoverd in the kernel each stop and
      start event.  dlm_controld will sometimes send the stop before
      dlm_recoverd has had a chance to take up the previously queued start.  The
      stop aborts the processing of the previous start by setting the
      RECOVERY_STOP flag.  dlm_recoverd erroneously clears this flag, and so
      ignores the stop/abort, if it happens to take up the start after the
      stop that was meant to abort it.  The fix is to check the sequence
      number, which is incremented for each stop/start, before clearing the
      flag.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
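      A hedged sketch of the sequence check.  Field and flag names follow
      fs/dlm (ls_recover_lock, ls_recover_seq, LSFL_RECOVERY_STOP); the
      surrounding dlm_recoverd event loop is elided and the helper name is
      invented:

      /* Only clear the stop/abort flag when no newer stop has bumped
       * the sequence number since this start was queued. */
      static int start_is_current(struct dlm_ls *ls, uint64_t seq)
      {
          int ok = 0;

          spin_lock(&ls->ls_recover_lock);
          if (ls->ls_recover_seq == seq) {
              clear_bit(LSFL_RECOVERY_STOP, &ls->ls_flags);
              ok = 1;
          }
          spin_unlock(&ls->ls_recover_lock);

          /* !ok means a stop meant for this start arrived first:
           * leave the flag set so the recovery is aborted. */
          return ok;
      }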
    • [DLM] fix aborted recovery during node removal · 91c0dc93
      David Teigland authored
      Red Hat BZ 211914
      
      With the new cluster infrastructure, dlm recovery for a node removal can
      be aborted and restarted for a node addition.  When this happens, the
      restarted recovery isn't aware that it's doing recovery for the earlier
      removal as well as the addition.  It therefore skips the recovery
      steps only required when nodes are removed.  This can result in locks
      not being
      purged for failed/removed nodes.  The fix is to check for removed nodes
      for which recovery has not been completed at the start of a new recovery
      sequence.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
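      A hedged, illustrative sketch of the added check.  The real patch
      works against the lockspace's list of departed nodes (ls_nodes_gone);
      the structure and flag below are invented for illustration:

      /* Count nodes removed by an earlier, aborted recovery whose
       * locks were never purged, so the removal-only steps run even
       * when the current sequence was triggered by an addition. */
      struct gone_node {
          struct list_head list;
          int nodeid;
          int needs_recovery;  /* set on removal, cleared by purge */
      };

      static int count_unrecovered(struct dlm_ls *ls)
      {
          struct gone_node *gn;
          int neg = 0;

          list_for_each_entry(gn, &ls->ls_nodes_gone, list)
              if (gn->needs_recovery)
                  neg++;
          return neg;
      }

      /* At the start of a recovery sequence:
       *   neg  = nodes removed by this event;
       *   neg += count_unrecovered(ls);
       *   if (neg) purge locks of departed nodes, rebuild the
       *   directory, and run the other removal-only steps. */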
  4. 25 Aug 2006 · 1 commit
    • [DLM] add new lockspace to list earlier · 5f88f1ea
      David Teigland authored
      When a new lockspace was being created, the recoverd thread was being
      started for it before the lockspace was added to the global list of
      lockspaces.  The new thread was looking up the lockspace in the global
      list and sometimes not finding it due to the race with the original thread
      adding it to the list.  We need to add the lockspace to the global list
      before starting the thread instead of after, and if the new thread can't
      find the lockspace for some reason, it should return an error.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
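      A hedged sketch of the reordering.  lslist, lslist_lock, ls_list, and
      dlm_recoverd_start() follow fs/dlm/lockspace.c and recoverd.c; the
      error unwinding is simplified:

      static int start_lockspace(struct dlm_ls *ls)
      {
          int error;

          /* 1. publish the lockspace first, so the new thread's
           *    lookup cannot miss it */
          spin_lock(&lslist_lock);
          list_add(&ls->ls_list, &lslist);
          spin_unlock(&lslist_lock);

          /* 2. only then start dlm_recoverd; it returns an error
           *    itself if its lookup still fails for another reason */
          error = dlm_recoverd_start(ls);
          if (error) {
              /* undo the publication on failure */
              spin_lock(&lslist_lock);
              list_del(&ls->ls_list);
              spin_unlock(&lslist_lock);
          }
          return error;
      }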
  5. 09 Aug 2006 · 1 commit
  6. 20 Jan 2006 · 1 commit
  7. 18 Jan 2006 · 1 commit