1. 17 November 2012, 1 commit
      dlm: fix lvb invalidation conditions · da8c6663
      David Teigland authored
      When a node is removed that held a PW/EX lock, the
      existing master node should invalidate the lvb on the
      resource due to the purged lock.
      
      Previously, the existing master node was invalidating
      the lvb if it found only NL/CR locks on the resource
      during recovery for the removed node.  This could lead
      to cases where it invalidated the lvb and shouldn't
      have, or cases where it should have invalidated and
      didn't.
      
      When recovery selects a *new* master node for a
      resource, and that new master finds only NL/CR locks
      on the resource after lock recovery, it should
      invalidate the lvb.  This case was handled correctly
      (but was also incorrectly applied to the existing
      master case).
      
      When a process exits while holding a PW/EX lock,
      the lvb on the resource should be invalidated.
      This was not happening.
      
      The lvb contents and VALNOTVALID flag should be
      recovered before granting locks in recovery so that
      the recovered lvb state is provided in the callback.
      The lvb was being recovered after the lock was granted.
      Signed-off-by: David Teigland <teigland@redhat.com>
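      
      A minimal userspace sketch of the two corrected rules, assuming the
      usual DLM mode ordering (NL < CR < CW < PR < PW < EX); the function
      and type names are illustrative, not the kernel's:
      
          /* Model of the corrected lvb-invalidation conditions. */
          #include <stdbool.h>
          #include <stdio.h>
          
          enum mode { NL, CR, CW, PR, PW, EX };
          
          /* Existing master: invalidate when the purged lock of the
           * removed node held PW or EX. */
          static bool existing_master_invalidate(enum mode purged_mode)
          {
                  return purged_mode >= PW;
          }
          
          /* New master after lock recovery: invalidate when only
           * NL/CR locks remain on the resource. */
          static bool new_master_invalidate(const enum mode *granted, int n)
          {
                  int i;
          
                  for (i = 0; i < n; i++)
                          if (granted[i] > CR)
                                  return false;
                  return true;
          }
          
          int main(void)
          {
                  enum mode survivors[] = { NL, CR };
          
                  printf("purged EX, existing master: %d\n",
                         existing_master_invalidate(EX));      /* 1 */
                  printf("only NL/CR left, new master: %d\n",
                         new_master_invalidate(survivors, 2)); /* 1 */
                  return 0;
          }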
  2. 17 July 2012, 4 commits
  3. 03 May 2012, 1 commit
      dlm: fixes for nodir mode · 4875647a
      David Teigland authored
      The "nodir" mode (statically assign master nodes instead
      of using the resource directory) has always been highly
      experimental, and never seriously used.  This commit
      fixes a number of problems, making nodir much more usable.
      
      - Major change to recovery: recover all locks and restart
        all in-progress operations after recovery.  In some
        cases it's not possible to know which in-progress locks
        to recover, so recover all.  (Most require recovery
        in nodir mode anyway since rehashing changes most
        master nodes.)
      
      - Change the way nodir mode is enabled, from a command
        line mount arg passed through gfs2, into a sysfs
        file managed by dlm_controld, consistent with the
        other config settings.
      
      - Allow recovering MSTCPY locks on an rsb that has not
        yet been turned into a master copy.
      
      - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
        from a previous, aborted recovery cycle.  Base this
        on the local recovery status not being in the state
        where any nodes should be sending LOCK messages for the
        current recovery cycle (see the sketch after this entry).
      
      - Hold rsb lock around dlm_purge_mstcpy_locks() because it
        may run concurrently with dlm_recover_master_copy().
      
      - Maintain highbast on process-copy lkb's (in addition to
        the master as is usual), because the lkb can switch
        back and forth between being a master and being a
        process copy as the master node changes in recovery.
      
      - When recovering MSTCPY locks, flag rsb's that have
        non-empty convert or waiting queues for granting
        at the end of recovery.  (Rename flag from LOCKS_PURGED
        to RECOVER_GRANT and similar for the recovery function,
        because it's not only resources with purged locks
        that need a grant attempt.)
      
      - Replace a couple of unnecessary assertion panics with
        error messages.
      Signed-off-by: David Teigland <teigland@redhat.com>
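      
      As a rough model of the RCOM_LOCK point above, the sketch below
      accepts lock-recovery messages only once the local recovery status
      has reached the phase where peers may legitimately send them; the
      status bits and names are invented for illustration:
      
          #include <stdbool.h>
          #include <stdio.h>
          
          #define RS_NODES      0x01  /* member recovery finished */
          #define RS_DIRECTORY  0x02  /* directory rebuilt */
          
          /* A LOCK message is only valid for the current cycle once
           * this node's own status shows the directory phase is done;
           * anything arriving earlier must be a leftover from an
           * aborted cycle and is dropped. */
          static bool accept_rcom_lock(unsigned int local_status)
          {
                  return (local_status & RS_DIRECTORY) != 0;
          }
          
          int main(void)
          {
                  printf("before directory phase: %d\n",
                         accept_rcom_lock(RS_NODES));                 /* 0 */
                  printf("after directory phase:  %d\n",
                         accept_rcom_lock(RS_NODES | RS_DIRECTORY));  /* 1 */
                  return 0;
          }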
  4. 04 January 2012, 2 commits
      dlm: add node slots and generation · 757a4271
      David Teigland authored
      Slot numbers are assigned to nodes when they join the lockspace.
      The slot number chosen is the minimum unused value starting at 1.
      Once a node is assigned a slot, that slot number will not change
      while the node remains a lockspace member.  If the node leaves
      and rejoins it can be assigned a new slot number.
      
      A new generation number is also added to a lockspace.  It is
      set and incremented during each recovery along with the slot
      collection/assignment.
      
      The slot numbers will be passed to gfs2, which will use them
      as journal ids.
      Signed-off-by: David Teigland <teigland@redhat.com>
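      
      A short sketch of the assignment rule exactly as stated (minimum
      unused value starting at 1); the helper names are illustrative:
      
          #include <stdbool.h>
          #include <stdio.h>
          
          static bool slot_in_use(const int *slots, int n, int slot)
          {
                  int i;
          
                  for (i = 0; i < n; i++)
                          if (slots[i] == slot)
                                  return true;
                  return false;
          }
          
          /* Minimum unused slot number, starting at 1. */
          static int assign_slot(const int *slots, int n)
          {
                  int slot = 1;
          
                  while (slot_in_use(slots, n, slot))
                          slot++;
                  return slot;
          }
          
          int main(void)
          {
                  int used[] = { 1, 2, 4 };  /* 3 was freed by a departed node */
          
                  printf("next slot: %d\n", assign_slot(used, 3));  /* 3 */
                  return 0;
          }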
      dlm: move recovery barrier calls · f95a34c6
      David Teigland authored
      Put all the calls to recovery barriers in the same function
      to clarify where they each happen.  Should not change any behavior.
      Also modify some recovery debug lines to make them consistent.
      Signed-off-by: David Teigland <teigland@redhat.com>
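      
      A hedged sketch of the shape of this change: the barrier waits are
      gathered into one function so the synchronization points read top
      to bottom.  All function names here are placeholders, not the
      kernel's:
      
          #include <stdio.h>
          
          /* Each *_wait() stands for a barrier that blocks until every
           * lockspace member reports the phase complete. */
          static void recover_members_wait(void)   { puts("members barrier"); }
          static void recover_directory_wait(void) { puts("directory barrier"); }
          static void recover_locks_wait(void)     { puts("locks barrier"); }
          static void recover_done_wait(void)      { puts("done barrier"); }
          
          /* All barrier calls live in one place, so where recovery
           * synchronizes across nodes is visible at a glance. */
          static void ls_recover(void)
          {
                  recover_members_wait();
                  recover_directory_wait();
                  recover_locks_wait();
                  recover_done_wait();
          }
          
          int main(void)
          {
                  ls_recover();
                  return 0;
          }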
  5. 19 November 2011, 1 commit
  6. 31 March 2011, 1 commit
  7. 09 January 2009, 1 commit
  8. 04 February 2008, 1 commit
  9. 31 January 2008, 1 commit
      dlm: keep cached master rsbs during recovery · 85f0379a
      David Teigland authored
      To prevent the master of an rsb from changing rapidly, an unused rsb is kept
      on the "toss list" for a period of time to be reused.  The toss list was
      being cleared completely for each recovery, which is unnecessary.  Much of
      the benefit of the toss list can be maintained if nodes keep rsb's in their
      toss list that they are the master of.  These rsb's need to be included
      when the resource directory is rebuilt during recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
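      
      A sketch of the idea, assuming a singly linked toss list and an int
      master_nodeid field (all names illustrative): entries mastered
      elsewhere are dropped, while entries this node masters survive
      recovery and can be reported to the rebuilt directory:
      
          #include <stdio.h>
          
          struct rsb {
                  const char *name;
                  int master_nodeid;
                  struct rsb *next;
          };
          
          /* Drop toss-list entries mastered elsewhere; keep our own so
           * their master assignment survives the recovery cycle. */
          static struct rsb *clear_toss_list(struct rsb *head, int our_nodeid)
          {
                  struct rsb **p = &head;
          
                  while (*p) {
                          if ((*p)->master_nodeid != our_nodeid)
                                  *p = (*p)->next;   /* freed in real code */
                          else
                                  p = &(*p)->next;   /* kept: we are master */
                  }
                  return head;
          }
          
          int main(void)
          {
                  struct rsb c = { "C", 2, NULL };
                  struct rsb b = { "B", 1, &c };
                  struct rsb a = { "A", 1, &b };
                  struct rsb *r;
          
                  for (r = clear_toss_list(&a, 1); r; r = r->next)
                          printf("kept %s\n", r->name);  /* A, B */
                  return 0;
          }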
  10. 30 January 2008, 1 commit
  11. 06 February 2007, 2 commits
      [DLM] fix master recovery · 222d3960
      David Teigland authored
      If master recovery happens on an rsb in one recovery sequence,
      and that sequence is aborted before lock recovery happens, then
      in the next sequence we rely on the previous master recovery
      (which may now be invalid because another node ignored a lookup
      result) and go on to do the lock recovery, where we get stuck
      on an invalid master value.
      
       recovery cycle begins: master of rsb X has left
       nodes A and B send node C an rcom lookup for X to find the new master
       C gets lookup from B first, sets B as new master, and sends reply back to B
       C gets lookup from A next, and sends reply back to A saying B is master
       A gets lookup reply from C and sets B as the new master in the rsb
       recovery cycle on A, B and C is aborted to start a new recovery
       B gets lookup reply from C and ignores it since there's a new recovery
       recovery cycle begins: some other node has joined
       B doesn't think it's the master of X so it doesn't rebuild it in the directory
       C looks up the master of X, no one is master, so it becomes new master
       B looks up the master of X, finds it's C
       A believes that B is the master of X, so it sends its lock to B
       B sends an error back to A
       A resends
       this repeats forever, the incorrect master value on A is never corrected
      
      The fix is to do master recovery on an rsb that still has the NEW_MASTER
      flag set from an earlier recovery sequence, and therefore didn't complete
      lock recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
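      
      A minimal model of the fix; only the NEW_MASTER flag name comes
      from the message above, the rest is assumed structure:
      
          #include <stdbool.h>
          #include <stdio.h>
          
          #define RSB_NEW_MASTER 0x01
          
          struct rsb {
                  unsigned int flags;
                  int master_nodeid;   /* -1: master has left */
          };
          
          /* Redo master recovery if the master departed, or if an
           * earlier aborted cycle left NEW_MASTER set, meaning lock
           * recovery never completed and the looked-up master value
           * cannot be trusted. */
          static bool needs_master_recovery(const struct rsb *r)
          {
                  return r->master_nodeid == -1 ||
                         (r->flags & RSB_NEW_MASTER);
          }
          
          int main(void)
          {
                  struct rsb stale = { RSB_NEW_MASTER, 2 };  /* aborted cycle */
                  struct rsb fine  = { 0, 2 };
          
                  printf("stale: %d fine: %d\n",
                         needs_master_recovery(&stale),   /* 1 */
                         needs_master_recovery(&fine));   /* 0 */
                  return 0;
          }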
      [DLM] rename dlm_config_info fields · 68c817a1
      David Teigland authored
      Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
      can use macros to add configfs functions to access them (in a later
      patch).  No functional changes in this patch, just naming changes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
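      
      A hedged sketch of why a uniform prefix helps: token pasting lets
      one macro stamp out an accessor per field.  The fields shown are a
      representative subset with illustrative defaults, and the macro is
      a simplified stand-in for the configfs glue the later patch adds:
      
          #include <stdio.h>
          
          struct dlm_config_info {
                  int ci_tcp_port;
                  int ci_buffer_size;
                  int ci_rsbtbl_size;
          };
          
          static struct dlm_config_info dlm_config = { 21064, 4096, 256 };
          
          /* One macro per field: ci_##name only works because every
           * field shares the ci_ prefix. */
          #define CONFIG_GETTER(name)                 \
          static int get_##name(void)                 \
          {                                           \
                  return dlm_config.ci_##name;        \
          }
          
          CONFIG_GETTER(tcp_port)
          CONFIG_GETTER(buffer_size)
          CONFIG_GETTER(rsbtbl_size)
          
          int main(void)
          {
                  printf("port=%d buf=%d rsbtbl=%d\n",
                         get_tcp_port(), get_buffer_size(),
                         get_rsbtbl_size());
                  return 0;
          }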
  12. 30 November 2006, 1 commit
  13. 24 August 2006, 1 commit
  14. 21 August 2006, 1 commit
  15. 26 July 2006, 1 commit
  16. 24 May 2006, 1 commit
  17. 20 January 2006, 1 commit
  18. 18 January 2006, 1 commit