1. 03 May 2012 — 1 commit
    • dlm: fixes for nodir mode · 4875647a
      David Teigland authored
      The "nodir" mode (statically assign master nodes instead
      of using the resource directory) has always been highly
      experimental, and never seriously used.  This commit
      fixes a number of problems, making nodir much more usable.
      
      - Major change to recovery: recover all locks and restart
        all in-progress operations after recovery.  In some
        cases it's not possible to know which in-progress locks
        to recover, so recover all.  (Most require recovery
        in nodir mode anyway since rehashing changes most
        master nodes.)
      
      - Change the way nodir mode is enabled, from a command
        line mount arg passed through gfs2, into a sysfs
        file managed by dlm_controld, consistent with the
        other config settings.
      
      - Allow recovering MSTCPY locks on an rsb that has not
        yet been turned into a master copy.
      
      - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
        from a previous, aborted recovery cycle.  Base this
        on the local recovery status not being in the state
        where any nodes should be sending LOCK messages for the
        current recovery cycle.
      
      - Hold rsb lock around dlm_purge_mstcpy_locks() because it
        may run concurrently with dlm_recover_master_copy().
      
      - Maintain highbast on process-copy lkb's (in addition to
        the master as is usual), because the lkb can switch
        back and forth between being a master and being a
        process copy as the master node changes in recovery.
      
      - When recovering MSTCPY locks, flag rsb's that have
        non-empty convert or waiting queues for granting
        at the end of recovery.  (Rename flag from LOCKS_PURGED
        to RECOVER_GRANT and similar for the recovery function,
        because it's not only resources with purged locks
        that need a grant attempt.)
      
      - Replace a couple of unnecessary assertion panics with
        error messages.
      Signed-off-by: David Teigland <teigland@redhat.com>
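The stale-message check in the fourth bullet can be sketched as follows. This is a userspace toy model, not the kernel code: the status bit value and the helper name are illustrative stand-ins for the lockspace's recovery-status bookkeeping.

```c
#include <assert.h>

/* Sketch of the stale-RCOM check described above: RCOM_LOCK and
 * RCOM_LOCK_REPLY messages are dropped unless the local recovery
 * status shows the current cycle has reached the phase in which
 * nodes should be sending lock messages.  The bit value and the
 * helper name here are hypothetical, not the kernel's fields. */
#define RS_LOCKS_PHASE 0x8   /* hypothetical status bit */

static int accept_rcom_lock(unsigned int local_recover_status)
{
    /* a lock message arriving before this phase must belong to a
     * previous, aborted recovery cycle, so it is ignored */
    return (local_recover_status & RS_LOCKS_PHASE) != 0;
}
```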
  2. 04 January 2012 — 2 commits
    • dlm: add recovery callbacks · 60f98d18
      David Teigland authored
      These new callbacks notify the dlm user about lock recovery.
      GFS2, and possibly others, need to be aware of when the dlm
      will be doing lock recovery for a failed lockspace member.
      
      In the past, this coordination has been done between dlm and
      file system daemons in userspace, which then direct their
      kernel counterparts.  These callbacks allow the same
      coordination directly, and more simply.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: add node slots and generation · 757a4271
      David Teigland authored
      Slot numbers are assigned to nodes when they join the lockspace.
      The slot number chosen is the minimum unused value starting at 1.
      Once a node is assigned a slot, that slot number will not change
      while the node remains a lockspace member.  If the node leaves
      and rejoins it can be assigned a new slot number.
      
      A new generation number is also added to a lockspace.  It is
      set and incremented during each recovery along with the slot
      collection/assignment.
      
      The slot numbers will be passed to gfs2 which will use them as
      journal id's.
      Signed-off-by: David Teigland <teigland@redhat.com>
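The slot-selection rule described above (the minimum unused value starting at 1, stable while a node stays a member) can be modeled in a few lines. The function name and the array representation of in-use slots are illustrative, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the slot-assignment rule described above: a joining
 * node gets the minimum slot number not currently in use, starting
 * at 1.  Names are illustrative, not the kernel's. */
static int lowest_free_slot(const int *used, int nused)
{
    for (int slot = 1; ; slot++) {
        bool taken = false;
        for (int i = 0; i < nused; i++) {
            if (used[i] == slot) {
                taken = true;
                break;
            }
        }
        if (!taken)
            return slot;   /* first gap, or one past the highest */
    }
}
```

Because a slot is never reassigned while its node remains a member, gaps left by departed nodes are filled first, which keeps the values small enough for gfs2 to use directly as journal ids.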
  3. 19 November 2011 — 1 commit
  4. 16 July 2011 — 1 commit
  5. 13 July 2011 — 1 commit
    • dlm: improve rsb searches · 3881ac04
      David Teigland authored
      By pre-allocating rsb structs before searching the hash
      table, they can be inserted immediately.  This avoids
      always having to repeat the search when adding the struct
      to hash list.
      
      This also adds space to the rsb struct for a max resource
      name, so an rsb allocation can be used by any request.
      The constant size also allows us to finally use a slab
      for the rsb structs.
      Signed-off-by: David Teigland <teigland@redhat.com>
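The allocate-before-search pattern this commit describes can be sketched with a toy single-bucket hash chain; the struct layout and function name are illustrative, and the kernel's real hash table and locking are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the pre-allocation pattern described above: allocate the
 * rsb before searching, so on a miss it can be inserted immediately
 * instead of repeating the search after allocating.  A single list
 * stands in for the kernel's hash table; names are illustrative. */
struct rsb { char name[64]; struct rsb *next; };

static struct rsb *bucket;   /* one bucket, for illustration only */

static struct rsb *find_or_insert(const char *name)
{
    struct rsb *pre = calloc(1, sizeof(*pre));  /* allocate up front */
    strncpy(pre->name, name, sizeof(pre->name) - 1);

    for (struct rsb *r = bucket; r; r = r->next) {
        if (strcmp(r->name, name) == 0) {
            free(pre);        /* hit: discard the preallocation */
            return r;
        }
    }
    pre->next = bucket;       /* miss: insert with no second search */
    bucket = pre;
    return pre;
}
```

Sizing every rsb for the maximum resource name, as the commit notes, is what makes a single preallocated object usable for any request and lets the structs come from a slab cache.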
  6. 11 July 2011 — 1 commit
  7. 02 July 2011 — 1 commit
  8. 02 April 2011 — 1 commit
  9. 08 March 2010 — 1 commit
  10. 27 February 2010 — 1 commit
  11. 01 December 2009 — 1 commit
    • dlm: always use GFP_NOFS · 573c24c4
      David Teigland authored
      Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
      ls_allocation would be GFP_KERNEL for userland lockspaces
      and GFP_NOFS for file system lockspaces.
      
      It was discovered that any lockspaces on the system can
      affect all others by triggering memory reclaim in the
      file system which could in turn call back into the dlm
      to acquire locks, deadlocking dlm threads that were
      shared by all lockspaces, like dlm_recv.
      Signed-off-by: David Teigland <teigland@redhat.com>
  12. 07 May 2009 — 2 commits
  13. 29 January 2009 — 1 commit
  14. 09 January 2009 — 1 commit
  15. 14 November 2008 — 1 commit
    • dlm: fix shutdown cleanup · 278afcbf
      David Teigland authored
      Fixes a regression from commit 0f8e0d9a,
      "dlm: allow multiple lockspace creates".
      
      An extraneous 'else' slipped into a code fragment being moved from
      release_lockspace() to dlm_release_lockspace().  The result of the
      unwanted 'else' is that dlm threads and structures are not stopped
      and cleaned up when the final dlm lockspace is removed.  Trying to
      create a new lockspace again afterward will fail with
      "kmem_cache_create: duplicate cache dlm_conn" because the cache
      was not previously destroyed.
      Signed-off-by: David Teigland <teigland@redhat.com>
  16. 29 August 2008 — 3 commits
    • dlm: fix locking of lockspace list in dlm_scand · c1dcf65f
      David Teigland authored
      The dlm_scand thread needs to lock the list of lockspaces
      when going through it.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: detect available userspace daemon · dc68c7ed
      David Teigland authored
      If dlm_controld (the userspace daemon that controls the setup and
      recovery of the dlm) fails, the kernel should shut down the lockspaces
      in the kernel rather than leaving them running.  This is detected by
      having dlm_controld hold a misc device open while running, and if
      the kernel detects a close while the daemon is still needed, it stops
      the lockspaces in the kernel.
      
      Knowing that the userspace daemon isn't running also allows the
      lockspace create/remove routines to avoid waiting on the daemon
      for join/leave operations.
      Signed-off-by: David Teigland <teigland@redhat.com>
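The detect-a-dead-daemon logic above can be modeled outside the kernel with a few flags; the real mechanism is a misc device's open/release file operations, which this userspace sketch only imitates, and all names here are hypothetical.

```c
#include <assert.h>

/* Toy model of the detection described above: dlm_controld holds a
 * misc device open while running; if the kernel sees the device
 * close while lockspaces still exist, the daemon has died and the
 * lockspaces are stopped in the kernel.  All names are illustrative. */
static int daemon_running;
static int lockspaces;     /* lockspaces still needing the daemon */
static int forced_stops;   /* lockspaces stopped due to daemon death */

static void ctl_device_open(void)
{
    daemon_running = 1;    /* daemon opened the misc device */
}

static void ctl_device_release(void)
{
    daemon_running = 0;
    if (lockspaces) {      /* closed while still needed: daemon died */
        forced_stops += lockspaces;
        lockspaces = 0;    /* stop the lockspaces in the kernel */
    }
}
```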
    • dlm: allow multiple lockspace creates · 0f8e0d9a
      David Teigland authored
      Add a count for lockspace create and release so that create can
      be called multiple times to use the lockspace from different places.
      Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
      the previous behavior of returning -EEXIST if the lockspace already
      exists.
      Signed-off-by: David Teigland <teigland@redhat.com>
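The counted create/release semantics above can be illustrated with a small model. DLM_LSFL_NEWEXCL is the real flag name from the commit; its numeric value, the struct, and the helpers here are simplified stand-ins, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy model of counted lockspace create/release: create on an
 * existing name bumps a use count unless DLM_LSFL_NEWEXCL asks for
 * the old exclusive behavior (-EEXIST).  The flag's value and all
 * helpers are illustrative. */
#define DLM_LSFL_NEWEXCL 0x100   /* real flag name, made-up value */
#define NLS 8

struct lockspace { char name[64]; int count; };
static struct lockspace table[NLS];

static int ls_create(const char *name, unsigned int flags)
{
    for (int i = 0; i < NLS; i++) {
        if (table[i].count && strcmp(table[i].name, name) == 0) {
            if (flags & DLM_LSFL_NEWEXCL)
                return -EEXIST;    /* previous exclusive behavior */
            table[i].count++;      /* share the existing lockspace */
            return 0;
        }
    }
    for (int i = 0; i < NLS; i++) {
        if (!table[i].count) {
            strncpy(table[i].name, name, sizeof(table[i].name) - 1);
            table[i].count = 1;
            return 0;
        }
    }
    return -ENOMEM;
}

static int ls_release(const char *name)
{
    for (int i = 0; i < NLS; i++) {
        if (table[i].count && strcmp(table[i].name, name) == 0)
            return --table[i].count;  /* 0 means last user: tear down */
    }
    return -ENOENT;
}
```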
  17. 30 April 2008 — 1 commit
  18. 07 February 2008 — 1 commit
  19. 30 January 2008 — 2 commits
  20. 25 January 2008 — 6 commits
  21. 13 October 2007 — 1 commit
  22. 10 October 2007 — 1 commit
    • [DLM] block dlm_recv in recovery transition · c36258b5
      David Teigland authored
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.
      
      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking write lock on the rwsem) before aborting the
      current recovery.
      
      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages, to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  23. 09 July 2007 — 6 commits
  24. 03 May 2007 — 1 commit
  25. 01 May 2007 — 1 commit
    • [DLM] overlapping cancel and unlock · ef0c2bb0
      David Teigland authored
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an app can use, but both are used implicitly when a process exits while
      holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when cancel/unlock-force
        occurred during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
        either a completion or blocking ast
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
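The delayed-overlap bookkeeping in the second and third bullets can be modeled with a couple of fields per lock: an outstanding-operation type plus overlap flags that are carried out when the reply arrives. The enum, flag, and field names below echo the idea but are not the kernel's actual lkb fields.

```c
#include <assert.h>

/* Toy model of the overlap bookkeeping described above: a lock may
 * have one outstanding remote op, and an overlapping cancel or
 * unlock-force arriving meanwhile is recorded in flags and carried
 * out when the op's reply is processed.  Names are illustrative. */
enum { OP_NONE, OP_REQUEST, OP_LOOKUP };
#define OVL_CANCEL 0x1
#define OVL_UNLOCK 0x2

struct lkb { int wait_type; int overlap; };

/* returns 1 if the cancel ran now, 0 if it was delayed */
static int try_cancel(struct lkb *lkb)
{
    if (lkb->wait_type != OP_NONE) {
        lkb->overlap |= OVL_CANCEL;   /* delay until the reply */
        return 0;                     /* nothing actually canceled */
    }
    return 1;                         /* cancel performed immediately */
}

/* returns the delayed overlap flags for the caller to carry out */
static int reply_received(struct lkb *lkb)
{
    int delayed = lkb->overlap;
    lkb->wait_type = OP_NONE;
    lkb->overlap = 0;
    return delayed;
}
```

The same distinction drives the revert_lock change listed above: only a cancel that actually did something should trigger the cancel completion ast.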