  1. 15 Oct, 2014 · 1 commit
  2. 09 Aug, 2012 · 1 commit
  3. 17 Jul, 2012 · 2 commits
    • dlm: use idr instead of list for recovered rsbs · 1d7c484e
      Committed by David Teigland
      When a large number of resources are being recovered,
      a linear search of the recover_list takes a long time.
      Use an idr in place of a list.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: use rsbtbl as resource directory · c04fecb4
      Committed by David Teigland
      Remove the dir hash table (dirtbl), and use
      the rsb hash table (rsbtbl) as the resource
      directory.  It has always been an unnecessary
      duplication of information.
      
      This improves efficiency by using a single rsbtbl
      lookup in many cases where both rsbtbl and dirtbl
      lookups were needed previously.
      
      This eliminates the need to handle cases of rsbtbl
      and dirtbl being out of sync.
      
      In many cases there will be memory savings because
      the dir hash table no longer exists.
      Signed-off-by: David Teigland <teigland@redhat.com>
  4. 03 May, 2012 · 1 commit
    • dlm: fixes for nodir mode · 4875647a
      Committed by David Teigland
      The "nodir" mode (statically assign master nodes instead
      of using the resource directory) has always been highly
      experimental, and never seriously used.  This commit
      fixes a number of problems, making nodir much more usable.
      
      - Major change to recovery: recover all locks and restart
        all in-progress operations after recovery.  In some
        cases it's not possible to know which in-progress locks
        to recover, so recover all.  (Most require recovery
        in nodir mode anyway since rehashing changes most
        master nodes.)
      
      - Change the way nodir mode is enabled, from a command
        line mount arg passed through gfs2, into a sysfs
        file managed by dlm_controld, consistent with the
        other config settings.
      
      - Allow recovering MSTCPY locks on an rsb that has not
        yet been turned into a master copy.
      
      - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
        from a previous, aborted recovery cycle.  Base this
        on the local recovery status not being in the state
        where any nodes should be sending LOCK messages for the
        current recovery cycle.
      
      - Hold rsb lock around dlm_purge_mstcpy_locks() because it
        may run concurrently with dlm_recover_master_copy().
      
      - Maintain highbast on process-copy lkb's (in addition to
        the master as is usual), because the lkb can switch
        back and forth between being a master and being a
        process copy as the master node changes in recovery.
      
      - When recovering MSTCPY locks, flag rsb's that have
        non-empty convert or waiting queues for granting
        at the end of recovery.  (Rename flag from LOCKS_PURGED
        to RECOVER_GRANT and similar for the recovery function,
        because it's not only resources with purged locks
        that need a grant attempt.)
      
      - Replace a couple of unnecessary assertion panics with
        error messages.
      Signed-off-by: David Teigland <teigland@redhat.com>
  5. 27 Apr, 2012 · 1 commit
  6. 04 Jan, 2012 · 1 commit
    • dlm: add node slots and generation · 757a4271
      Committed by David Teigland
      Slot numbers are assigned to nodes when they join the lockspace.
      The slot number chosen is the minimum unused value starting at 1.
      Once a node is assigned a slot, that slot number will not change
      while the node remains a lockspace member.  If the node leaves
      and rejoins it can be assigned a new slot number.
      
      A new generation number is also added to a lockspace.  It is
      set and incremented during each recovery along with the slot
      collection/assignment.
      
      The slot numbers will be passed to gfs2 which will use them as
      journal id's.
      Signed-off-by: David Teigland <teigland@redhat.com>
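The slot rule described in 757a4271 (lowest unused value starting at 1, stable while the node stays a member) can be sketched as follows. This is only an illustration: `struct member`, `lowest_free_slot()` and `assign_slots()` are hypothetical names, and the real dlm data structures and recovery flow are not shown on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical member record; not the kernel's structure. */
struct member {
    int nodeid;
    int slot;   /* 0 means "no slot assigned yet" */
};

/* Return the smallest slot number >= 1 that no member currently holds. */
static int lowest_free_slot(const struct member *members, size_t count)
{
    assert(members != NULL || count == 0);
    /* At most `count` slots are in use, so this ends by count + 1. */
    for (int candidate = 1; ; candidate++) {
        bool in_use = false;
        for (size_t i = 0; i < count; i++) {
            if (members[i].slot == candidate) {
                in_use = true;
                break;
            }
        }
        if (!in_use)
            return candidate;
    }
}

/* Give each member without a slot the lowest free one. Members that
 * already hold a slot keep it. Returns how many slots were assigned. */
static int assign_slots(struct member *members, size_t count)
{
    int assigned = 0;
    for (size_t i = 0; i < count; i++) {
        if (members[i].slot == 0) {
            members[i].slot = lowest_free_slot(members, count);
            assigned++;
        }
    }
    return assigned;
}
```

With members holding slots 1 and 3 and two new joiners, the joiners would receive 2 and 4 under this sketch. A node that leaves and rejoins starts again with slot 0 here, so it may receive a different number.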
  7. 11 Mar, 2011 · 1 commit
    • dlm: record full callback state · 8304d6f2
      Committed by David Teigland
      Change how callbacks are recorded for locks.  Previously, information
      about multiple callbacks was combined into a couple of variables that
      indicated what the end result should be.  In some situations, we
      could not tell from this combined state what the exact sequence of
      callbacks was, and would end up either delivering the callbacks in
      the wrong order or suppressing redundant callbacks incorrectly.  This
      new approach records all the data for each callback, leaving no
      uncertainty about what needs to be delivered.
      Signed-off-by: David Teigland <teigland@redhat.com>
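The idea in 8304d6f2 of keeping one record per callback, rather than a merged end state, might look roughly like the queue below. All names here (`cb_record`, `cb_queue`, `cb_add`, `cb_pop`, the capacity of 6) are assumptions made for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CB_QUEUE_SIZE 6              /* hypothetical per-lock capacity */

enum cb_kind { CB_CAST, CB_BAST };   /* completion vs. blocking callback */

/* Everything needed to deliver one callback later. */
struct cb_record {
    uint64_t seq;        /* order in which the callback was recorded */
    enum cb_kind kind;
    int mode;
    int status;
};

struct cb_queue {
    struct cb_record slots[CB_QUEUE_SIZE];
    int count;
    uint64_t next_seq;
};

/* Append a full record; returns false if the queue is full. */
static bool cb_add(struct cb_queue *q, enum cb_kind kind, int mode, int status)
{
    assert(q != NULL);
    if (q->count == CB_QUEUE_SIZE)
        return false;
    q->next_seq++;
    q->slots[q->count].seq = q->next_seq;
    q->slots[q->count].kind = kind;
    q->slots[q->count].mode = mode;
    q->slots[q->count].status = status;
    q->count++;
    return true;
}

/* Remove the oldest record so callbacks are delivered in recorded order. */
static bool cb_pop(struct cb_queue *q, struct cb_record *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->slots[0];
    memmove(&q->slots[0], &q->slots[1],
            (size_t)(q->count - 1) * sizeof(q->slots[0]));
    q->count--;
    return true;
}
```

Because each record is stored whole, a blocking callback followed by a completion callback comes back out in that same order, which a pair of merged state variables cannot guarantee.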
  8. 01 Dec, 2009 · 1 commit
    • dlm: always use GFP_NOFS · 573c24c4
      Committed by David Teigland
      Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
      ls_allocation would be GFP_KERNEL for userland lockspaces
      and GFP_NOFS for file system lockspaces.
      
      It was discovered that any lockspace on the system can
      affect all others by triggering memory reclaim in the
      file system which could in turn call back into the dlm
      to acquire locks, deadlocking dlm threads that were
      shared by all lockspaces, like dlm_recv.
      Signed-off-by: David Teigland <teigland@redhat.com>
  9. 22 Feb, 2008 · 1 commit
  10. 06 Feb, 2008 · 1 commit
  11. 04 Feb, 2008 · 5 commits
  12. 31 Jan, 2008 · 1 commit
  13. 10 Oct, 2007 · 1 commit
    • [DLM] block dlm_recv in recovery transition · c36258b5
      Committed by David Teigland
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.
      
      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking write lock on the rwsem) before aborting the
      current recovery.
      
      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages, to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  14. 14 Aug, 2007 · 1 commit
  15. 09 Jul, 2007 · 2 commits
  16. 06 Feb, 2007 · 4 commits
    • [DLM] rename dlm_config_info fields · 68c817a1
      Committed by David Teigland
      Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
      can use macros to add configfs functions to access them (in a later
      patch).  No functional changes in this patch, just naming changes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] change some log_error to log_debug · 8ec68867
      Committed by David Teigland
      Some common, non-error messages should use log_debug instead of log_error
      so they can be turned off.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] add version check · 9e971b71
      Committed by David Teigland
      Check if we receive a message from another lockspace member running a
      version of the dlm with an incompatible inter-node message protocol.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
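A protocol version check of the kind 9e971b71 describes could be sketched like this. The commit message does not say how the version is encoded, so the major/minor split into the upper and lower 16 bits, the value of `LOCAL_MAJOR`, and the function names are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCAL_MAJOR 3u   /* hypothetical local protocol major version */

/* Pack a major/minor pair into one 32-bit header field (assumed layout:
 * major in the upper 16 bits, minor in the lower 16 bits). */
static uint32_t pack_version(uint32_t major, uint32_t minor)
{
    assert(major <= 0xFFFFu && minor <= 0xFFFFu);
    return (major << 16) | minor;
}

/* In this sketch only a differing major number is treated as an
 * incompatible message protocol; minor differences are tolerated. */
static bool version_compatible(uint32_t remote_version)
{
    return (remote_version >> 16) == LOCAL_MAJOR;
}
```

A receiver using this sketch would drop or log a message whose header fails `version_compatible()` before interpreting the rest of it.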
    • [DLM] fix old rcom messages · 38aa8b0c
      Committed by David Teigland
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  17. 30 Nov, 2006 · 4 commits
    • [DLM] fix format warnings in rcom.c and recoverd.c · 57adf7ee
      Committed by Ryusuke Konishi
      This fixes the following gcc warnings generated on
      the architectures where uint64_t != unsigned long long (e.g. ppc64).
      
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
      fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
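The warnings quoted in 57adf7ee arise when a `uint64_t` is passed to `%llx` on a platform where `uint64_t` is not `unsigned long long`. The page does not show the patch itself, so the following is only a sketch of two common remedies: an explicit cast, and the C99 `PRIx64` macro (a userspace option). The function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Remedy 1: keep "%llx" and cast the argument to the type it expects. */
static int format_hex_cast(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%llx", (unsigned long long)value);
}

/* Remedy 2: let <inttypes.h> choose the conversion for uint64_t. */
static int format_hex_inttypes(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIx64, value);
}

/* Both remedies should produce identical text for any value. */
static bool formats_agree(uint64_t value)
{
    char a[32], b[32];
    format_hex_cast(a, sizeof(a), value);
    format_hex_inttypes(b, sizeof(b), value);
    return strcmp(a, b) == 0;
}
```

Either form compiles without the format warning whether `uint64_t` is defined as `unsigned long` or `unsigned long long`.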
    • [DLM] don't accept replies to old recovery messages · 98f176fb
      Committed by David Teigland
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply for a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix size of STATUS_REPLY message · 1babdb45
      Committed by David Teigland
      When the not_ready routine sends a "fake" status reply with blank status
      flags, it needs to use the correct size for a normal STATUS_REPLY by
      including the size of the would-be config parameters.  We also fill in the
      nonexistent config parameters with an invalid lvblen value so it's easier
      to notice if these invalid parameters are ever being used.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] status messages ping-pong between unmounted nodes · 435618b7
      Committed by David Teigland
      Red Hat BZ 213682
      
      If two nodes leave the lockspace (while unmounting the fs in the case of
      gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
      messages will then ping-pong between the nodes when neither of them can
      find the lockspace in question any longer.  We kill this by not sending
      another STATUS message when we get a STATUS_REPLY for an unknown
      lockspace.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  18. 24 Aug, 2006 · 1 commit
  19. 10 Aug, 2006 · 1 commit
  20. 09 Aug, 2006 · 1 commit
  21. 23 Feb, 2006 · 1 commit
  22. 18 Jan, 2006 · 1 commit