  1. 04 Feb, 2008 4 commits
  2. 31 Jan, 2008 1 commit
  3. 10 Oct, 2007 1 commit
    • [DLM] block dlm_recv in recovery transition · c36258b5
      David Teigland committed
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.
      
      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking write lock on the rwsem) before aborting the
      current recovery.
      
      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages, to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  4. 14 Aug, 2007 1 commit
  5. 09 Jul, 2007 2 commits
  6. 06 Feb, 2007 4 commits
    • [DLM] rename dlm_config_info fields · 68c817a1
      David Teigland committed
      Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
      can use macros to add configfs functions to access them (in a later
      patch).  No functional changes in this patch, just naming changes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] change some log_error to log_debug · 8ec68867
      David Teigland committed
      Some common, non-error messages should use log_debug instead of log_error
      so they can be turned off.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] add version check · 9e971b71
      David Teigland committed
      Check if we receive a message from another lockspace member running a
      version of the dlm with an incompatible inter-node message protocol.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
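A protocol version check of this kind is often done by comparing only the major part of a version word. The following is a minimal sketch under that assumption; the constants `PROTO_MAJOR`, `PROTO_MINOR`, `PROTO_MAJOR_MASK` and the function `version_compatible` are hypothetical names chosen for illustration and are not taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version layout: major number in the high 16 bits,
 * minor number in the low 16 bits.  The values are illustrative. */
#define PROTO_MAJOR      0x00030000u
#define PROTO_MINOR      0x00000001u
#define PROTO_MAJOR_MASK 0xFFFF0000u

/* Return true if a message carrying remote_version may be processed.
 * Only the major part has to match; minor differences are tolerated. */
bool version_compatible(uint32_t remote_version)
{
    return (remote_version & PROTO_MAJOR_MASK) == PROTO_MAJOR;
}
```

A receiver would call `version_compatible()` on the version field of each incoming message header and drop the message (probably with a log entry) when it returns false.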
    • [DLM] fix old rcom messages · 38aa8b0c
      David Teigland committed
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
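The rc_seq / rc_seq_reply handshake described in this commit can be sketched in plain C as follows. The field names rc_seq, rc_seq_reply and ls_recover_seq come from the commit message; the trimmed-down structs and the helper functions `stamp_request`, `make_reply` and `reply_is_current` are assumptions made for illustration and do not reproduce the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the real dlm structures (assumed layout). */
struct rcom {
    uint64_t rc_seq;        /* sender's recovery sequence number */
    uint64_t rc_seq_reply;  /* sequence number echoed back in a reply */
};

struct lockspace {
    uint64_t ls_recover_seq; /* locally unique id of the current recovery */
};

/* Requesting node: stamp an outgoing request with the current sequence. */
void stamp_request(const struct lockspace *ls, struct rcom *req)
{
    req->rc_seq = ls->ls_recover_seq;
    req->rc_seq_reply = 0;
}

/* Replying node: copy the received rc_seq into rc_seq_reply. */
void make_reply(const struct lockspace *replier, const struct rcom *req,
                struct rcom *reply)
{
    reply->rc_seq = replier->ls_recover_seq;
    reply->rc_seq_reply = req->rc_seq;
}

/* Requesting node: accept a reply only if it answers a request that was
 * sent during the recovery sequence that is still current. */
bool reply_is_current(const struct lockspace *ls, const struct rcom *reply)
{
    return reply->rc_seq_reply == ls->ls_recover_seq;
}
```

Because each node's sequence number is only locally unique, the comparison is always made on the node that originated the request, against its own ls_recover_seq.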
  7. 30 Nov, 2006 4 commits
    • [DLM] fix format warnings in rcom.c and recoverd.c · 57adf7ee
      Ryusuke Konishi committed
      This fixes the following gcc warnings, generated on architectures
      where uint64_t is not unsigned long long (e.g. ppc64).
      
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
      fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
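The commit message does not say how the warnings were silenced. One common way to handle this class of warning is an explicit cast to unsigned long long at the call site, sketched below in userspace C. The function `format_seq` and its message text are hypothetical and exist only to demonstrate the cast.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit sequence number in hex.  On platforms where uint64_t
 * is "unsigned long", passing it straight to %llx triggers a format
 * warning; the cast makes the argument type match on every platform. */
int format_seq(char *buf, size_t len, uint64_t seq)
{
    return snprintf(buf, len, "seq %llx", (unsigned long long)seq);
}
```

In portable userspace code the `PRIx64` macro from `<inttypes.h>` is an alternative to the cast.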
    • [DLM] don't accept replies to old recovery messages · 98f176fb
      David Teigland committed
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply for a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
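The interplay between the RCOM_WAIT flag and the message sequence counter can be modeled roughly as below. This is a single-threaded sketch under simplifying assumptions: the struct layout and the functions `send_request`, `abort_wait` and `accept_reply` are hypothetical, the flag is a plain bool rather than a bit in a flags word, and the random initial value of the counter is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced lockspace state. */
struct lockspace {
    bool     rcom_wait; /* models RCOM_WAIT: a reply may be accepted */
    uint64_t rcom_seq;  /* recovery-message sequence counter */
};

/* Send a new request: bump the counter first, then allow replies, so
 * any reply to an older request is already stale when the flag is set. */
uint64_t send_request(struct lockspace *ls)
{
    ls->rcom_seq++;
    ls->rcom_wait = true;
    return ls->rcom_seq;
}

/* Recovery aborted while waiting: stop accepting replies. */
void abort_wait(struct lockspace *ls)
{
    ls->rcom_wait = false;
}

/* Accept a reply only while waiting and only for the current request. */
bool accept_reply(struct lockspace *ls, uint64_t reply_seq)
{
    if (!ls->rcom_wait)
        return false;   /* not waiting: old reply after an abort */
    if (reply_seq != ls->rcom_seq)
        return false;   /* stale sequence number */
    ls->rcom_wait = false;
    return true;
}
```

The flag closes the window the commit describes: after an abort, a late reply still carries the current sequence number, so the sequence check alone would let it through.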
    • [DLM] fix size of STATUS_REPLY message · 1babdb45
      David Teigland committed
      When the not_ready routine sends a "fake" status reply with blank status
      flags, it needs to use the correct size for a normal STATUS_REPLY by
      including the size of the would-be config parameters.  We also fill in the
      non-existent config parameters with an invalid lvblen value so it's easier
      to notice if these invalid parameters are ever being used.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] status messages ping-pong between unmounted nodes · 435618b7
      David Teigland committed
      Red Hat BZ 213682
      
      If two nodes leave the lockspace (while unmounting the fs in the case of
      gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
      messages will then ping-pong between the nodes when neither of them can
      find the lockspace in question any longer.  We kill this by not sending
      another STATUS message when we get a STATUS_REPLY for an unknown
      lockspace.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
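The rule that stops the ping-pong can be expressed as a small decision function. This is an illustrative model only: the enums and `on_message` are hypothetical names, and the assumption that an unknown-lockspace STATUS is answered with a "not ready" reply is inferred from the neighbouring STATUS_REPLY commit rather than stated here.

```c
#include <stdbool.h>

enum msg_type { MSG_STATUS, MSG_STATUS_REPLY };

enum action {
    ACT_PROCESS,          /* lockspace known: handle normally */
    ACT_REPLY_NOT_READY,  /* answer a STATUS with a blank reply */
    ACT_DROP              /* send nothing back */
};

/* Decide what to do with an incoming status message.  Replying to a
 * STATUS_REPLY for an unknown lockspace is what let two departed nodes
 * bounce messages forever, so that case sends nothing. */
enum action on_message(enum msg_type type, bool lockspace_found)
{
    if (lockspace_found)
        return ACT_PROCESS;
    if (type == MSG_STATUS)
        return ACT_REPLY_NOT_READY;
    return ACT_DROP;
}
```

With this rule, an exchange between two nodes that have both left the lockspace ends after at most one reply.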
  8. 24 Aug, 2006 1 commit
  9. 10 Aug, 2006 1 commit
  10. 09 Aug, 2006 1 commit
  11. 23 Feb, 2006 1 commit
  12. 18 Jan, 2006 1 commit