1. 06 2月, 2007 2 次提交
    • D
      [DLM] add version check · 9e971b71
      David Teigland 提交于
      Check if we receive a message from another lockspace member running a
      version of the dlm with an incompatible inter-node message protocol.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9e971b71
    • D
      [DLM] fix old rcom messages · 38aa8b0c
      David Teigland 提交于
      A reply to a recovery message will often be received after the relevant
      recovery sequence has aborted and the next recovery sequence has begun.
      We need to ignore replies to these old messages from the previous
      recovery.  There's already a way to do this for synchronous recovery
      requests using the rc_id number, but not for async.
      
      Each recovery sequence already has a locally unique sequence number
      associated with it.  This patch adds a field to the rcom (recovery
      message) structure where this recovery sequence number can be placed,
      rc_seq.  When a node sends a reply to a recovery request, it copies the
      rc_seq number it received into rc_seq_reply.  When the first node receives
      the reply to its recovery message, it will check whether rc_seq_reply
      matches the current recovery sequence number, ls_recover_seq, and if not
      then it ignores the old reply.
      
      An old, inadequate approach to filtering out old replies (checking if the
      current stage of recovery has moved back to the start) has been removed
      from two spots.
      
      The protocol version number is changed to reflect the different rcom
      structures.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      38aa8b0c
  2. 30 11月, 2006 4 次提交
    • R
      [DLM] fix format warnings in rcom.c and recoverd.c · 57adf7ee
      Ryusuke Konishi 提交于
      This fixes the following gcc warnings generated on
      the architectures where uint64_t != unsigned long long (e.g. ppc64).
      
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
      fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
      fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
      Signed-off-by: NRyusuke Konishi <ryusuke@osrg.net>
      Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      57adf7ee
    • D
      [DLM] don't accept replies to old recovery messages · 98f176fb
      David Teigland 提交于
      We often abort a recovery after sending a status request to a remote node.
      We want to ignore any potential status reply we get from the remote node.
      If we get one of these unwanted replies, we've often moved on to the next
      recovery message and incremented the message sequence counter, so the
      reply will be ignored due to the seq number.  In some cases, we've not
      moved on to the next message so the seq number of the reply we want to
      ignore is still correct, causing the reply to be accepted.  The next
      recovery message will then mistake this old reply as a new one.
      
      To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
      new reply.  We clear this flag if we abort recovery while waiting for a
      reply.  Before the flag is set again (to allow new replies) we know that
      any old replies will be rejected due to their sequence number.  We also
      initialize the recovery-message sequence number to a random value when a
      lockspace is first created.  This makes it clear when messages are being
      rejected from an old instance of a lockspace that has since been
      recreated.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      98f176fb
    • D
      [DLM] fix size of STATUS_REPLY message · 1babdb45
      David Teigland 提交于
      When the not_ready routine sends a "fake" status reply with blank status
      flags, it needs to use the correct size for a normal STATUS_REPLY by
      including the size of the would-be config parameters.  We also fill in the
      non-existant config parameters with an invalid lvblen value so it's easier
      to notice if these invalid paratmers are ever being used.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1babdb45
    • D
      [DLM] status messages ping-pong between unmounted nodes · 435618b7
      David Teigland 提交于
      Red Hat BZ 213682
      
      If two nodes leave the lockspace (while unmounting the fs in the case of
      gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
      messages will then ping-pong between the nodes when neither of them can
      find the lockspace in question any longer.  We kill this by not sending
      another STATUS message when we get a STATUS_REPLY for an unknown
      lockspace.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      435618b7
  3. 24 8月, 2006 1 次提交
  4. 10 8月, 2006 1 次提交
  5. 09 8月, 2006 1 次提交
  6. 23 2月, 2006 1 次提交
  7. 18 1月, 2006 1 次提交