1. 29 3月, 2013 2 次提交
  2. 23 3月, 2013 7 次提交
  3. 22 1月, 2013 1 次提交
    • L
      drbd: fix potential protocol error and resulting disconnect/reconnect · 2681f7f6
      Lars Ellenberg 提交于
      When we notice a disk failure on the receiving side,
      we stop sending it new incoming writes.
      
      Depending on exact timing of various events, the same transfer log epoch
      could end up containing both replicated (before we noticed the failure)
      and local-only requests (after we noticed the failure).
      
      The sanity checks in tl_release(), called when receiving a
      P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
      the expected epoch, and the number of contained writes matches
      the number of ack'ed writes.
      
      In this case, they counted both replicated and local-only writes,
      but the peer only acknowledges those it has seen.  We get a mismatch,
      resulting in a protocol error and disconnect/reconnect cycle.
      
      Messages logged are
        "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"
      
      A similar issue can also be triggered when starting a resync while
      having a healthy replication link, by invalidating one side, forcing a
      full sync, or attaching to a diskless node.
      
      Fix this by closing the current epoch if the state changes in a way
      that would cause the replication intent of the next write.
      
      Epochs now contain either only non-replicated,
      or only replicated writes.
      Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      2681f7f6
  4. 09 11月, 2012 12 次提交
  5. 08 11月, 2012 18 次提交