1. 18 Mar, 2014 6 commits
    • A
      Sentinel test: 02 unit better coverage + refactoring. · 258d377d
      antirez committed
    • A
      Sentinel test: foreach_instance_id implements 'break'. · 58f104e2
      antirez committed
    • A
      Sentinel: instance_is_killed proc added to sentinel.tcl. · 2586ea76
      antirez committed
    • A
      218cc5fc
    • A
      Sentinel: down-after-milliseconds is not master-specific. · bb6d8501
      antirez committed
      addReplySentinelRedisInstance() was modified so that this field is displayed
      for all kinds of instances: Sentinels, Masters, Slaves.
    • A
      Sentinel failure detection implementation improved. · ae0b7680
      antirez committed
      Failure detection in Sentinel is ping-pong based. It used to work by
      remembering the last time a valid PONG reply was received, and checking
      if the reception time was too old compared to the current time.
      
      PINGs were sent at a fixed interval of 1 second.
      
      This works in a decent way, but does not scale well when we want to set
      very small values of "down-after-milliseconds" (this is the node
      timeout basically).
      
      This commit reimplements the failure detection, making a number of
      changes. Some changes are inspired by the Redis Cluster failure
      detection code:
      
      * A new last_ping_time field is added to the representation of
        instances. If non-zero, we have an active ping that was sent at the
        specified time. When a valid reply to the ping is received, the field
        is zeroed again.
      * last_ping_time is not reset when we reconnect the link or send a new
        ping, so from our point of view it represents the time we started
        waiting for the instance to reply to our pings without receiving a
        reply.
      * last_ping_time is now used in order to check if the instance is
        timed out. This means that we can have a node timeout of 100
        milliseconds and yet the system will work well since the new check is
        not bound to the period used to send pings.
      * Pings are now sent every second, or more often if the value of
        down-after-milliseconds is less than one second, with a lower limit of
        10 HZ ping frequency.
      * Link reconnection code was improved. This is used in order to try to
        reconnect the link when we are at 50% of the node timeout without a
        valid reply received yet. However the old code triggered unnecessary
        reconnections when the node timeout was very small. Now that should be
        ok.
      
      The new code passes the tests, but more testing is needed, along with
      more unit tests stressing the failure detector, so currently this is
      merged only in the unstable branch.
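
      The bookkeeping described in the list above can be sketched as follows.
      This is a minimal illustration in Python, not the Sentinel C code: the
      Instance class, the helper names, and last_ping_sent are hypothetical,
      and reading the "10 HZ" bound as a minimum ping period of 100
      milliseconds is an assumption.

```python
from dataclasses import dataclass

PING_PERIOD_MS = 1000      # default interval: one ping per second
MIN_PING_PERIOD_MS = 100   # assumption: the "10 HZ" bound read as a 100 ms floor


@dataclass
class Instance:
    """Hypothetical stand-in for a monitored instance."""
    down_after_ms: int        # the down-after-milliseconds setting
    last_ping_time: int = 0   # 0 means no ping is waiting for a reply
    last_ping_sent: int = 0   # hypothetical: when the latest PING was written


def ping_period_ms(inst: Instance) -> int:
    """Ping every second, or more often when the timeout is under one second."""
    period = min(PING_PERIOD_MS, inst.down_after_ms)
    return max(period, MIN_PING_PERIOD_MS)


def send_ping(inst: Instance, now_ms: int) -> None:
    """Record a ping. last_ping_time is set only if no ping is pending,
    so repeated pings (or reconnections) do not reset the waiting time.
    now_ms is assumed to be a non-zero millisecond clock value."""
    if inst.last_ping_time == 0:
        inst.last_ping_time = now_ms
    inst.last_ping_sent = now_ms


def on_valid_reply(inst: Instance) -> None:
    """A valid reply clears the pending ping."""
    inst.last_ping_time = 0


def waiting_ms(inst: Instance, now_ms: int) -> int:
    """How long we have been waiting for a reply, 0 if nothing is pending."""
    return now_ms - inst.last_ping_time if inst.last_ping_time else 0


def is_down(inst: Instance, now_ms: int) -> bool:
    """Timed out when the pending ping is older than the node timeout."""
    return waiting_ms(inst, now_ms) > inst.down_after_ms


def should_reconnect(inst: Instance, now_ms: int) -> bool:
    """Try a reconnection once half of the node timeout has elapsed."""
    return waiting_ms(inst, now_ms) > inst.down_after_ms / 2
```

      Because the check depends on the age of the oldest unanswered ping
      rather than on the ping interval, a 100 millisecond timeout remains
      meaningful in this sketch even though pings are sent no faster than
      every 100 milliseconds.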
  2. 15 Mar, 2014 3 commits
  3. 14 Mar, 2014 3 commits
    • A
      Sentinel: be safe under crash-recovery assumptions. · ed813863
      antirez committed
      Sentinel's main safety argument is that there are no two configurations
      for the same master with the same version (configuration epoch).
      
      For this to be true, Sentinels need to be authorized by a majority.
      Additionally, Sentinels need to do two important things:
      
      * Never vote again for the same epoch.
      * Never exchange an old vote for a fresh one.
      
      The first prerequisite, in a crash-recovery system model, requires
      persisting the master->leader_epoch on durable storage before replying
      to messages. This was not the case.
      
      We also make sure to persist the current epoch in order to never reply
      to stale vote requests from other Sentinels after a recovery.
      
      The configuration is persisted by making use of fsync(). In the context
      of this code, this is considered a good enough guarantee that our
      durable state is restored after a restart; however, this may not
      always be the case, depending on the kind of hardware and operating
      system used.
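
      The persist-before-reply rule can be sketched as follows. This is a
      minimal illustration in Python under stated assumptions, not Sentinel's
      implementation: the VoteState class, its JSON state file, and the
      write-then-rename step are hypothetical; only the use of fsync() before
      replying comes from the commit message.

```python
import json
import os


class VoteState:
    """Hypothetical holder of the epochs that must survive a crash."""

    def __init__(self, path):
        self.path = path
        self.current_epoch = 0
        self.leader_epoch = 0
        self.leader = None
        if os.path.exists(path):
            # Recovery: reload whatever was made durable before the crash.
            with open(path) as f:
                saved = json.load(f)
            self.current_epoch = saved["current_epoch"]
            self.leader_epoch = saved["leader_epoch"]
            self.leader = saved["leader"]

    def _persist(self):
        """Write the state and fsync() it before any reply is sent."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"current_epoch": self.current_epoch,
                       "leader_epoch": self.leader_epoch,
                       "leader": self.leader}, f)
            f.flush()
            os.fsync(f.fileno())
        # Swap the fully written file into place (illustrative choice).
        os.replace(tmp, self.path)

    def vote(self, req_epoch, candidate):
        """Return the (leader, leader_epoch) pair to send as the reply."""
        changed = False
        if req_epoch > self.current_epoch:
            self.current_epoch = req_epoch
            changed = True
        # Grant a vote only for an epoch we have not voted in yet, and
        # never for a request older than our current epoch.
        if self.leader_epoch < req_epoch and self.current_epoch <= req_epoch:
            self.leader = candidate
            self.leader_epoch = req_epoch
            changed = True
        if changed:
            self._persist()   # durable first, reply second
        return self.leader, self.leader_epoch
```

      In this sketch a restarted process rebuilds its state from the file, so
      a second request for an epoch it already voted in gets the original
      vote back rather than a fresh one.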
    • A
      Sentinel: fake PUBLISH command to receive HELLO messages. · 36509402
      antirez committed
      Now the way HELLO messages are received is unified.
      Sentinels no longer need to be able to chat via some Redis instance in
      order to converge to the higher configuration for a master: they are
      able to exchange configurations directly.
      
      Note that this commit does not include the (trivial) change needed to
      send HELLO messages to Sentinel instances as well, since by mistake I
      committed that change in the previous commit, which refactored hello
      message processing into a separate function.
    • A
  4. 13 Mar, 2014 2 commits
  5. 11 Mar, 2014 9 commits
  6. 10 Mar, 2014 17 commits