1. 21 Mar, 2014 11 commits
    • A
      Specify lruclock in redisServer structure via REDIS_LRU_BITS. · c68189a1
      antirez committed
      The padding field was totally useless: removed.
      c68189a1
    • A
      Set LRU parameters via REDIS_LRU_BITS define. · ff8c8187
      antirez committed
      ff8c8187
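The two commits above tie the LRU clock width to a single define. A minimal sketch of that idea follows, assuming a 24-bit width and hypothetical struct and function names (`object_header`, `server_state`, `lru_elapsed`); only `REDIS_LRU_BITS` and `lruclock` come from the commit titles.

```c
#include <assert.h>

/* Assumed width, for illustration only. */
#define REDIS_LRU_BITS 24
/* Largest value the clock can hold, derived from the same define. */
#define REDIS_LRU_CLOCK_MAX ((1u << REDIS_LRU_BITS) - 1)

/* Hypothetical object header: the LRU field uses the shared width. */
struct object_header {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:REDIS_LRU_BITS;
};

/* Hypothetical server state: the clock uses the same width, so no
 * separate padding field is needed to keep the two in sync. */
struct server_state {
    unsigned lruclock:REDIS_LRU_BITS;
};

/* Ticks elapsed from `then` to `now`, modulo the clock range, so a
 * single wrap-around of the clock is handled. */
unsigned lru_elapsed(unsigned now, unsigned then) {
    assert(now <= REDIS_LRU_CLOCK_MAX && then <= REDIS_LRU_CLOCK_MAX);
    return (now - then) & REDIS_LRU_CLOCK_MAX;
}
```

Changing the width then means editing one line, and every field and limit derived from it follows.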
    • A
      Unify stats reset for CONFIG RESETSTAT / initServer(). · e3b71a1c
      antirez committed
      Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
      future it will be simpler to avoid missing new fields.
      e3b71a1c
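The unification described in commit e3b71a1c can be pictured as one reset routine shared by both callers. This is only a sketch: the struct layout, field names, and function names below are hypothetical and do not reproduce the actual Redis code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the statistics kept by the server. */
struct server_stats {
    long long numcommands;
    long long numconnections;
    long long expiredkeys;
};

struct server {
    int port;                   /* configuration, not a statistic */
    struct server_stats stats;  /* everything that RESETSTAT clears */
};

/* Single place that knows how to clear the statistics. A field added
 * to server_stats later is covered without touching the callers. */
void reset_server_stats(struct server *s) {
    assert(s != NULL);
    memset(&s->stats, 0, sizeof(s->stats));
}

/* Startup path: set configuration defaults, then reset the stats. */
void init_server(struct server *s) {
    s->port = 6379;
    reset_server_stats(s);
}

/* CONFIG RESETSTAT path: same reset, configuration left untouched. */
void config_resetstat(struct server *s) {
    reset_server_stats(s);
}
```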
    • A
      Sentinel: sentinelRefreshInstanceInfo() minor refactoring. · 0937377a
      antirez committed
      Test the sentinel.tilt condition at the top and return if it is true.
      This allows removing the check for the tilt condition from the
      remaining code paths of the function.
      0937377a
    • A
      Sentinel test: 02 unit better coverage + refactoring. · 686839b4
      antirez committed
      686839b4
    • A
      Sentinel test: foreach_instance_id implements 'break'. · 6d0e408a
      antirez committed
      6d0e408a
    • A
      Sentinel: instance_is_killed proc added to sentinel.tcl. · ba2edc41
      antirez committed
      ba2edc41
    • A
      9c2063fb
    • A
      Sentinel: down-after-milliseconds is not master-specific. · ffa8f479
      antirez committed
      addReplySentinelRedisInstance() modified so that this field is displayed
      for all kinds of instances: Sentinels, Masters, Slaves.
      ffa8f479
    • A
      Sentinel failure detection implementation improved. · 42091a79
      antirez committed
      Failure detection in Sentinel is ping-pong based. It used to work by
      remembering the last time a valid PONG reply was received, and checking
      if the reception time was too old compared to the current time.
      
      PINGs were sent at a fixed interval of 1 second.
      
      This works in a decent way, but does not scale well when we want to set
      very small values of "down-after-milliseconds" (this is the node
      timeout basically).
      
      This commit reimplements the failure detection, making a number of
      changes. Some changes are inspired by the Redis Cluster failure
      detection code:
      
      * A new last_ping_time field is added in representation of instances.
        If non zero, we have an active ping that was sent at the specified
        time. When a valid reply to ping is received, the field is zeroed
        again.
      * last_ping_time is not reset when we reconnect the link or send a new
        ping, so from our point of view it represents the time we started
        waiting for the instance to reply to our pings without receiving a
        reply.
      * last_ping_time is now used in order to check if the instance is
        timed out. This means that we can have a node timeout of 100
        milliseconds and yet the system will work well since the new check is
        not bound to the period used to send pings.
      * Pings are now sent every second, or more often if the value of
        down-after-milliseconds is less than one second, with a lower limit
        of 10 HZ ping frequency.
      * Link reconnection code was improved. This is used in order to try to
        reconnect the link when we are at 50% of the node timeout without a
        valid reply received yet. However the old code triggered unnecessary
        reconnections when the node timeout was very small. Now that should be
        ok.
      
      The new code passes the tests but more testing is needed and more unit
      tests stressing the failure detector, so currently this is merged only
      in the unstable branch.
      42091a79
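The bullet points of commit 42091a79 can be condensed into a small sketch. Only `last_ping_time` and the down-after-milliseconds timeout are taken from the message; the struct, the function names, and the constants are hypothetical. The sketch also reads the 10 HZ limit as a floor of 100 ms on the ping period, which is an interpretation rather than something the message states outright.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PING_PERIOD_MS 1000     /* default: one ping per second */
#define MIN_PING_PERIOD_MS 100  /* assumed reading of the 10 HZ limit */

/* Hypothetical, simplified view of a monitored instance. */
typedef struct {
    int64_t last_ping_time;     /* 0 = no ping pending, otherwise the time
                                   we started waiting for a reply */
    int64_t down_after_period;  /* node timeout in milliseconds */
} instance;

/* Ping every second, or more often when the timeout is shorter. */
int64_t ping_period_ms(const instance *ri) {
    assert(ri != NULL);
    int64_t period = ri->down_after_period;
    if (period > PING_PERIOD_MS) period = PING_PERIOD_MS;
    if (period < MIN_PING_PERIOD_MS) period = MIN_PING_PERIOD_MS;
    return period;
}

/* Sending a new ping does not reset a pending last_ping_time. */
void on_ping_sent(instance *ri, int64_t now) {
    if (ri->last_ping_time == 0) ri->last_ping_time = now;
}

/* A valid reply zeroes the field again. */
void on_valid_reply(instance *ri) {
    ri->last_ping_time = 0;
}

/* Timed out when we have waited longer than the node timeout. */
int is_timed_out(const instance *ri, int64_t now) {
    return ri->last_ping_time != 0 &&
           now - ri->last_ping_time > ri->down_after_period;
}

/* Try a reconnection at 50% of the node timeout without a reply. */
int should_reconnect(const instance *ri, int64_t now) {
    return ri->last_ping_time != 0 &&
           now - ri->last_ping_time > ri->down_after_period / 2;
}
```

Because the timeout check depends on how long a ping has been pending rather than on when the last PONG arrived, it stays meaningful even when the timeout is shorter than the ping period.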
    • A
      Sentinel: use CLIENT SETNAME when connecting to Redis. · 38241c4b
      antirez committed
      This makes debugging / monitoring of Sentinels simpler since you can
      identify sentinels in CLIENT LIST output of Redis instances.
      38241c4b
  2. 15 Mar, 2014 2 commits
    • M
      Fix segfault from accessing array out of bounds · 9de07558
      Matt Stancliff committed
      argc == 2; argv[2] == crash
      9de07558
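The one-line message of commit 9de07558 describes reading index 2 of an argument array that holds only two elements. A generic guard against that class of bug might look like the sketch below; the helper name `arg_or_null` is hypothetical and the fix actually applied in the commit is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Return argv[idx], or NULL when idx is outside [0, argc).
 * With argc == 2 the valid indexes are 0 and 1 only. */
const char *arg_or_null(int argc, char **argv, int idx) {
    assert(argv != NULL);
    if (idx < 0 || idx >= argc) return NULL;
    return argv[idx];
}
```

A caller then tests the result for NULL instead of dereferencing `argv[2]` directly.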
    • A
      Sentinel: be safe under crash-recovery assumptions. · a31a0b43
      antirez committed
      Sentinel's main safety argument is that there are no two configurations
      for the same master with the same version (configuration epoch).
      
      For this to be true Sentinels must be authorized by a majority.
      Additionally, Sentinels must do two important things:
      
      * Never vote again for the same epoch.
      * Never exchange an old vote for a fresh one.
      
      The first prerequisite, in a crash-recovery system model, requires
      persisting the master->leader_epoch on durable storage before replying
      to messages. This was not the case.
      
      We also make sure to persist the current epoch in order to never reply
      to stale vote requests from other Sentinels after a recovery.
      
      The configuration is persisted by making use of fsync(). In the context
      of this code this is considered a good enough guarantee that our durable
      state is restored after a restart. However, this may not always be the
      case, depending on the kind of hardware and operating system used.
      a31a0b43
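The rule in commit a31a0b43, persist the vote before replying, can be sketched as follows. This is an illustration under stated assumptions, not the Sentinel implementation: the `voter` struct, the one-number file format, and all function names are hypothetical, and Sentinel persists its whole configuration rather than a single epoch. The sketch writes a temporary file, calls fsync(), and renames it into place; it does not fsync the containing directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Durably store `epoch` in `path`. Returns 0 on success, -1 on error. */
int persist_epoch(const char *path, uint64_t epoch) {
    assert(path != NULL);
    char tmp[1024];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL) return -1;
    int ok = fprintf(fp, "%" PRIu64 "\n", epoch) > 0 &&
             fflush(fp) == 0 &&
             fsync(fileno(fp)) == 0;   /* force the data to storage */
    if (fclose(fp) != 0) ok = 0;
    if (ok && rename(tmp, path) != 0) ok = 0;
    if (!ok) remove(tmp);
    return ok ? 0 : -1;
}

/* Load the stored epoch, or 0 when nothing was persisted yet. */
uint64_t load_epoch(const char *path) {
    uint64_t epoch = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return 0;
    if (fscanf(fp, "%" SCNu64, &epoch) != 1) epoch = 0;
    fclose(fp);
    return epoch;
}

/* Hypothetical voter state: the last epoch a vote was granted for. */
typedef struct {
    uint64_t leader_epoch;
    const char *state_path;
} voter;

/* Returns 1 if the vote is granted, 0 otherwise. The epoch is made
 * durable before the in-memory state changes and before any reply
 * would be sent, so a restart cannot grant the same epoch twice. */
int vote(voter *v, uint64_t req_epoch) {
    if (req_epoch <= v->leader_epoch) return 0;  /* already voted */
    if (persist_epoch(v->state_path, req_epoch) != 0) return 0;
    v->leader_epoch = req_epoch;
    return 1;
}
```

After a simulated restart, rebuilding the `voter` from `load_epoch()` makes a repeated request for the same epoch fail, which is the property the commit message is after.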
  3. 14 Mar, 2014 2 commits
    • A
      Sentinel: fake PUBLISH command to receive HELLO messages. · 6b0e36ff
      antirez committed
      Now the way HELLO messages are received is unified.
      Sentinels no longer need to be able to chat via some Redis instance in
      order to converge to the higher configuration for a master: they are
      able to exchange configurations directly.
      
      Note that this commit does not include the (trivial) change needed to
      send HELLO messages to Sentinel instances as well, since by mistake I
      committed that change in the previous commit, which refactored hello
      message processing into a separate function.
      6b0e36ff
  4. 13 Mar, 2014 1 commit
  5. 11 Mar, 2014 6 commits
  6. 10 Mar, 2014 2 commits
  7. 05 Mar, 2014 16 commits