1. 31 Jul, 2012 1 commit
    • Sentinel: abort failover when in wait-start if master is back. · 75084e05
      Committed by antirez
      When we are a Leader Sentinel in wait-start state, starting with this
      commit the failover is aborted if the master returns online.
      
      This improves the way we handle a notable case of net split, that is the
      split between Sentinels and Redis servers, that will be a very common
      case of split because Sentinels will often be installed in the client's
      network and servers can be in a different arm of the network.
      
      When Sentinels and Redis servers are isolated the master is in ODOWN
      condition since the Sentinels can agree about this state, however the
      failover does not start since there are no good slaves to promote (in
      this specific case all the slaves are unreachable).
      
      However when the split is resolved, Sentinels may sense the slave back
      a moment before they sense the master is back, so the failover may start
      without a good reason (since the master is actually working too).
      
      Now this condition is reversible, so the failover will be aborted
      immediately after if the master is detected to be working again, that
      is, not in SDOWN nor in ODOWN condition.
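
The abort rule described above can be sketched as a single check in the failover state machine. This is only an illustration: the type, flag, and function names (`sk_master`, `SK_SDOWN`, `SK_ODOWN`, `sk_abort_if_master_back`) are hypothetical and are not taken from the Sentinel source.

```c
#include <stdbool.h>

/* Hypothetical flags and states, named for illustration only. */
enum { SK_SDOWN = 1 << 0, SK_ODOWN = 1 << 1 };

typedef enum {
    SK_FAILOVER_NONE,
    SK_FAILOVER_WAIT_START,
    SK_FAILOVER_SELECT_SLAVE
} sk_failover_state;

typedef struct {
    int flags;                        /* SK_SDOWN and/or SK_ODOWN bits */
    sk_failover_state failover_state; /* current failover step */
} sk_master;

/* Abort the failover if we are still in wait-start and the master is
 * neither in SDOWN nor in ODOWN. Returns true if it was aborted. */
bool sk_abort_if_master_back(sk_master *m) {
    if (m->failover_state != SK_FAILOVER_WAIT_START) return false;
    if (m->flags & (SK_SDOWN | SK_ODOWN)) return false; /* still down */
    m->failover_state = SK_FAILOVER_NONE; /* back to plain monitoring */
    return true;
}
```

In this sketch the check only applies to the wait-start state, matching the commit message; later failover states are left untouched.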
  2. 29 Jul, 2012 1 commit
    • Sentinel: scripts execution engine improved. · 3f194a9d
      Committed by antirez
      We no longer use a vanilla fork+execve but keep a queue of script
      jobs to execute, with retry on error, timeouts, and so forth.
      
      Currently this is used only for notifications but soon the ability to
      also call client reconfiguration scripts will be added.
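
The bookkeeping behind a job queue with retries and timeouts might look roughly like the sketch below. It models only the per-job state, not the actual fork/execve or the queue itself. All names and both limits (`SK_SCRIPT_MAX_RETRY`, `SK_SCRIPT_MAX_RUNTIME_MS`) are assumptions made for illustration, not values from the commit.

```c
#include <stdbool.h>

#define SK_SCRIPT_MAX_RETRY 3           /* assumed limit, not from the commit */
#define SK_SCRIPT_MAX_RUNTIME_MS 60000  /* assumed limit, not from the commit */

typedef enum {
    SK_JOB_QUEUED,
    SK_JOB_RUNNING,
    SK_JOB_DONE,
    SK_JOB_FAILED
} sk_job_status;

typedef struct {
    const char *path;     /* script to execute */
    int retries;          /* failed attempts so far */
    long long start_ms;   /* start time of the current attempt */
    sk_job_status status;
} sk_script_job;

/* True if a running job has exceeded the maximum runtime. */
bool sk_job_timed_out(const sk_script_job *j, long long now_ms) {
    return j->status == SK_JOB_RUNNING &&
           now_ms - j->start_ms > SK_SCRIPT_MAX_RUNTIME_MS;
}

/* Record the outcome of an attempt: done on success, otherwise requeue
 * until the retry limit is reached, then mark the job as failed. */
void sk_job_finished(sk_script_job *j, bool ok) {
    if (ok) { j->status = SK_JOB_DONE; return; }
    j->retries++;
    j->status = (j->retries < SK_SCRIPT_MAX_RETRY) ? SK_JOB_QUEUED
                                                   : SK_JOB_FAILED;
}
```

A caller would typically treat a timed-out job as a failed attempt, that is, kill the child process and then call `sk_job_finished(job, false)`.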
  3. 28 Jul, 2012 1 commit
  4. 26 Jul, 2012 1 commit
  5. 25 Jul, 2012 6 commits
    • Sentinel: ability to execute notification scripts. · baace5fc
      Committed by antirez
    • Sentinel: abort failover if no good slave is available. · 672102c2
      Committed by antirez
      The previous behavior of the state machine was to wait some time and
      retry the slave selection, but this is not robust enough against drastic
      changes in the conditions of the monitored instances.
      
      What we do now when the slave selection fails is to abort the failover
      and go back to monitoring the master. If the ODOWN condition is still
      present, a new failover will be triggered, and so forth.
      
      This commit also refactors the code we use to abort a failover.
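
As a rough sketch of the new behavior, a failed slave selection aborts the failover instead of scheduling a retry. The structures and function names are hypothetical, and "good slave" is reduced here to a single `reachable` flag, which is a simplification of whatever criteria Sentinel actually applies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a slave: "good" means reachable. */
typedef struct { bool reachable; } sk_slave;

typedef struct {
    sk_slave *slaves;
    size_t num_slaves;
    bool failover_in_progress;
} sk_monitored_master;

/* Return the first reachable slave, or NULL if none qualifies. */
sk_slave *sk_select_slave(sk_monitored_master *m) {
    for (size_t i = 0; i < m->num_slaves; i++)
        if (m->slaves[i].reachable) return &m->slaves[i];
    return NULL;
}

/* One selection step: on failure, abort the failover rather than
 * waiting and retrying; the master simply goes back to being monitored. */
sk_slave *sk_failover_select_step(sk_monitored_master *m) {
    sk_slave *s = sk_select_slave(m);
    if (s == NULL) m->failover_in_progress = false;
    return s;
}
```

If the master is still in ODOWN on a later check, the normal monitoring path would start a fresh failover, so no retry timer is needed in this step.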
    • Sentinel: reset pending_commands in a more generic way. · 9e5bef38
      Committed by antirez
    • Prevent a spurious +sdown event on switch. · a23a5b6c
      Committed by antirez
      When we reset the master we should start with clean timestamps for ping
      replies, otherwise we'll detect a spurious +sdown event, because on the
      +switch-master event the previous master instance was probably in SDOWN
      condition. Since we updated the address we should count time from
      scratch again.
      
      This commit also makes sure to explicitly reset the count of pending
      commands. We can do this now because of the new way the hiredis link
      is closed.
    • Sentinel: debugging message removed. · d918e6f1
      Committed by antirez
    • Sentinel: changes to connection handling and redirection. · 75fb6e5b
      Committed by antirez
      We now disconnect a Redis instance's hiredis link in a more robust way.
      We also change the way we perform the redirection for the +switch-master
      event, which is now just an instance reset with an address change.
      
      Using the same system we now implement the +redirect-to-master event
      that is triggered by an instance that is configured to be master but
      found to be a slave at the first INFO reply. In that case we monitor the
      master instead, logging the incident as an event.
  6. 24 Jul, 2012 2 commits
    • Sentinel: check that instance still exists in reply callbacks. · 2179c269
      Committed by antirez
      We can't be sure the instance object still exists when the reply
      callback is called.
    • Sentinel: more robust failover detection as observer. · d876d6fe
      Committed by antirez
      Sentinel observers detect a failover by checking whether a slave attached
      to the monitored master changes its replication role from slave to master.
      However, while this change may in theory only happen after a SLAVEOF NO
      ONE command, in practice it is very easy to reboot a slave instance with
      a wrong configuration that turns it into a master, especially if it was
      a past master before a successful failover.
      
      This commit changes the detection policy so that if an instance goes
      from slave to master, but at the same time the runid has changed, we
      sense a reboot, and in that case we don't detect a failover at all.
      
      This commit also introduces the "reboot" sentinel event, that is logged
      at "warning" level (so this will trigger an admin notification).
      
      The commit also fixes a problem in the disconnect handler, which assumed
      that the instance object always existed; that is not the case. Now we
      no longer assume that redisAsyncFree() will call the disconnection
      handler before returning.
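
The detection policy described in this commit can be illustrated with a small classifier. This is a sketch under stated assumptions: the enum and function names are hypothetical, and it only covers the slave-to-master transition discussed above, not every case in which a reboot might be reported.

```c
#include <string.h>

typedef enum { SK_ROLE_SLAVE, SK_ROLE_MASTER } sk_role;
typedef enum { SK_EV_NONE, SK_EV_FAILOVER_DETECTED, SK_EV_REBOOT } sk_event;

/* Classify a role change seen in two consecutive INFO replies.
 * A slave that became a master with the same runid was promoted
 * (failover detected); if the runid changed too, the process was
 * restarted, so we report a reboot and detect no failover. */
sk_event sk_classify_role_change(sk_role old_role, sk_role new_role,
                                 const char *old_runid,
                                 const char *new_runid) {
    if (!(old_role == SK_ROLE_SLAVE && new_role == SK_ROLE_MASTER))
        return SK_EV_NONE;
    if (strcmp(old_runid, new_runid) != 0) return SK_EV_REBOOT;
    return SK_EV_FAILOVER_DETECTED;
}
```

Comparing run IDs works here because a Redis instance generates a new run ID each time it starts, while a promotion via SLAVEOF NO ONE keeps the same process and therefore the same run ID.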
  7. 23 Jul, 2012 1 commit