Commits · dd36f18c76e7793567598fc168e54b41e06d4a61 · 别团等shy哥发育 / redis

01 Sep, 2014 · 1 commit
- Sentinel: Abort Hello quicker if not connected · dd36f18c
  Committed by Matt Stancliff on Aug 04, 2014
```
We can save a little work by aborting when we enter the function
if we're disconnected.
```
27 Aug, 2014 · 3 commits
- Rename two 'buf' vars to 'ip' for better clarity · 4ac8472c
  Committed by Matt Stancliff on Aug 06, 2014
```
Clearly ip[32] is wrong, but it's less clear that buf[32] was wrong
without further reading.
```
- Sentinel: fix bufsize to support IPv6 address · f7b5e2d1
  Committed by Eiichi Sato on Aug 06, 2014
```
Closes #1914
```
- Remove warnings and improve integer sign correctness. · 65d47452
  Committed by antirez on Aug 13, 2014
23 Jun, 2014 · 2 commits

Sentinel implementation of ROLE. · d3c29184

Committed by antirez on Jun 23, 2014

Sentinel: bind source address · 80663349

Committed by Matt Stancliff on Apr 24, 2014

Some deployments need traffic sent from a specific address.  This
change uses the same policy as Cluster where the first listed bindaddr
becomes the source address for outgoing Sentinel communication.

Fixes #1667

21 Jun, 2014 · 1 commit

Sentinel: send SLAVEOF with MULTI, CLIENT KILL, CONFIG REWRITE. · 7d0992da

Committed by antirez on Jun 17, 2014

This implements the new Sentinel-Client protocol for the Sentinel part:
now instances are reconfigured using a transaction that ensures that the
config is rewritten in the target instance, and that clients lose the
connection with the instance, in order to be forced to: ask Sentinel,
reconnect to the instance, and verify the instance role with the new
ROLE command.


19 Jun, 2014 · 2 commits

Sentinel: send hello messages ASAP after config change. · 93ee0f26

Committed by antirez on Jun 19, 2014

Eventual configuration convergence is guaranteed by our periodic hello
messages to all the instances; however, when there are important notices
to share, it is better to make a phone call. With this commit we force a
hello message to other Sentinel and Redis instances within 100
milliseconds of a config update, which in practice is better than
waiting a few seconds.


Sentinel: handle SRI_PROMOTED flag correctly. · 9b883974

Committed by antirez on Jun 19, 2014

Not checking the SRI_PROMOTED flag caused Sentinel to treat the slave
being promoted to master during a failover as if it were a normal
instance.

Normally this problem was not apparent, because during real failovers
the old master is down, so the buggy code path was not entered. However,
with manual failovers via the SENTINEL FAILOVER command, the problem was
easily triggered.

This commit prevents promoted slaves from getting reconfigured; moreover,
we now explicitly check that during a failover the slave turning into a
master is the one we selected for promotion and not a different one.


28 May, 2014 · 1 commit
- More trailing spaces in sentinel.c removed. · 1c0c42e6
  Committed by antirez on May 28, 2014
20 May, 2014 · 1 commit
- Remove trailing spaces from sentinel.c. · 6d2fddd2
  Committed by antirez on May 20, 2014
08 May, 2014 · 2 commits

Sentinel: log when a failover will be attempted again. · 13d8b2b0

Committed by antirez on May 08, 2014

When a Sentinel performs a failover (successful or not), or when it
votes for a different Sentinel trying to start a failover, it sets a
minimum delay before it will try to get elected for a failover again.

This serialization is not strictly needed, since if multiple Sentinels
try to fail over the same master at the same time only one
configuration will eventually win, but it is very useful in practice
because normal failovers are cleaner: one Sentinel starts the failover,
and the others update their config once the Sentinel performing the
failover gets the selected slave to switch from the slave role to the
master role.

However, this timeout used to be implicit, so after a failed failover
users could see Sentinels not reacting for some time, with no feedback
in the logs for the poor sysadmin waiting for clues.

This commit makes Sentinels more verbose about the delay: when a master
is down and a failover attempt is not performed because the delay has
not yet elapsed, a line like the following is logged:

Next failover delay: I will not start a failover before Thu May 8 16:48:59 2014


Sentinel: generate +config-update-from event when a new config is received. · 909d1883

Committed by antirez on May 08, 2014

This event makes clear, before the switch-master event is generated,
that a Sentinel received a configuration update from another Sentinel.

25 Mar, 2014 · 4 commits

Sentinel: remove variable causing warning · 3bd32406

Committed by Matt Stancliff on Mar 18, 2014

GCC-4.9 warned about this, but clang didn't.

This commit fixes the following warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;

Fixed undefined variable value with certain code paths. · 79349aff

Committed by antirez on Mar 24, 2014

In sentinelFlushConfig() fd could be undefined when the following if
statement was true:

        if (rewrite_status == -1) goto werr;

This could cause random file descriptors to get closed.


Sentinel: Notify user when config can't be saved · 80dec5e4

Committed by Matt Stancliff on Mar 14, 2014

Small typo fixed · a2ec9a90

Committed by Jan-Erik Rediger on Mar 05, 2014

21 Mar, 2014 · 5 commits

Sentinel: sentinelRefreshInstanceInfo() minor refactoring. · 0937377a

Committed by antirez on Mar 18, 2014

Test the sentinel.tilt condition at the top and return if it is true.
This allows removing the check for the tilt condition in the remaining
code paths of the function.

Sentinel: propagate down-after-ms changes to slaves and sentinels. · 9c2063fb

Committed by antirez on Mar 18, 2014

Sentinel: down-after-milliseconds is not master-specific. · ffa8f479

Committed by antirez on Mar 18, 2014

addReplySentinelRedisInstance() was modified so that this field is
displayed for all kinds of instances: Sentinels, masters, and slaves.

Sentinel failure detection implementation improved. · 42091a79

Committed by antirez on Mar 17, 2014

Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current time.

PINGs were sent at a fixed interval of 1 second.

This works decently, but does not scale well when we want to set very
small values of "down-after-milliseconds" (basically the node timeout).

This commit reimplements the failure detection, making a number of
changes. Some changes are inspired by the Redis Cluster failure
detection code:

* A new last_ping_time field is added to the representation of
  instances. If non-zero, we have an active ping that was sent at the
  specified time. When a valid reply to a ping is received, the field
  is zeroed again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used to check if the instance has timed out.
  This means that we can have a node timeout of 100 milliseconds and
  the system will still work well, since the new check is not bound to
  the period used to send pings.
* Pings are now sent every second, or more often if the value of
  down-after-milliseconds is less than one second, with a 10 Hz limit
  on the ping frequency.
* Link reconnection code was improved. It is used to try to reconnect
  the link when we are at 50% of the node timeout without having
  received a valid reply yet. However, the old code triggered
  unnecessary reconnections when the node timeout was very small. Now
  that should be fixed.

The new code passes the tests, but more testing and more unit tests
stressing the failure detector are needed, so currently this is merged
only in the unstable branch.


Sentinel: use CLIENT SETNAME when connecting to Redis. · 38241c4b

Committed by antirez on Mar 15, 2014

This makes debugging / monitoring of Sentinels simpler, since you can
identify Sentinels in the CLIENT LIST output of Redis instances.

15 Mar, 2014 · 2 commits

Fix segfault from accessing array out of bounds · 9de07558

Committed by Matt Stancliff on Mar 14, 2014
```
argc == 2; argv[2] == crash
```

Sentinel: be safe under crash-recovery assumptions. · a31a0b43

Committed by antirez on Mar 14, 2014

Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true, Sentinels need to be authorized by a majority.
Additionally, Sentinels need to do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires
persisting master->leader_epoch on durable storage before replying to
messages. This was not the case.

We also make sure to persist the current epoch, in order to never reply
to stale vote requests from other Sentinels after a recovery.

The configuration is persisted using fsync(). In the context of this
code, this is considered a good enough guarantee that our durable state
is restored after a restart; however, this may not always be the case,
depending on the kind of hardware and operating system used.


14 Mar, 2014 · 2 commits

Sentinel: fake PUBLISH command to receive HELLO messages. · 6b0e36ff

Committed by antirez on Mar 14, 2014

Now the way HELLO messages are received is unified.
Sentinels no longer need to converge to the higher configuration for a
master in order to chat via some Redis instance; they are able to
exchange configurations directly.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since I mistakenly
committed that change in the previous commit, which refactored HELLO
message processing into a separate function.

Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). · bd48ff69

Committed by antirez on Mar 14, 2014

05 Mar, 2014 · 1 commit

Sentinel: more aggressive failover start desynchronization. · 1606978a

Committed by antirez on Mar 04, 2014

Sentinel needs to avoid split-brain conditions caused by multiple
Sentinels trying to get voted at the exact same time.

So far some desynchronization was provided by a fluctuating server.hz,
that is, the frequency of the timer function call. However, the
desynchronization provided this way was not enough when using many
Sentinel instances, especially when a large quorum value is used in
order to force a greater degree of agreement (more than N/2+1).

It was verified that a split-brain condition was likely to be
triggered, forcing the system to try again after a timeout. Usually the
system succeeds after a few retries, but this is not optimal.

This commit desynchronizes instances more effectively, making it likely
that the first attempt will be successful.


25 Feb, 2014 · 5 commits

Sentinel: log quorum with +monitor event. · 85fa77e0

Committed by antirez on Feb 24, 2014

Sentinel: generate +monitor events at startup. · 6e610679

Committed by antirez on Feb 24, 2014

Sentinel: log +monitor and +set events. · 96162c0c

Committed by antirez on Feb 24, 2014

Now that we have a runtime configuration system, it is very important to
be able to log how the Sentinel configuration changes over time because
of API calls.

Sentinel: added missing exit(1) after checking for config file. · 39eacde1

Committed by antirez on Feb 24, 2014

Sentinel: IDONTKNOW error removed. · d83ab8f9

Committed by antirez on Feb 22, 2014

This error was conceived for the older version of Sentinel that worked
via master redirection and was not able to get configuration updates
from other Sentinels via the Pub/Sub channel of masters or slaves.

This reply does not make sense today: every Sentinel should reply with
the best information it currently has. The error will make even less
sense in the future, since the plan is to allow Sentinels to update the
configuration of other Sentinels via gossip, with a direct chat,
without the prerequisite that they have at least one monitored instance
in common.

20 Feb, 2014 · 1 commit
- Sentinel: report instances role switch events. · 6441a41f
  Committed by antirez on Feb 20, 2014
```
This is useful mostly for debugging of issues.
```
18 Feb, 2014 · 2 commits

Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT · 905c55d5

Committed by antirez on Feb 18, 2014
```
Rename define to match the new meaning.
```

Sentinel: fix slave promotion timeout. · 1b345ec3

Committed by antirez on Feb 18, 2014

If we can't reconfigure a slave in time during failover, go forward
anyway, as the slave will be fixed by Sentinels in the future once they
detect it is misconfigured.

Otherwise a failover in progress may never terminate if, for some
reason, the slave is unable to sync with the master while at the same
time it is not disconnected.


17 Feb, 2014 · 1 commit

Sentinel: better specify startup errors due to config file. · 5efee4f0

Committed by antirez on Feb 17, 2014

Now it logs the file name if it is not accessible. Also, there are now
distinct errors for the missing config file case and for the
non-writable file case.

07 Feb, 2014 · 1 commit
- Sentinel: allow SHUTDOWN command in Sentinel mode. · 3e496833
  Committed by antirez on Feb 07, 2014
03 Feb, 2014 · 1 commit

Move mstime_t define outside sentinel.c. · ddcf1603

Committed by antirez on Feb 03, 2014

The define is now used in other parts of the Redis 2.8 tree instead of
long long.

A nice side effect is that the 2.8 and unstable sentinel.c files are now
identical, as they should be.

31 Jan, 2014 · 1 commit
- Sentinel: check arity for SENTINEL MASTER command. · f0652c37
  Committed by antirez on Jan 31, 2014
```
This fixes issue #1530.
```
28 Jan, 2014 · 1 commit
- SENTINEL SET master quorum implemented. · a2c9d38a
  Committed by antirez on Jan 14, 2014

别团等shy哥发育 / redis is in sync with the upstream project it was forked from.
