1. 29 June 2018, 1 commit
    • Sentinel: add an option to deny online script reconfiguration. · 2fa43ece
      Committed by antirez
      The ability of "SENTINEL SET" to change the reconfiguration script at
      runtime is a problem even within the security model of Redis: any client
      inside the network may set any executable to be run once a failover is
      triggered.
      
      This option adds protection against this problem: by default the two
      SENTINEL SET subcommands that modify script paths are denied. However
      the user is still able to revert that, using the Sentinel configuration
      file, in order to allow the feature.
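      As a sketch of what the resulting default looks like in sentinel.conf
      (the directive name comes from this commit; verify it against your
      Redis version):

```
# Deny SENTINEL SET reconfiguration of notification-script and
# client-reconfig-script at runtime. Change to "no" only if you trust
# every client that can reach Sentinel.
sentinel deny-scripts-reconfig yes
```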
  2. 23 May 2018, 1 commit
    • Sentinel: fix delay in detecting ODOWN. · 266e6423
      Committed by antirez
      See issue #2819 for details. The gist is that when we want to send INFO
      because we are over the time, we used to send only the INFO command and
      no longer sent PING commands. However if a master fails exactly when we
      are about to send an INFO command, the PING time will appear to be zero
      because the PONG reply was already received, and we'll fail to send more
      PINGs, since we try only to send INFO commands: the failure detector
      will be delayed until the connection is closed and re-opened because of
      the "long timeout".
      
      This commit changes the logic so that we can send the three kinds of
      messages regardless of the fact that we already sent another one in the
      same code path. It could happen that we go over the message limit for
      the link by a few messages, but this is not significant. However now
      we'll not introduce delays in sending commands just because there was
      something else to send at the same time.
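      The fixed scheduling logic can be sketched outside of sentinel.c as
      follows; this is an illustrative Python model (the period constants and
      function name are assumptions, not the real Redis identifiers): each
      message type is checked for being due independently, so sending INFO no
      longer suppresses a due PING.

```python
INFO_PERIOD = 10.0     # seconds between INFO refreshes (illustrative)
PING_PERIOD = 1.0      # seconds between PINGs (illustrative)
PUBLISH_PERIOD = 2.0   # seconds between hello PUBLISHes (illustrative)

def due_messages(now, last_info, last_ping, last_publish):
    """Return every message type that is due, not just the first one
    found, so one kind of traffic never starves another."""
    due = []
    if now - last_info >= INFO_PERIOD:
        due.append("INFO")
    if now - last_ping >= PING_PERIOD:
        due.append("PING")
    if now - last_publish >= PUBLISH_PERIOD:
        due.append("PUBLISH")
    return due
```

      The worst case is exceeding the per-link message budget by a couple of
      messages in one pass, which the commit message deems insignificant.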
  3. 22 February 2017, 1 commit
    • Use SipHash hash function to mitigate HashDos attempts. · ba647598
      Committed by antirez
      This change attempts to switch to a hash function which mitigates
      the effects of the HashDoS attack (a denial of service attack trying
      to force data structures into worst case behavior) while at the same
      time providing Redis with a hash function that does not expect the
      input data to be word aligned, a condition no longer true now that
      sds.c strings have a variable length header.
      
      Note that even when using a hash function for which collisions cannot
      be generated without knowing the seed, implementation details or
      indirect exposure of the seed (for example the ability to add elements
      to a Set and check the order in which Redis returns them with SMEMBERS)
      may make the attacker's life simpler in the process of trying to guess
      the correct seed. However the next step would be to switch to a
      log(N) data structure when too many items in a single bucket are
      detected: this seems like overkill in the case of Redis.
      
      SPEED REGRESSION TESTS:
      
      In order to verify that switching from MurmurHash to SipHash had
      no impact on speed, a set of benchmarks involving fast insertion
      of 5 million keys was performed.
      
      The results show Redis with SipHash, under high pipelining
      conditions, to be about 4% slower compared to the previous hash
      function. However this could be partially related to the fact that the
      current implementation does not attempt to hash whole words at a time
      but reads single bytes, in order to have an output which is
      endian-neutral and at the same time works on systems where unaligned
      memory accesses are a problem.
      
      Further x86 specific optimizations should be tested; the function
      may easily reach the same level as MurmurHash2 if a few optimizations
      are performed.
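      The property being bought here can be illustrated with any keyed hash;
      a minimal Python sketch, using HMAC-SHA256 as a stand-in for SipHash
      (CPython's standard library does not expose SipHash directly):

```python
import hashlib
import hmac

def bucket(key: bytes, seed: bytes, num_buckets: int) -> int:
    """Map a key to a hash-table bucket using a secret seed. Without
    knowing the seed, an attacker cannot precompute a set of colliding
    keys, which is the HashDoS mitigation described above."""
    digest = hmac.new(seed, key, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets
```

      With a per-process random seed, a collision set crafted against one
      deployment is useless against another.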
  4. 14 September 2016, 1 commit
  5. 12 September 2016, 1 commit
  6. 22 July 2016, 1 commit
    • Sentinel: check Slave INFO state more often when disconnected. · 3e9ce38b
      Committed by antirez
      During the initial handshake with the master a slave will report to have
      a very high disconnection time from its master (since technically it was
      disconnected since forever, so the current UNIX time in seconds is
      reported).
      
      However when the slave is connected again the Sentinel may re-scan the
      INFO output again only after 10 seconds, which is a long time. During
      this time Sentinels will consider this instance unable to failover, so
      a useless delay is introduced.
      
      Actually this hardly ever happened in practice, because when a slave's
      master is down, the INFO period for slaves changes to 1 second. However
      when a manual failover is attempted immediately after adding slaves
      (as in the case of the Sentinel unit test), this problem may happen.
      
      This commit changes the INFO period to 1 second even in the case the
      slave's master is not down, but the slave reported to be disconnected
      from the master (by publishing, last time we checked, a master
      disconnection time field in INFO).
      
      This change is required as a result of an unrelated change in the
      replication code that adds a small delay in the master-slave first
      synchronization.
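      The rule described above can be modeled like this; a Python sketch with
      made-up names (the real constants and checks live in sentinel.c):

```python
SENTINEL_INFO_PERIOD = 10.0        # normal INFO refresh interval, seconds
SENTINEL_INFO_TIGHT_PERIOD = 1.0   # tightened interval for the cases below

def info_period(is_slave: bool, master_down: bool,
                reports_disconnected: bool) -> float:
    """A slave is polled for INFO every second not only when its master
    is down (the pre-existing rule), but also when the slave itself
    reports a broken master link in its INFO output (this commit)."""
    if is_slave and (master_down or reports_disconnected):
        return SENTINEL_INFO_TIGHT_PERIOD
    return SENTINEL_INFO_PERIOD
```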
  7. 05 July 2016, 1 commit
    • Sentinel: fix cross-master Sentinel address update. · c383be3b
      Committed by antirez
      This commit both fixes the crash reported with issue #3364 and
      also properly closes the old links after the Sentinel address for the
      other masters gets updated.
      
      The two problems were:
      
      1. The Sentinel that switched address may not monitor all the masters,
         it is possible that there is no match, and the 'match' variable is
         NULL. Now we check for no match and 'continue' to the next master.
      
      2. While inspecting the code because of issue "1", I noticed that there
         was a problem in the code that disconnects the links of the Sentinel
         that needs the address update. Basically link->disconnected is
         non-zero even if just *a single link* (cc, the command link, or pc,
         the pub/sub link) is disconnected, so checking if (link->disconnected)
         before closing the links risks leaving one link connected.
      
      I was able to manually reproduce the crash at "1" and verify that the
      commit resolves the issue.
      
      Close #3364.
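      Both fixes can be sketched as follows; an illustrative Python model,
      not the actual sentinel.c code (the class and method names are
      assumptions):

```python
class SentinelLink:
    """Stand-in for a Sentinel's pair of connections to a peer."""

    def __init__(self, addr):
        self.addr = addr
        self.cc_connected = True   # command link
        self.pc_connected = True   # pub/sub link

    def close_command_link(self):
        self.cc_connected = False

    def close_pubsub_link(self):
        self.pc_connected = False


def update_sentinel_address(masters, sentinel_id, new_addr):
    for master in masters:
        match = master["sentinels"].get(sentinel_id)
        if match is None:
            continue                    # fix 1: this master has no match
        match.addr = new_addr
        match.close_command_link()      # fix 2: close cc and pc
        match.close_pubsub_link()       # unconditionally, instead of
                                        # gating on a combined flag
```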
  8. 17 June 2016, 1 commit
    • Fix Sentinel pending commands counting. · f7351f4c
      Committed by antirez
      The most visible effect of this bug was Redis's inability to
      reconfigure old masters back to slaves after they become reachable
      again following a failover. This was due to failing to properly reset
      the count of pending commands, so the master appeared forever down.
      
      The bug was introduced with the new Sentinel connection sharing
      feature of Redis 3.2, which is a lot more complex than the 3.0 code,
      but more scalable.
      
      Many thanks to the people reporting the issue, and especially to
      @sskorgal for investigating the issue in depth.
      
      Hopefully closes #3285.
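      The invariant behind the fix can be sketched like this; an illustrative
      Python model (the real counter lives in the shared link structure
      introduced in 3.2):

```python
class InstanceLink:
    """Stand-in for the shared connection state to one Redis instance."""

    def __init__(self):
        self.pending_commands = 0   # commands sent, reply not yet seen

    def send_command(self):
        self.pending_commands += 1

    def reply_received(self):
        self.pending_commands -= 1

    def reconnect(self):
        # The fix: replies to commands sent on the old connection will
        # never arrive, so the counter must be reset here, otherwise the
        # instance looks permanently busy and therefore down.
        self.pending_commands = 0
```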
  9. 10 June 2016, 2 commits
  10. 26 May 2016, 1 commit
  11. 27 January 2016, 1 commit
    • Sentinel: improve handling of known Sentinel instances. · 751b5666
      Committed by antirez
      1. Bug #3035 is fixed (NULL pointer access). This was happening with the
         following set of conditions:
      
      * For some reason one of the Sentinels, let's call it Sentinel_A, changed ID (it was reconfigured from scratch), but it is at the same address at which it used to be.
      
      * Sentinel_A performs a failover and/or has a newer configuration compared to another Sentinel, that we call Sentinel_B.
      
      * Sentinel_B receives a HELLO message from Sentinel_A, where the address and/or ID is mismatched, but which reports a newer configuration for the master they are both monitoring.
      
      2. Sentinels now must have an ID, otherwise they are neither loaded nor persisted in the configuration. This makes it possible to have conflicting Sentinels with the same address, since now the master->sentinels dictionary is indexed by Sentinel ID.
      
      3. The code now detects if a Sentinel is announcing itself with an IP/port pair already in use by another Sentinel. The old Sentinel that had the same IP/port pair is set to port 0, meaning that the address is invalid. We may discover the right address later via HELLO messages.
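      Points 2 and 3 can be sketched together; an illustrative Python model
      of an ID-indexed Sentinel table with the port-0 invalidation rule (all
      names are assumptions):

```python
def register_sentinel(sentinels, sid, ip, port):
    """sentinels maps Sentinel ID -> {"ip": ..., "port": ...} (point 2).
    If the announced address is already held by a different Sentinel,
    the old holder's port is set to 0, marking its address as unknown
    until a later HELLO reveals the right one (point 3)."""
    for other_id, other in sentinels.items():
        if other_id != sid and other["ip"] == ip and other["port"] == port:
            other["port"] = 0
    sentinels[sid] = {"ip": ip, "port": port}
```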
  12. 12 January 2016, 1 commit
  13. 08 September 2015, 1 commit
  14. 29 July 2015, 1 commit
  15. 27 July 2015, 2 commits
  16. 26 July 2015, 4 commits
  17. 24 July 2015, 1 commit
  18. 13 June 2015, 1 commit
    • Sentinel: fix bug in config rewriting during failover · 821a9866
      Committed by antirez
      We have a check to rewrite the config properly when a failover is in
      progress, in order to add the current (already failed over) master as
      slave, and don't include in the slave list the promoted slave itself.
      
      However there was an issue: the variable holding the right address was
      computed but never used when the code was modified, and no tests are
      available for this feature, for two reasons:
      
      1. The Sentinel unit test currently does not test Sentinel's ability to
      persist its state at all.
      2. It is a very hard state to trigger, since it lasts for a short time
      in the context of the testing framework.
      
      However this feature should be covered by the test in some way.
      
      The bug was found by @badboy using the clang static analyzer.
      
      Effects of the bug on safety of Sentinel
      ===
      
      This bug results in severe issues in the following case:
      
      1. A Sentinel is elected leader.
      2. During the failover, it persists a wrong config with a known-slave
      entry listing the master address.
      3. The Sentinel crashes and restarts, reading invalid configuration from
      disk.
      4. It sees that the slave now does not obey the logical configuration
      (it should replicate from the current master), so it sends a SLAVEOF
      command to the master (since the listed slave and the master are the
      same instance), creating a replication loop (an attempt to replicate
      from itself) which Redis is currently unable to detect.
      5. This means that the master is no longer available because of the bug.
      
      However the lack of availability should be only transient (at least
      in my tests, but other states could be possible where the problem
      is not recovered automatically) because:
      
      6. Sentinels treat masters reporting to be slaves as failing.
      7. A new failover is triggered, and a slave is promoted to master.
      
      Bug lifetime
      ===
      
      The bug was there from the start: commit 16237d78 actually tried to fix
      it, but in the wrong way (the computed variable was never used! My
      fault). So this bug has been there basically since the beginning of
      Sentinel.
      
      Since the bug is hard to trigger, I remember few reports matching this
      condition, but I remember at least some. Also, in automated tests where
      instances were stopped and restarted multiple times automatically, I
      remember hitting this issue; however I was not able to reproduce it,
      nor to determine, with the information I had at the time, what was
      causing it.
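      The rewrite rule the commit restores can be modeled as follows; an
      illustrative Python sketch under the assumptions stated in the message
      (during a failover the old master must appear as a known-slave entry
      and the promoted slave must not):

```python
def known_slaves_for_rewrite(old_master_addr, slave_addrs, promoted_addr,
                             failover_in_progress):
    """Addresses to persist as known-slave entries. The original bug was
    computing old_master_addr correctly and then never using it."""
    if not failover_in_progress:
        return list(slave_addrs)
    # Drop the promoted slave (it is the new master) and add the old
    # master, which will replicate from the promoted slave.
    slaves = [a for a in slave_addrs if a != promoted_addr]
    slaves.append(old_master_addr)
    return slaves
```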
  19. 25 May 2015, 2 commits
  20. 22 May 2015, 1 commit
  21. 20 May 2015, 1 commit
  22. 18 May 2015, 1 commit
    • Sentinel: SENTINEL CKQUORUM command · abc65e89
      Committed by antirez
      A way for monitoring systems to check that Sentinel is technically able
      to reach the quorum and failover, using the currently visible Sentinels.
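      What the check verifies can be sketched like this; an illustrative
      Python model (the majority requirement for authorizing a failover is
      Sentinel's usual voting rule, not spelled out in this message):

```python
def ckquorum(usable_sentinels, total_sentinels, quorum):
    """Return (ok, message) in the spirit of SENTINEL CKQUORUM: enough
    reachable Sentinels for the configured quorum, and enough for the
    majority needed to authorize a failover."""
    if usable_sentinels < quorum:
        return (False, "not enough reachable Sentinels for the quorum")
    majority = total_sentinels // 2 + 1
    if usable_sentinels < majority:
        return (False, "not enough reachable Sentinels to authorize a failover")
    return (True, "OK")
```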
  23. 15 May 2015, 1 commit
  24. 14 May 2015, 7 commits
  25. 13 May 2015, 1 commit
  26. 12 May 2015, 3 commits