1. 28 Apr 2017 (1 commit)
    • Regression test for PSYNC2 issue #3899 added. · c180bc7d
      committed by antirez
      Experimentally verified that it can trigger the issue when reverting
      the fix. At least on my system... The bug being time/backlog
      dependent, it is very hard to tell whether this test will trigger the
      problem consistently; however, even if it triggers the problem only
      once in a while, we'll see it in the CI environment at
      http://ci.redis.io.
  2. 27 Apr 2017 (1 commit)
    • PSYNC2: fix master cleanup when caching it. · 469d6e2b
      committed by antirez
      The master client cleanup was incomplete: the resetClient() call was
      missing and the output buffer of the client was not reset, so pending
      commands related to the previous connection could still be sent.

      The first problem caused the client argument vector to be, at times,
      half populated, so that when the correct replication stream arrived
      the protocol got mixed with the arguments, creating invalid commands
      that nobody called.

      Thanks to @yangsiran for also investigating this problem, after
      already providing important design / implementation hints for the
      original PSYNC2 issues (see the referenced GitHub issue).

      Note that this commit adds a new function to the list library of
      Redis in order to be able to reset a list without destroying it.

      Related to issue #3899.
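The commit mentions a new list-library helper that resets a list without destroying it. As a rough illustration (the struct layout and the `listAddNodeTail` helper below are simplified stand-ins for the sketch, not the actual adlist.c code), such a function only needs to free the nodes and zero the bookkeeping fields, leaving the list object itself reusable:

```c
#include <stdlib.h>

/* Minimal doubly linked list, loosely mirroring Redis' adlist shapes.
 * All of this is illustrative, not the real adlist.c API. */
typedef struct listNode {
    struct listNode *prev, *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head, *tail;
    unsigned long len;
    void (*free)(void *ptr);   /* optional per-value destructor */
} list;

/* Helper for the sketch: append a node at the tail. */
int listAddNodeTail(list *l, void *value) {
    listNode *node = malloc(sizeof(*node));
    if (!node) return 0;
    node->value = value;
    node->prev = l->tail;
    node->next = NULL;
    if (l->tail) l->tail->next = node; else l->head = node;
    l->tail = node;
    l->len++;
    return 1;
}

/* Remove every node but keep the list object alive, so the caller can
 * reuse it without a destroy/create cycle. */
void listEmpty(list *l) {
    listNode *current = l->head, *next;
    while (current) {
        next = current->next;
        if (l->free) l->free(current->value);
        free(current);
        current = next;
    }
    l->head = l->tail = NULL;
    l->len = 0;
}
```

The point of keeping the container alive is exactly the caching use case above: the cached master's bookkeeping lists can be cleared in place rather than reallocated.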
  3. 22 Apr 2017 (4 commits)
  4. 21 Apr 2017 (1 commit)
    • Check event loop creation return value. Fix #3951. · 238cebdd
      committed by antirez
      Normally we never check for OOM conditions inside Redis, since the
      allocator will always return a pointer or abort the program on OOM.
      However we have no control over epoll_create(), which may fail on
      kernel OOM (according to the manual page) even if all the parameters
      are correct, so aeCreateEventLoop() may indeed return NULL and this
      condition must be checked.
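A minimal sketch of the described check, assuming a Linux host. The `createEventLoopFd`/`createEventLoopFdOrExit` wrappers are hypothetical names for the sketch; the real fix checks the return value of `aeCreateEventLoop()` during server initialization:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/epoll.h>

/* epoll_create() can fail with ENOMEM or EMFILE even when its argument
 * is valid, so the event-loop constructor can legitimately fail and
 * callers must not assume success. */
int createEventLoopFd(void) {
    return epoll_create(1024);   /* -1 on failure, e.g. kernel OOM */
}

/* Caller-side check: bail out loudly instead of dereferencing a
 * failed event loop later. */
int createEventLoopFdOrExit(void) {
    int epfd = createEventLoopFd();
    if (epfd == -1) {
        perror("Failed creating the event loop");
        exit(1);
    }
    return epfd;
}
```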
  5. 20 Apr 2017 (1 commit)
  6. 19 Apr 2017 (3 commits)
    • Fix getKeysUsingCommandTable() in cluster mode. · 7d9dd80d
      committed by antirez
      Close #3940.
    • PSYNC2: discard pending transactions from cached master. · 189a12af
      committed by antirez
      During the review of the fix for #3899, @yangsiran identified an
      implementation bug: given that the offset is now relative to the
      applied part of the replication log, when we cache a master, the
      successive PSYNC2 request will be made in order to *include* the
      transaction that was not completely processed. This means that we
      need to discard any pending transaction from our replication buffer:
      it will be re-executed.
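The idea can be sketched as follows. The `client` struct, the `CLIENT_MULTI` flag handling, and `cacheMasterSketch()` are simplified stand-ins for the real server code, which discards any queued MULTI/EXEC state while caching the master, since the next PSYNC will re-send the whole transaction:

```c
/* Toy model of the state involved; not the real redisClient layout. */
typedef struct client {
    int flags;
    int multi_count;   /* commands queued inside a MULTI/EXEC block */
} client;

#define CLIENT_MULTI (1<<0)

/* Drop a half-received transaction. */
void discardTransaction(client *c) {
    c->multi_count = 0;
    c->flags &= ~CLIENT_MULTI;
}

/* When caching the master, pending queued commands must not survive:
 * the successive PSYNC2 offset re-includes the whole transaction, so
 * it will be received and executed again from scratch. */
void cacheMasterSketch(client *master) {
    if (master->flags & CLIENT_MULTI) discardTransaction(master);
}
```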
    • Fix PSYNC2 incomplete command bug as described in #3899. · 22be435e
      committed by antirez
      This bug was discovered by @kevinmcgehee and constituted a major
      hidden bug in the PSYNC2 implementation, caused by the propagation
      from the master of incomplete commands to slaves.

      The bug had several results:

      1. Borrowing from Kevin's text in the issue: "Given that slaves
      blindly copy over their master's input into their own replication
      backlog over successive read syscalls, it's possible that with large
      commands or small TCP buffers, partial commands are present in this
      buffer. If the master were to fail before successfully propagating
      the entire command to a slave, the slaves will never execute the
      partial command (since the client is invalidated) but will copy it to
      replication backlog which may relay those invalid bytes to its slaves
      on PSYNC2, corrupting the backlog and possibly other valid commands
      that follow the failover. Simple command boundaries aren't sufficient
      to capture this, either, because in the case of a MULTI/EXEC block,
      if the master successfully propagates a subset of the commands but
      not the EXEC, then the transaction in the backlog becomes corrupt and
      could corrupt other slaves that consume this data."

      2. As identified by @yangsiran later, there is another effect of the
      bug. By the same mechanism as the first problem, a slave having
      another slave could receive a full resynchronization request with an
      already half-applied command in the backlog. Once the RDB is ready,
      it will be sent to the slave, and the replication will continue
      sending to the sub-slave the other half of the command, which is not
      valid.

      The fix, designed by @yangsiran and @antirez, and implemented by
      @antirez, uses a secondary buffer in order to feed the sub-slaves and
      update the replication backlog and offsets only when a given part of
      the query buffer is actually *applied* to the state of the instance,
      that is, when the command gets processed and is not pending in the
      Redis transaction buffer because of the CLIENT_MULTI state.

      Given that the backlog and offsets representation are now in
      agreement with the actually processed commands, both issues 1 and 2
      should no longer be possible.

      Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
      identifying and designing a fix for this problem.
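A toy model of the fix's core idea, with all names and fixed-size buffers invented for the sketch (no bounds checks, no real command parsing): bytes from the master first sit in a pending buffer, and are appended to the backlog and counted in the offset only once the commands they belong to have been fully applied, so sub-slaves never observe partial commands:

```c
#include <string.h>

#define BUF_SIZE 1024

/* Illustrative replication state, not the real redisServer fields. */
typedef struct replState {
    char pending[BUF_SIZE]; size_t pending_len;   /* read, not applied */
    char backlog[BUF_SIZE]; size_t backlog_len;   /* applied, feedable */
    long long master_repl_offset;                 /* applied bytes only */
} replState;

/* Raw bytes arriving from the master's socket land in `pending`. */
void feedPending(replState *r, const char *buf, size_t len) {
    memcpy(r->pending + r->pending_len, buf, len);
    r->pending_len += len;
}

/* Called only once `applied` bytes form commands that were fully
 * processed (and are not parked in the MULTI queue): move them to the
 * backlog, advance the offset, keep the unapplied tail in `pending`. */
void applyBytes(replState *r, size_t applied) {
    memcpy(r->backlog + r->backlog_len, r->pending, applied);
    r->backlog_len += applied;
    r->master_repl_offset += applied;
    memmove(r->pending, r->pending + applied, r->pending_len - applied);
    r->pending_len -= applied;
}
```

In this model a crash of the master mid-command leaves the partial bytes in `pending`, never in `backlog`, which is exactly the invariant the fix establishes.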
  7. 18 Apr 2017 (8 commits)
  8. 17 Apr 2017 (2 commits)
  9. 15 Apr 2017 (1 commit)
    • Cluster: discard pong times in the future. · 271733f4
      committed by antirez
      However, we allow for 500 milliseconds of tolerance, in order to
      avoid too often discarding semantically valid info (the node is up)
      because of the natural few milliseconds of desync among servers, even
      when NTP is used.

      Note that in any case we should ping the node from time to time
      regardless, and discover if it's actually down from our point of
      view, since no update is accepted while we have an active ping on the
      node.

      Related to #3929.
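The acceptance rule described above reduces to a one-line predicate; the function and constant names here are invented for the sketch:

```c
#include <stdint.h>

#define PONG_FUTURE_TOLERANCE_MS 500

/* Accept a gossiped pong time only if it is not more than 500 ms ahead
 * of our own clock; anything further in the future is treated as clock
 * desync and discarded. Returns 1 to accept, 0 to discard. */
int pongTimeAcceptable(int64_t reported_pong_ms, int64_t now_ms) {
    return reported_pong_ms <= now_ms + PONG_FUTURE_TOLERANCE_MS;
}
```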
  10. 14 Apr 2017 (6 commits)
    • Test: fix, hopefully, false PSYNC failure like in issue #2715. · 3f068b92
      committed by antirez
      And many other related GitHub issues... all reporting the same
      problem. There was probably just not enough backlog in certain
      unlucky runs. I'll ask people that can reproduce whether they now see
      this as fixed as well.
    • Cluster: always add PFAIL nodes at end of gossip section. · 02777bb2
      committed by antirez
      Relying on the fact that nodes in PFAIL state will be shared around
      by randomly adding them to the gossip section is a weak assumption,
      especially after changes related to sending fewer ping/pong packets.

      We want to always include gossip entries for all the nodes that are
      in PFAIL state, so that the PFAIL -> FAIL state promotion can happen
      much faster and more reliably.

      Related to #3929.
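The described step can be sketched like this; the `node`/`gossipMsg` types and function names are simplified stand-ins for the real clusterMsg structures. After the random entries are picked, a deterministic pass appends every PFAIL node not already present:

```c
#define MAX_GOSSIP_ENTRIES 16

/* Toy node table entry. */
typedef struct node {
    char name[8];
    int pfail;              /* in PFAIL state from our POV */
} node;

/* Toy gossip section under construction. */
typedef struct gossipMsg {
    const node *entries[MAX_GOSSIP_ENTRIES];
    int count;
} gossipMsg;

static int gossipContains(const gossipMsg *m, const node *n) {
    for (int i = 0; i < m->count; i++)
        if (m->entries[i] == n) return 1;
    return 0;
}

/* Deterministically append every PFAIL node the random pass missed, so
 * PFAIL reports always propagate regardless of random selection. */
void gossipAppendPfail(gossipMsg *m, const node *nodes, int numnodes) {
    for (int i = 0; i < numnodes; i++) {
        if (nodes[i].pfail && !gossipContains(m, &nodes[i]) &&
            m->count < MAX_GOSSIP_ENTRIES)
            m->entries[m->count++] = &nodes[i];
    }
}
```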
    • Cluster: fix gossip section ping/pong times encoding. · 8c829d9e
      committed by antirez
      The gossip section times are 32 bit, so they cannot store the
      milliseconds time but just the seconds approximation, which is good
      enough for our uses. At the same time, however, when comparing the
      gossip section times of other nodes with our node's view, we need to
      convert back to milliseconds.

      Related to #3929. Without this change the patch to reduce the traffic
      in the bus messages does not work.
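The conversion pair is trivial but easy to get asymmetric, which is exactly the bug described: the sender truncates to seconds, and the receiver must scale back up before comparing against local millisecond timestamps. The function names here are invented for the sketch:

```c
#include <stdint.h>

/* Sender side: 32-bit field can only hold seconds. */
uint32_t gossipEncodeTime(int64_t ms) { return (uint32_t)(ms / 1000); }

/* Receiver side: scale back to milliseconds before comparing with
 * local mstime()-style values. Sub-second precision is lost, which is
 * acceptable for failure-detection purposes. */
int64_t gossipDecodeTime(uint32_t s) { return (int64_t)s * 1000; }
```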
    • Cluster: add clean-logs command to create-cluster script. · 6878a3fe
      committed by antirez
    • Cluster: decrease ping/pong traffic by trusting other nodes' reports. · 8f7bf284
      committed by antirez
      Clusters of bigger sizes tend to have a lot of traffic in the cluster
      bus just for failure detection: a node will try to get a ping reply
      from another node no later than half the node timeout, in order to
      avoid a false positive.

      However this means that if we have N nodes and the node timeout is
      set to, for instance, M seconds, we'll have to ping N nodes every
      M/2 seconds. These pings will receive the same number of pongs, so a
      total of N*M packets per node. However, given that we have a total
      of N nodes doing this, the total number of messages will be N*N*M.

      In a 100-node cluster with a timeout of 60 seconds, this translates
      to a total of 100*100*30 packets per second, summing all the packets
      exchanged by all the nodes.

      This is, as you can guess, a lot... So this patch changes the
      implementation in a very simple way in order to trust the reports of
      other nodes: if a node A reports a node B as alive at least up to a
      given time, we update our view accordingly.

      The problem with this approach is that it could result in a subset
      of nodes being able to reach a given node X, preventing others from
      detecting that it is actually not reachable from the majority of
      nodes. So the above algorithm is refined by trusting other nodes
      only if we do not currently have a ping pending for node X, and if
      there are no failure reports for that node.

      Since each node pings 10 other nodes every second (one node every
      100 milliseconds), eventually, even trusting the other nodes'
      reports, we will detect if a given node is down from our POV.

      Now, to understand the number of packets that the cluster would
      exchange for failure detection with the patch, we can start by
      considering the random PINGs that the cluster sends anyway as a
      baseline: each node sends 10 packets per second, so the total
      traffic, if no additional packets were sent, including PONG packets,
      would be:

          Total messages per second = N*10*2

      However, trusting other nodes' gossip sections will not always
      prevent pinging nodes for the "half timeout reached" rule. The math
      involved in computing the actual rate as N and M change is quite
      complex and depends also on another parameter, which is the number
      of entries in the gossip section of PING and PONG packets. However,
      it is possible to compare what happens in clusters of different
      sizes experimentally. After applying this patch, a very important
      reduction in the number of packets exchanged is trivial to observe,
      without apparent impact on failure detection performance.

      Actual numbers with different cluster sizes should be published in
      the Redis Cluster documentation in the future.

      Related to #3929.
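The refined trust rule above can be sketched as a single predicate-and-update; the `nodeView` fields and `maybeTrustGossip()` name are simplified stand-ins for the real clusterNode state:

```c
#include <stdint.h>

/* Our local view of a remote node B; illustrative, not clusterNode. */
typedef struct nodeView {
    int64_t pong_received_ms;  /* last pong time from our own POV */
    int ping_pending;          /* we sent B a ping, still unanswered */
    int failure_reports;       /* nodes currently flagging B as PFAIL */
} nodeView;

/* Node A gossips that B was alive at `gossiped_pong_ms`. Trust it only
 * if (a) we have no ping in flight to B ourselves and (b) nobody is
 * reporting B as failing; otherwise keep our own view, so a node
 * unreachable from the majority is still detected. Returns 1 if our
 * view was refreshed. */
int maybeTrustGossip(nodeView *b, int64_t gossiped_pong_ms) {
    if (b->ping_pending || b->failure_reports > 0) return 0;
    if (gossiped_pong_ms <= b->pong_received_ms) return 0; /* stale */
    b->pong_received_ms = gossiped_pong_ms;
    return 1;
}
```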
    • Cluster: collect more specific bus messages stats. · c5d6f577
      committed by antirez
      First step in changing Cluster to use fewer messages. Related to
      issue #3929.
  11. 12 Apr 2017 (2 commits)
  12. 11 Apr 2017 (5 commits)
  13. 10 Apr 2017 (3 commits)
    • Make more obvious why there was issue #3843. · 531647bb
      committed by antirez
    • Merge pull request #3843 from dvirsky/fix_bc_free · 01b6966a
      committed by Salvatore Sanfilippo
      Fixed free of blocked client before referring to it.
    • Fix modules blocking commands awake delay. · ffefc9f9
      committed by antirez
      If a thread unblocks a client blocked in a module command, by using
      the RedisModule_UnblockClient() API, the event loop may not be
      awakened until the next timeout of the multiplexing API or the next
      unrelated I/O operation on other clients. We actually want the
      client to be served ASAP, so a mechanism is needed in order for the
      unblocking API to inform Redis that there is a client to serve ASAP.

      This commit fixes the issue using the old trick of the pipe: when a
      client needs to be unblocked, a byte is written to a pipe. When we
      run the list of clients blocked in modules, we consume all the bytes
      written to the pipe. Writes and reads are performed inside the
      context of the mutex, so no race is possible in which we consume
      bytes that are actually related to a wake-up request for a client
      that should still be put into the list of clients to unblock.

      It was verified that after the fix the server handles the blocked
      clients with the expected short delay.

      Thanks to @dvirsky for understanding there was such a problem and
      reporting it.
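The pipe trick described above can be sketched as follows, using real POSIX `pipe()`, `fcntl()`, `read()` and `write()`; the function names are invented for the sketch, and the mutex around writes/drains mentioned in the commit is omitted for brevity. The read end would be registered in the event loop, so a write from any thread makes the multiplexing call return immediately:

```c
#include <fcntl.h>
#include <unistd.h>

int wakeup_pipe[2];  /* [0] = read end (event loop), [1] = write end */

/* Create the pipe and make the read end non-blocking, so draining
 * stops cleanly when the pipe is empty. Returns 0 on success. */
int setupWakeupPipe(void) {
    if (pipe(wakeup_pipe) == -1) return -1;
    return fcntl(wakeup_pipe[0], F_SETFL, O_NONBLOCK);
}

/* Called by the unblocking API from any thread: one byte per wake-up
 * request. The payload is irrelevant; only readability matters. */
void signalClientReady(void) {
    char b = 'x';
    (void)!write(wakeup_pipe[1], &b, 1);
}

/* Called from the event loop when the read end is readable: consume
 * every pending byte, then process the unblocked-clients list. */
void drainWakeupPipe(void) {
    char buf[128];
    while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0);
}
```

Draining all bytes at once is safe precisely because, as the commit notes, writes and the drain happen under the same mutex as the list manipulation, so no wake-up request can be consumed before its client is queued.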
  14. 08 Apr 2017 (2 commits)