1. 10 May 2017, 1 commit
  2. 09 May 2017, 3 commits
  3. 04 May 2017, 3 commits
  4. 03 May 2017, 7 commits
  5. 02 May 2017, 3 commits
  6. 29 April 2017, 1 commit
  7. 28 April 2017, 1 commit
    • A
      Regression test for PSYNC2 issue #3899 added. · c180bc7d
      Committed by antirez
      Experimentally verified that it can trigger the issue by reverting the
      fix. At least on my system... The bug being time/backlog dependent, it
      is very hard to tell if this test will be able to trigger the problem
      consistently; however, even if it triggers the problem only once in a
      while, we'll see it in the CI environment at http://ci.redis.io.
      c180bc7d
  8. 27 April 2017, 1 commit
    • A
      PSYNC2: fix master cleanup when caching it. · 469d6e2b
      Committed by antirez
      The master client cleanup was incomplete: resetClient() was missing and
      the output buffer of the client was not reset, so pending commands
      related to the previous connection could still be sent.
      
      The first problem caused the client argument vector to be, at times,
      half populated, so that when the correct replication stream arrived the
      protocol got mixed with the arguments, creating invalid commands that
      nobody called.
      
      Thanks to @yangsiran for also investigating this problem, after
      already providing important design / implementation hints for the
      original PSYNC2 issues (see referenced Github issue).
      
      Note that this commit adds a new function to the list library of Redis
      in order to be able to reset a list without destroying it.
      
      Related to issue #3899.
      469d6e2b
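The "reset a list without destroying it" helper mentioned in the commit above can be sketched as follows. This is a simplified illustration in the style of Redis' adlist, not the exact code added by the commit; the names (`listEmpty`, `listCreate`, `listAddNodeTail`) and the omission of value-free callbacks are assumptions for brevity.

```c
/* Minimal sketch of a doubly linked list that can be emptied for reuse
 * without freeing the list header itself. */
#include <stdlib.h>

typedef struct listNode {
    struct listNode *prev, *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head, *tail;
    unsigned long len;
} list;

list *listCreate(void) {
    list *l = malloc(sizeof(*l));
    if (l == NULL) return NULL;
    l->head = l->tail = NULL;
    l->len = 0;
    return l;
}

void listAddNodeTail(list *l, void *value) {
    listNode *node = malloc(sizeof(*node));
    node->value = value;
    node->prev = l->tail;
    node->next = NULL;
    if (l->tail) l->tail->next = node; else l->head = node;
    l->tail = node;
    l->len++;
}

/* Remove every node but keep the header valid, so the same list
 * object can be filled again: the behavior the commit needed when
 * resetting the cached master's client state. */
void listEmpty(list *l) {
    listNode *current = l->head, *next;
    while (current) {
        next = current->next;
        free(current);
        current = next;
    }
    l->head = l->tail = NULL;
    l->len = 0;
}
```

The key design point is that `listEmpty()` leaves the header intact, unlike a release function that frees the whole structure, so callers holding a pointer to the list can keep using it after the reset.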
  9. 22 April 2017, 4 commits
  10. 21 April 2017, 1 commit
    • A
      Check event loop creation return value. Fix #3951. · 238cebdd
      Committed by antirez
      Normally we never check for OOM conditions inside Redis, since the
      allocator will always return a pointer or abort the program on OOM
      conditions. However we have no control over epoll_create(), which may
      fail for kernel OOM (according to the manual page) even if all the
      parameters are correct, so the function aeCreateEventLoop() may indeed
      return NULL and this condition must be checked.
      238cebdd
  11. 20 April 2017, 1 commit
  12. 19 April 2017, 3 commits
    • A
      Fix getKeysUsingCommandTable() in cluster mode. · 7d9dd80d
      Committed by antirez
      Close #3940.
      7d9dd80d
    • A
      PSYNC2: discard pending transactions from cached master. · 189a12af
      Committed by antirez
      During the review of the fix for #3899, @yangsiran identified an
      implementation bug: given that the offset is now relative to the applied
      part of the replication log, when we cache a master, the subsequent
      PSYNC2 request will be made in order to *include* the transaction that
      was not completely processed. This means that we need to discard any
      pending transaction from our replication buffer: it will be re-executed.
      189a12af
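The discard step described above can be sketched as follows. This is a hypothetical illustration of the idea, not the actual Redis code: the structure and names (`replClient`, `discardCachedMasterTransaction`) are assumptions, and the queued commands are kept opaque.

```c
/* Sketch: before caching the master client, drop any commands queued
 * inside a partially-received MULTI transaction. Since the replication
 * offset points before the transaction, the next PSYNC2 request will
 * receive it again in full, so re-execution is safe. */
#include <stdlib.h>

typedef struct {
    int multi_count;    /* commands queued inside MULTI */
    void **multi_cmds;  /* queued commands (opaque here) */
} replClient;

void discardCachedMasterTransaction(replClient *c) {
    for (int i = 0; i < c->multi_count; i++) free(c->multi_cmds[i]);
    free(c->multi_cmds);
    c->multi_cmds = NULL;
    c->multi_count = 0;
}
```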
    • A
      Fix PSYNC2 incomplete command bug as described in #3899. · 22be435e
      Committed by antirez
      This bug was discovered by @kevinmcgehee and constituted a major hidden
      bug in the PSYNC2 implementation, caused by the propagation from the
      master of incomplete commands to slaves.
      
      The bug had several results:
      
      1. Borrowing from Kevin's text in the issue: "Given that slaves blindly
      copy over their master's input into their own replication backlog over
      successive read syscalls, it's possible that with large commands or
      small TCP buffers, partial commands are present in this buffer. If the
      master were to fail before successfully propagating the entire command
      to a slave, the slaves will never execute the partial command (since the
      client is invalidated) but will copy it to replication backlog which may
      relay those invalid bytes to its slaves on PSYNC2, corrupting the
      backlog and possibly other valid commands that follow the failover.
      Simple command boundaries aren't sufficient to capture this, either,
      because in the case of a MULTI/EXEC block, if the master successfully
      propagates a subset of the commands but not the EXEC, then the
      transaction in the backlog becomes corrupt and could corrupt other
      slaves that consume this data."
      
      2. As identified by @yangsiran later, there is another effect of the
      bug. Through the same mechanism as the first problem, a slave that
      itself has a slave could receive a full resynchronization request with
      an already half-applied command in the backlog. Once the RDB is ready,
      it will be sent to the slave, and the replication will continue by
      sending to the sub-slave the other half of the command, which is not
      valid.
      
      The fix, designed by @yangsiran and @antirez, and implemented by
      @antirez, uses a secondary buffer in order to feed the sub-slaves and
      update the replication backlog and offsets only when a given part of
      the query buffer is actually *applied* to the state of the instance,
      that is, when the command gets processed and is not pending in the
      Redis transaction buffer because of the CLIENT_MULTI state.
      
      Given that the backlog and offsets now agree with the actual processed
      commands, both issues 1 and 2 should no longer be possible.
      
      Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in
      identifying and designing a fix for this problem.
      22be435e
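The secondary-buffer idea in the fix above can be sketched as follows. This is a deliberately simplified model, not the Redis implementation: fixed-size buffers and the names (`replState`, `replFeedRaw`, `replCommitApplied`) are assumptions. The point it illustrates is from the commit message: raw reads never reach the backlog directly; only spans covering fully applied commands advance the backlog and the offset.

```c
/* Sketch: stage raw master input in a pending buffer; move bytes to
 * the replication backlog (and advance the offset) only once they
 * correspond to commands fully applied to the instance. */
#include <assert.h>
#include <string.h>

#define BUFSZ 4096

typedef struct {
    char pending[BUFSZ];    /* bytes read but not yet applied */
    size_t pending_len;
    char backlog[BUFSZ];    /* only complete, applied commands */
    size_t backlog_len;
    long long repl_offset;  /* offset of the applied stream only */
} replState;

/* Raw reads go to the pending buffer, NOT to the backlog, so a
 * partial command can never be relayed to sub-slaves. */
void replFeedRaw(replState *r, const char *buf, size_t len) {
    assert(r->pending_len + len <= BUFSZ);
    memcpy(r->pending + r->pending_len, buf, len);
    r->pending_len += len;
}

/* Once `applied` bytes form commands that were fully processed (and
 * are not held in a MULTI transaction buffer), commit them to the
 * backlog and advance the offset. */
void replCommitApplied(replState *r, size_t applied) {
    assert(applied <= r->pending_len);
    memcpy(r->backlog + r->backlog_len, r->pending, applied);
    r->backlog_len += applied;
    r->repl_offset += (long long)applied;
    memmove(r->pending, r->pending + applied, r->pending_len - applied);
    r->pending_len -= applied;
}
```

With this split, a crash after `replFeedRaw()` but before `replCommitApplied()` leaves the backlog and offset pointing exactly at the last complete command, which is what makes issues 1 and 2 impossible in the model.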
  13. 18 April 2017, 8 commits
  14. 17 April 2017, 2 commits
  15. 15 April 2017, 1 commit
    • A
      Cluster: discard pong times in the future. · 271733f4
      Committed by antirez
      However we allow for 500 milliseconds of tolerance, in order to avoid
      frequently discarding semantically valid info (the node is up) because
      of the natural few-millisecond desync among servers, even when NTP is
      used.
      
      Note that we should ping the node from time to time regardless, and
      discover if it's actually down from our point of view, since no update
      is accepted while we have an active ping on the node.
      
      Related to #3929.
      271733f4
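The check described above amounts to a one-line predicate, sketched here with illustrative names (the function and constant names are assumptions, not the Redis cluster.c identifiers); only the 500 ms tolerance figure comes from the commit message.

```c
/* Sketch: accept a reported pong time only if it is not in the future
 * beyond a small tolerance. A pong timestamp far in the future cannot
 * be valid and is most likely clock desync between nodes. */
#define PONG_FUTURE_TOLERANCE_MS 500

/* Return 1 if the pong time should be accepted, 0 if discarded. */
int clusterPongTimeAcceptable(long long pongtime_ms, long long now_ms) {
    return pongtime_ms <= now_ms + PONG_FUTURE_TOLERANCE_MS;
}
```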