1. 29 June 2018, 1 commit
    • Sentinel: add an option to deny online script reconfiguration. · 2fa43ece
      Committed by antirez
      The ability of "SENTINEL SET" to change the reconfiguration script at
      runtime is a problem even within the security model of Redis: any client
      inside the network may set any executable to be run once a failover is
      triggered.
      
      This option adds protection against this problem: by default the two
      SENTINEL SET subcommands that modify script paths are denied. However
      the user is still able to revert that, using the Sentinel configuration
      file, in order to allow the feature.
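      As a sketch of what the resulting default looks like in sentinel.conf
      (the directive name comes from this commit; verify it against your
      Redis version):

```
# Deny SENTINEL SET reconfiguration of notification-script and
# client-reconfig-script at runtime. Change to "no" only if you trust
# every client that can reach Sentinel.
sentinel deny-scripts-reconfig yes
```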
  2. 23 May 2018, 1 commit
    • Sentinel: fix delay in detecting ODOWN. · 266e6423
      Committed by antirez
      See issue #2819 for details. The gist is that when we want to send INFO
      because we are over the time, we used to send only the INFO command and
      no longer sent PING commands. However if a master fails exactly when we
      are about to send an INFO command, the PING time will appear to be zero
      because the PONG reply was already received, and we'll fail to send more
      PINGs, since we try only to send INFO commands: the failure detector
      will be delayed until the connection is closed and re-opened because of
      the "long timeout".
      
      This commit changes the logic so that we can send the three kinds of
      messages regardless of the fact that we already sent another one in the
      same code path. It could happen that we go over the message limit for
      the link by a few messages, but this is not significant. However now
      we'll not introduce delays in sending commands just because there was
      something else to send at the same time.
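      The fixed scheduling logic can be sketched outside of sentinel.c as
      follows; this is an illustrative Python model (the period constants and
      function name are assumptions, not the real Redis identifiers): each
      message type is checked for being due independently, so sending INFO no
      longer suppresses a due PING.

```python
INFO_PERIOD = 10.0     # seconds between INFO refreshes (illustrative)
PING_PERIOD = 1.0      # seconds between PINGs (illustrative)
PUBLISH_PERIOD = 2.0   # seconds between hello PUBLISHes (illustrative)

def due_messages(now, last_info, last_ping, last_publish):
    """Return every message type that is due, not just the first one
    found, so one kind of traffic never starves another."""
    due = []
    if now - last_info >= INFO_PERIOD:
        due.append("INFO")
    if now - last_ping >= PING_PERIOD:
        due.append("PING")
    if now - last_publish >= PUBLISH_PERIOD:
        due.append("PUBLISH")
    return due
```

      The worst case is exceeding the per-link message budget by a couple of
      messages in one pass, which the commit message deems insignificant.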
  3. 22 February 2017, 1 commit
    • Use SipHash hash function to mitigate HashDos attempts. · ba647598
      Committed by antirez
      This change attempts to switch to a hash function which mitigates
      the effects of the HashDoS attack (a denial of service attack trying
      to force data structures into worst case behavior) while at the same
      time providing Redis with a hash function that does not expect the
      input data to be word aligned, a condition no longer true now that
      sds.c strings have a variable length header.
      
      Note that even when using a hash function for which collisions cannot
      be generated without knowing the seed, implementation details or
      indirect exposure of the seed (for example the ability to add elements
      to a Set and check the order in which Redis returns them with SMEMBERS)
      may make the attacker's life simpler in the process of trying to guess
      the correct seed. However the next step would be to switch to a
      log(N) data structure when too many items in a single bucket are
      detected: this seems like overkill in the case of Redis.
      
      SPEED REGRESSION TESTS:
      
      In order to verify that switching from MurmurHash to SipHash had
      no impact on speed, a set of benchmarks involving fast insertion
      of 5 million keys was performed.
      
      The results show Redis with SipHash, under high pipelining
      conditions, to be about 4% slower compared to the previous hash
      function. However this could be partially related to the fact that the
      current implementation does not attempt to hash whole words at a time
      but reads single bytes, in order to have an output which is
      endian-neutral and at the same time works on systems where unaligned
      memory accesses are a problem.
      
      Further x86 specific optimizations should be tested; the function
      may easily reach the same level as MurmurHash2 if a few optimizations
      are performed.
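      The property being bought here can be illustrated with any keyed hash;
      a minimal Python sketch, using HMAC-SHA256 as a stand-in for SipHash
      (CPython's standard library does not expose SipHash directly):

```python
import hashlib
import hmac

def bucket(key: bytes, seed: bytes, num_buckets: int) -> int:
    """Map a key to a hash-table bucket using a secret seed. Without
    knowing the seed, an attacker cannot precompute a set of colliding
    keys, which is the HashDoS mitigation described above."""
    digest = hmac.new(seed, key, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets
```

      With a per-process random seed, a collision set crafted against one
      deployment is useless against another.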
  4. 14 September 2016, 1 commit
  5. 12 September 2016, 1 commit
  6. 22 July 2016, 1 commit
    • Sentinel: check Slave INFO state more often when disconnected. · 3e9ce38b
      Committed by antirez
      During the initial handshake with the master a slave will report to have
      a very high disconnection time from its master (since technically it was
      disconnected since forever, so the current UNIX time in seconds is
      reported).
      
      However when the slave is connected again the Sentinel may re-scan the
      INFO output again only after 10 seconds, which is a long time. During
      this time Sentinels will consider this instance unable to failover, so
      a useless delay is introduced.
      
      Actually this hardly ever happened in practice, because when a slave's
      master is down, the INFO period for slaves changes to 1 second. However
      when a manual failover is attempted immediately after adding slaves
      (as in the case of the Sentinel unit test), this problem may happen.
      
      This commit changes the INFO period to 1 second even in the case the
      slave's master is not down, but the slave reported to be disconnected
      from the master (by publishing, last time we checked, a master
      disconnection time field in INFO).
      
      This change is required as a result of an unrelated change in the
      replication code that adds a small delay in the master-slave first
      synchronization.
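      The rule described above can be modeled like this; a Python sketch with
      made-up names (the real constants and checks live in sentinel.c):

```python
SENTINEL_INFO_PERIOD = 10.0        # normal INFO refresh interval, seconds
SENTINEL_INFO_TIGHT_PERIOD = 1.0   # tightened interval for the cases below

def info_period(is_slave: bool, master_down: bool,
                reports_disconnected: bool) -> float:
    """A slave is polled for INFO every second not only when its master
    is down (the pre-existing rule), but also when the slave itself
    reports a broken master link in its INFO output (this commit)."""
    if is_slave and (master_down or reports_disconnected):
        return SENTINEL_INFO_TIGHT_PERIOD
    return SENTINEL_INFO_PERIOD
```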
  7. 05 July 2016, 1 commit
    • Sentinel: fix cross-master Sentinel address update. · c383be3b
      Committed by antirez
      This commit both fixes the crash reported with issue #3364 and
      also properly closes the old links after the Sentinel address for the
      other masters gets updated.
      
      The two problems were:
      
      1. The Sentinel that switched address may not monitor all the masters,
         it is possible that there is no match, and the 'match' variable is
         NULL. Now we check for no match and 'continue' to the next master.
      
      2. While inspecting the code because of issue "1", I noticed that there
         was a problem in the code that disconnects the links of the Sentinel
         that needs the address update. Basically link->disconnected is
         non-zero even if just *a single link* (cc, the command link, or pc,
         the pub/sub link) is disconnected, so checking if (link->disconnected)
         before closing the links risks leaving one link connected.
      
      I was able to manually reproduce the crash at "1" and verify that the
      commit resolves the issue.
      
      Close #3364.
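      Both fixes can be sketched as follows; an illustrative Python model,
      not the actual sentinel.c code (the class and method names are
      assumptions):

```python
class SentinelLink:
    """Stand-in for a Sentinel's pair of connections to a peer."""

    def __init__(self, addr):
        self.addr = addr
        self.cc_connected = True   # command link
        self.pc_connected = True   # pub/sub link

    def close_command_link(self):
        self.cc_connected = False

    def close_pubsub_link(self):
        self.pc_connected = False


def update_sentinel_address(masters, sentinel_id, new_addr):
    for master in masters:
        match = master["sentinels"].get(sentinel_id)
        if match is None:
            continue                    # fix 1: this master has no match
        match.addr = new_addr
        match.close_command_link()      # fix 2: close cc and pc
        match.close_pubsub_link()       # unconditionally, instead of
                                        # gating on a combined flag
```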
  8. 17 June 2016, 1 commit
    • Fix Sentinel pending commands counting. · f7351f4c
      Committed by antirez
      The most visible effect of this bug was Redis's inability to
      reconfigure old masters back to slaves after they become reachable
      again following a failover. This was due to failing to properly reset
      the count of pending commands, so the master appeared forever down.
      
      The bug was introduced with the new Sentinel connection sharing
      feature of Redis 3.2, which is a lot more complex than the 3.0 code,
      but more scalable.
      
      Many thanks to the people reporting the issue, and especially to
      @sskorgal for investigating the issue in depth.
      
      Hopefully closes #3285.
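      The invariant behind the fix can be sketched like this; an illustrative
      Python model (the real counter lives in the shared link structure
      introduced in 3.2):

```python
class InstanceLink:
    """Stand-in for the shared connection state to one Redis instance."""

    def __init__(self):
        self.pending_commands = 0   # commands sent, reply not yet seen

    def send_command(self):
        self.pending_commands += 1

    def reply_received(self):
        self.pending_commands -= 1

    def reconnect(self):
        # The fix: replies to commands sent on the old connection will
        # never arrive, so the counter must be reset here, otherwise the
        # instance looks permanently busy and therefore down.
        self.pending_commands = 0
```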
  9. 10 June 2016, 2 commits
  10. 26 May 2016, 1 commit
  11. 27 January 2016, 1 commit
    • Sentinel: improve handling of known Sentinel instances. · 751b5666
      Committed by antirez
      1. Bug #3035 is fixed (NULL pointer access). This was happening with the
         following set of conditions:
      
      * For some reason one of the Sentinels, let's call it Sentinel_A, changed ID (it was reconfigured from scratch), but it is at the same address at which it used to be.
      
      * Sentinel_A performs a failover and/or has a newer configuration compared to another Sentinel, that we call Sentinel_B.
      
      * Sentinel_B receives a HELLO message from Sentinel_A, where the address and/or ID is mismatched, but which reports a newer configuration for the master they are both monitoring.
      
      2. Sentinels now must have an ID, otherwise they are neither loaded nor persisted in the configuration. This makes it possible to have conflicting Sentinels with the same address, since now the master->sentinels dictionary is indexed by Sentinel ID.
      
      3. The code now detects if a Sentinel is announcing itself with an IP/port pair already in use by another Sentinel. The old Sentinel that had the same IP/port pair is set to port 0, meaning that the address is invalid. We may discover the right address later via HELLO messages.
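      Points 2 and 3 can be sketched together; an illustrative Python model
      of an ID-indexed Sentinel table with the port-0 invalidation rule (all
      names are assumptions):

```python
def register_sentinel(sentinels, sid, ip, port):
    """sentinels maps Sentinel ID -> {"ip": ..., "port": ...} (point 2).
    If the announced address is already held by a different Sentinel,
    the old holder's port is set to 0, marking its address as unknown
    until a later HELLO reveals the right one (point 3)."""
    for other_id, other in sentinels.items():
        if other_id != sid and other["ip"] == ip and other["port"] == port:
            other["port"] = 0
    sentinels[sid] = {"ip": ip, "port": port}
```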
  12. 12 January 2016, 1 commit
  13. 08 September 2015, 1 commit
  14. 29 July 2015, 1 commit
  15. 27 July 2015, 2 commits
  16. 26 July 2015, 4 commits
  17. 24 July 2015, 1 commit
  18. 13 June 2015, 1 commit
    • Sentinel: fix bug in config rewriting during failover · 821a9866
      Committed by antirez
      We have a check to rewrite the config properly when a failover is in
      progress, in order to add the current (already failed over) master as
      slave, and don't include in the slave list the promoted slave itself.
      
      However there was an issue: the variable holding the right address was
      computed but never used when the code was modified, and no tests are
      available for this feature, for two reasons:
      
      1. The Sentinel unit test currently does not test Sentinel's ability to
      persist its state at all.
      2. It is a very hard state to trigger, since it lasts for a short time
      in the context of the testing framework.
      
      However this feature should be covered by the test in some way.
      
      The bug was found by @badboy using the clang static analyzer.
      
      Effects of the bug on safety of Sentinel
      ===
      
      This bug results in severe issues in the following case:
      
      1. A Sentinel is elected leader.
      2. During the failover, it persists a wrong config with a known-slave
      entry listing the master address.
      3. The Sentinel crashes and restarts, reading invalid configuration from
      disk.
      4. It sees that the slave now does not obey the logical configuration
      (it should replicate from the current master), so it sends a SLAVEOF
      command to the master (since the listed slave and the master are the
      same instance), creating a replication loop (an attempt to replicate
      from itself) which Redis is currently unable to detect.
      5. This means that the master is no longer available because of the bug.
      
      However the lack of availability should be only transient (at least
      in my tests, but other states could be possible where the problem
      is not recovered automatically) because:
      
      6. Sentinels treat masters reporting to be slaves as failing.
      7. A new failover is triggered, and a slave is promoted to master.
      
      Bug lifetime
      ===
      
      The bug was there from the start: commit 16237d78 actually tried to fix
      it, but in the wrong way (the computed variable was never used! My
      fault). So this bug has been there basically since the beginning of
      Sentinel.
      
      Since the bug is hard to trigger, I remember few reports matching this
      condition, but I remember at least some. Also, in automated tests where
      instances were stopped and restarted multiple times automatically, I
      remember hitting this issue; however I was not able to reproduce it,
      nor to determine, with the information I had at the time, what was
      causing it.
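      The rewrite rule the commit restores can be modeled as follows; an
      illustrative Python sketch under the assumptions stated in the message
      (during a failover the old master must appear as a known-slave entry
      and the promoted slave must not):

```python
def known_slaves_for_rewrite(old_master_addr, slave_addrs, promoted_addr,
                             failover_in_progress):
    """Addresses to persist as known-slave entries. The original bug was
    computing old_master_addr correctly and then never using it."""
    if not failover_in_progress:
        return list(slave_addrs)
    # Drop the promoted slave (it is the new master) and add the old
    # master, which will replicate from the promoted slave.
    slaves = [a for a in slave_addrs if a != promoted_addr]
    slaves.append(old_master_addr)
    return slaves
```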
  19. 25 May 2015, 2 commits
  20. 22 May 2015, 1 commit
  21. 20 May 2015, 1 commit
  22. 18 May 2015, 1 commit
    • Sentinel: SENTINEL CKQUORUM command · abc65e89
      Committed by antirez
      A way for monitoring systems to check that Sentinel is technically able
      to reach the quorum and failover, using the currently visible Sentinels.
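      What the check verifies can be sketched like this; an illustrative
      Python model (the majority requirement for authorizing a failover is
      Sentinel's usual voting rule, not spelled out in this message):

```python
def ckquorum(usable_sentinels, total_sentinels, quorum):
    """Return (ok, message) in the spirit of SENTINEL CKQUORUM: enough
    reachable Sentinels for the configured quorum, and enough for the
    majority needed to authorize a failover."""
    if usable_sentinels < quorum:
        return (False, "not enough reachable Sentinels for the quorum")
    majority = total_sentinels // 2 + 1
    if usable_sentinels < majority:
        return (False, "not enough reachable Sentinels to authorize a failover")
    return (True, "OK")
```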
  23. 15 May 2015, 1 commit
  24. 14 May 2015, 7 commits
  25. 13 May 2015, 1 commit
  26. 12 May 2015, 3 commits