1. 13 Feb, 2014 (4 commits)
    • Update cached time in rdbLoad() callback. · 3c1672da
      Committed by antirez
      server.unixtime and server.mstime are cached, less precise timestamps
      that we use whenever we don't need an accurate time representation
      and a syscall would be too slow for the number of calls we require.
      
      An example is the initialization and update of the last interaction
      time with the client, which is used for timeouts.
      
      However rdbLoad() can take a long time to load the DB, and it did
      not update the cached time while loading. This resulted in the bug
      described in issue #1535: during replication the slave loads the DB
      and creates the redisClient representation of its master, but the
      cached timestamp is so old that the master, under certain
      conditions, is immediately sensed as already "timed out".
      
      Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
      analysis.
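      
      The caching pattern the message describes, as a minimal
      self-contained sketch (update_cached_time(), cached_unixtime and
      cached_mstime are illustrative stand-ins, not the actual Redis
      identifiers); the fix amounts to also calling the refresh from the
      callback that fires periodically while rdbLoad() reads the file:
      
          #include <stdio.h>
          #include <time.h>
          #include <sys/time.h>
          
          /* Stand-ins for the server.unixtime / server.mstime caches. */
          static time_t cached_unixtime;
          static long long cached_mstime;
          
          static long long mstime_now(void) {
              struct timeval tv;
              gettimeofday(&tv, NULL);
              return ((long long)tv.tv_sec) * 1000 + tv.tv_usec / 1000;
          }
          
          /* Refresh the cached clocks. Normally this runs from the server
           * cron; the fix also invokes it from the RDB loading progress
           * callback so the cache cannot go stale during a long load. */
          static void update_cached_time(void) {
              cached_unixtime = time(NULL);
              cached_mstime = mstime_now();
          }
          
          int main(void) {
              update_cached_time();
              printf("unixtime=%ld mstime=%lld\n",
                     (long)cached_unixtime, cached_mstime);
              return 0;
          }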
    • Log when CONFIG REWRITE goes bad. · 116617c5
      Committed by antirez
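      
      Based only on the title, a hedged sketch of the behavior change
      (rewrite_config() is an illustrative stand-in, not the actual
      rewriteConfig() internals): when the config rewrite fails, report
      the error loudly instead of failing silently.
      
          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          
          /* Stand-in for a config rewrite that can fail; returns -1 and
           * leaves the reason in errno. */
          static int rewrite_config(const char *path) {
              FILE *fp = fopen(path, "w");
              if (fp == NULL) return -1;
              fclose(fp);
              return 0;
          }
          
          int main(void) {
              if (rewrite_config("/nonexistent/redis.conf") == -1) {
                  /* The point of the commit: make the failure visible. */
                  fprintf(stderr, "CONFIG REWRITE failed: %s\n",
                          strerror(errno));
                  return 1;
              }
              return 0;
          }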
    • Test: regression for issue #1549. · 767846dc
      Committed by antirez
      It was verified that, after reverting the commit that fixes the bug,
      the test no longer passes.
    • Fix script cache bug in the scripting engine. · 14143fbe
      Committed by antirez
      This commit fixes a serious Lua scripting replication issue,
      described in Github issue #1549. The root cause of the problem is
      that scripts were put inside the script cache, on the assumption
      that slaves and the AOF already contained them, even when the
      scripts produced no changes in the data set and were therefore not
      actually propagated to the AOF/slaves.
      
      Example:
      
          eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0
      
      Then:
      
          evalsha <sha1 step 1 script> 1 0
      
      At this step the sha1 of the script is added to the replication
      script cache (the script is marked as known to the slaves) and the
      EVALSHA command is transformed to EVAL. However the call is not
      dirty (there are no changes to the DB), so it is not propagated to
      the slaves. Then the script is called again:
      
          evalsha <sha1 step 1 script> 1 1
      
      At this step the master finds the script already in the replication
      script cache and doesn't transform the command to EVAL. This time
      the call is dirty and is propagated to the slaves, but they fail to
      evaluate the script as they don't have it in their script cache.
      
      The fix is trivial and just uses the new API to force the propagation of
      the executed command regardless of the dirty state of the data set.
      
      Thank you to @minus-infinity on Github for finding the issue,
      understanding the root cause, and fixing it.
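      
      A minimal model of the bug and the fix (all names are illustrative,
      not the actual Redis API; the real patch forces propagation of the
      executed command when its sha1 is first added to the replication
      script cache):
      
          #include <stdbool.h>
          #include <stdio.h>
          
          struct script_call {
              bool dirty;            /* did the script modify the dataset? */
              bool force_propagate;  /* the fix: set when the sha1 is first
                                      * added to the replication cache */
          };
          
          static bool propagated_to_slaves(const struct script_call *c) {
              /* Before the fix only the dirty check existed, so the first
               * (clean) call never reached the slaves even though its sha1
               * was already marked as known to them. */
              return c->dirty || c->force_propagate;
          }
          
          int main(void) {
              struct script_call first  = { .dirty = false,
                                            .force_propagate = true };
              struct script_call second = { .dirty = true,
                                            .force_propagate = false };
              printf("first call propagated: %d\n",
                     propagated_to_slaves(&first));
              printf("second call propagated: %d\n",
                     propagated_to_slaves(&second));
              return 0;
          }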
  2. 12 Feb, 2014 (2 commits)
    • AOF write error: retry with a frequency of 1 hz. · 0296aab6
      Committed by antirez
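      
      Based only on the title, a hedged sketch of a 1 Hz retry gate of the
      kind a server cron can use (names and structure are assumptions, not
      the actual patch):
      
          #include <stdio.h>
          
          /* The cron runs many times per second; the AOF retry fires at
           * most once per second while the AOF is in the error state. */
          static long long last_retry_ms = 0;
          
          static void cron_tick(long long now_ms, int aof_in_error) {
              if (aof_in_error && now_ms - last_retry_ms >= 1000) {
                  last_retry_ms = now_ms;
                  printf("retrying AOF write at t=%lldms\n", now_ms);
              }
          }
          
          int main(void) {
              for (long long t = 0; t <= 3500; t += 100) cron_tick(t, 1);
              return 0;
          }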
    • AOF: don't abort on write errors unless fsync is 'always'. · dd73a7bf
      Committed by antirez
      A system similar to the RDB write error handling is used: when we
      can't write to the AOF file, writes are no longer accepted until we
      are able to write again.
      
      For fsync == always we still abort on errors, since there is
      currently no easy way to otherwise avoid replying to the user with
      success, and that would violate the contract of only acknowledging
      data already secured on disk.
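      
      An illustrative model of the policy described above (identifiers are
      stand-ins, not the actual Redis code):
      
          #include <stdio.h>
          #include <stdlib.h>
          
          enum fsync_policy { FSYNC_NO, FSYNC_EVERYSEC, FSYNC_ALWAYS };
          
          static int aof_last_write_ok = 1;
          
          static void on_aof_write_error(enum fsync_policy policy) {
              if (policy == FSYNC_ALWAYS) {
                  /* We can't rule out having already acknowledged data
                   * that never reached the disk, so abort. */
                  fprintf(stderr, "AOF error with fsync=always, exiting\n");
                  exit(1);
              }
              /* Otherwise enter the error state: reject new writes until
               * a later retry (see the 1 Hz retry above) succeeds. */
              aof_last_write_ok = 0;
          }
          
          static int can_accept_write_commands(void) {
              return aof_last_write_ok;
          }
          
          int main(void) {
              on_aof_write_error(FSYNC_EVERYSEC);
              printf("writes accepted: %d\n", can_accept_write_commands());
              return 0;
          }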
  3. 11 Feb, 2014 (20 commits)
  4. 10 Feb, 2014 (9 commits)
  5. 08 Feb, 2014 (2 commits)
  6. 07 Feb, 2014 (1 commit)
  7. 05 Feb, 2014 (2 commits)
    • Check for EAGAIN in sendBulkToSlave(). · 970de3e9
      Committed by antirez
      Sometimes an OS X master with a Linux slave over a slow link caused
      a strange error: OS X called the writable handler for the socket,
      but apparently there was no room in the socket buffer to accept the
      write. The write(2) call returned an EAGAIN error that was not
      checked, so the failed write was always treated as a connection
      reset, which was unfortunate since the bulk transfer then has to
      start again from scratch.
      
      Also, more errors in the same code path are now logged at the
      WARNING level.
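      
      The corrected pattern, as a self-contained sketch (write_chunk() is
      illustrative; the real function is sendBulkToSlave()): a -1 return
      from write(2) with errno set to EAGAIN means "no buffer space, retry
      on the next writable event", not a broken connection.
      
          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>
          
          static int write_chunk(int fd, const void *buf, size_t len) {
              ssize_t nwritten = write(fd, buf, len);
              if (nwritten == -1) {
                  if (errno == EAGAIN || errno == EWOULDBLOCK)
                      return 0;   /* not an error: try again later */
                  /* A real error: worth logging at WARNING level. */
                  fprintf(stderr, "Write error sending DB: %s\n",
                          strerror(errno));
                  return -1;
              }
              return (int)nwritten;
          }
          
          int main(void) {
              const char msg[] = "hello\n";
              return write_chunk(STDOUT_FILENO, msg, sizeof(msg) - 1) < 0;
          }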
    • Cluster: fixed MF condition in clusterHandleSlaveFailover(). · 04fe000b
      Committed by antirez
      For a manual failover we need a manual failover to actually be in
      progress, and mf_can_start to be true (master offset received and
      matched).
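      
      A hedged sketch of the shape of the check (field names follow the
      message; the exact expression in clusterHandleSlaveFailover() may
      differ):
      
          #include <stdio.h>
          
          /* Minimal stand-in for the relevant cluster state fields. */
          struct cluster_state {
              long long mf_end;    /* nonzero while a manual failover is
                                    * in progress */
              int mf_can_start;    /* master offset received and matched */
          };
          
          static int mf_actionable(const struct cluster_state *cs) {
              /* Both conditions must hold, per the commit message. */
              return cs->mf_end != 0 && cs->mf_can_start;
          }
          
          int main(void) {
              struct cluster_state cs = { .mf_end = 1, .mf_can_start = 0 };
              printf("actionable: %d\n", mf_actionable(&cs));
              return 0;
          }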