1. 25 Mar, 2014 14 commits
    • M
      Add REDIS_MIN_RESERVED_FDS define for open fds · 01fe750c
      Matt Stancliff committed
      Also update the original REDIS_EVENTLOOP_FDSET_INCR to
      include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
      exists to make sure more than (maxclients+RESERVED) entries
      are allocated, but we can only guarantee that if we include
      the current value of REDIS_MIN_RESERVED_FDS as a minimum
      for the INCR size.
      01fe750c
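The relationship between the two defines can be sketched as follows. The values 32 and 128 come from the neighbouring commit messages; the `+ 96` derivation and the `event_loop_size()` helper are assumptions made for illustration and may not match redis.h exactly.

```c
/* Values assumed from the surrounding commit messages. */
#define REDIS_MIN_RESERVED_FDS 32
/* Assumed derivation: INCR stays at 128 and can never drop below the reserve. */
#define REDIS_EVENTLOOP_FDSET_INCR (REDIS_MIN_RESERVED_FDS + 96)

/* Hypothetical helper: number of event loop slots for a given maxclients. */
int event_loop_size(int maxclients) {
    return maxclients + REDIS_EVENTLOOP_FDSET_INCR;
}
```

Deriving one define from the other means that raising the reserve later would automatically raise the increment as well.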
    • M
      Fix infinite loop on startup if ulimit too low · 1e7b9980
      Matt Stancliff committed
      Fun fact: rlim_t is an unsigned integer type (typically 64 bits wide).
      
      Continually subtracting from a rlim_t makes it get smaller
      and smaller until it wraps, then you're up to 2^64-1.
      
      This was causing an infinite loop on Redis startup if
      your ulimit was extremely (almost comically) low.
      
      The loop condition (f > oldlimit) would never become false in a case like:
      
          f = 150
          while (f > 20) f -= 128
      
      Since f is unsigned, it can't go negative and would
      take on values of:
      
          Iteration 1: 150 - 128 => 22
          Iteration 2:  22 - 128 => 18446744073709551510
          Iterations 3-∞: ...
      
      To catch the wraparound, we use the previous value of f
      stored in limit.rlim_cur.  If we subtract from f and
      get a larger number than the value it had previously,
      we print an error and exit since we don't have enough
      file descriptors to help the user at this point.
      
      Thanks to @bs3g for the inspiration to fix this problem.
      Patches existed from @bs3g at antirez#1227, but I needed to repair a few other
      parts of Redis simultaneously, so I didn't get a chance to use them.
      1e7b9980
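The wraparound check described in this commit can be illustrated with a minimal sketch. The `step_down()` helper and its parameters are hypothetical, and the sketch leaves out the setrlimit() attempt that the real loop presumably makes at each step; it only shows how comparing against the previous value catches an unsigned underflow. It assumes `step` is greater than zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: lower `start` in steps of `step` until it is no
 * larger than `floor_limit`. Returns false if the unsigned subtraction
 * wrapped around before the floor was reached. */
bool step_down(uint64_t start, uint64_t floor_limit, uint64_t step,
               uint64_t *result) {
    uint64_t f = start;
    while (f > floor_limit) {
        uint64_t prev = f;
        f -= step;                   /* unsigned: wraps modulo 2^64 */
        if (f > prev) return false;  /* grew after subtracting, so it wrapped */
    }
    *result = f;
    return true;
}
```

With the numbers from the commit message, `step_down(150, 20, 128, &r)` reports failure on the second iteration instead of looping forever.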
    • M
      Improve error handling around setting ulimits · f701a347
      Matt Stancliff committed
      The log messages about open file limits have always
      been slightly opaque and confusing.  Here's an attempt to
      fix their wording, detail, and meaning.  Users will have a
      better understanding of how to fix very common problems
      with these reworded messages.
      
      Also, we handle a new error case when maxclients becomes less
      than one, essentially rendering the server unusable.  We
      now exit on startup instead of leaving the user with a server
      unable to handle any connections.
      
      This fixes antirez#356 as well.
      f701a347
    • M
      Replace magic 32 with REDIS_EVENTLOOP_FDSET_INCR · 6f4be459
      Matt Stancliff committed
      32 was the additional number of file descriptors Redis
      would reserve when managing a too-low ulimit.  The
      number 32 was in too many places statically, so now
      we use a macro instead that looks more appropriate.
      
      When Redis sets up the server event loop, it uses:
          server.maxclients+REDIS_EVENTLOOP_FDSET_INCR
      
      So, when reserving file descriptors, it makes sense to
      reserve at least REDIS_EVENTLOOP_FDSET_INCR FDs instead
      of only 32.  Currently, REDIS_EVENTLOOP_FDSET_INCR is
      set to 128 in redis.h.
      
      Also, I replaced the static 128 in the while (f > oldlimit) loop
      with REDIS_EVENTLOOP_FDSET_INCR as well, which results
      in no change since it was already 128.
      
      Impact: Users now need at least maxclients+128 as
      their open file limit instead of maxclients+32 to obtain
      actual "maxclients" number of clients.  Redis will carve
      the extra REDIS_EVENTLOOP_FDSET_INCR file descriptors it
      needs out of the "maxclients" range instead of failing
      to start (unless the local ulimit -n is too low to accommodate
      the request).
      6f4be459
    • M
      Fix maxclients error handling · 49b576cb
      Matt Stancliff committed
      Everywhere in the Redis code base, maxclients is treated
      as an int with (int)maxclients or `maxclients = atoi(source)`,
      so let's make maxclients an int.
      
      This fixes a bug where someone could specify a negative maxclients
      on startup and it would work (as well as set maxclients very high)
      because:
      
          unsigned int maxclients;
          char *update = "-300";
          maxclients = atoi(update);
          if (maxclients < 1) goto fail;
      
      But, (maxclients < 1) can only catch the case when maxclients
      is exactly 0.  maxclients happily sets itself to -300, which isn't
      -300, but rather 4294966996, which isn't < 1, so... everything
      "worked."
      
      maxclients config parsing checks for the case of < 1, but maxclients
      CONFIG SET parsing was checking for case of < 0 (allowing
      maxclients to be set to 0).  CONFIG SET parsing is now updated to
      match config parsing of < 1.
      
      It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.
      
      These changes were inspired by antirez#356, but this doesn't
      fix that issue.
      49b576cb
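The difference between the unsigned and signed declarations can be reproduced with a small sketch. The two helper functions are hypothetical and only mirror the `< 1` check quoted in the commit message; the value 4294966996 assumes a 32-bit `unsigned int`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors the buggy variant: the parsed value is stored as unsigned. */
bool accepts_unsigned(const char *s) {
    unsigned int maxclients = atoi(s);  /* -300 converts to a huge value */
    return !(maxclients < 1);           /* only rejects exactly 0 */
}

/* Mirrors the fixed variant: the parsed value stays signed. */
bool accepts_signed(const char *s) {
    int maxclients = atoi(s);
    return !(maxclients < 1);           /* rejects 0 and all negatives */
}
```

Here `accepts_unsigned("-300")` returns true while `accepts_signed("-300")` returns false, which is the behaviour change the commit describes.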
    • A
      Fixed undefined variable value with certain code paths. · 317ec182
      antirez committed
      In sentinelFlushConfig() fd could be undefined when the following if
      statement was true:
      
              if (rewrite_status == -1) goto werr;
      
      This could cause random file descriptors to get closed.
      317ec182
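A common way to avoid this class of bug is to give the descriptor a known invalid value before any `goto` can be taken. The sketch below is not the Sentinel code; `flush_config()` and its parameters are hypothetical and only illustrate the pattern.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical function: returns 0 on success, -1 on error. */
int flush_config(const char *path, int rewrite_status) {
    int fd = -1;                 /* known value before any goto */

    if (rewrite_status == -1) goto werr;
    if ((fd = open(path, O_RDONLY)) == -1) goto werr;
    close(fd);
    return 0;

werr:
    if (fd != -1) close(fd);     /* never close a descriptor we did not open */
    return -1;
}
```

Without the initializer, the early `goto werr` would reach the cleanup code with an indeterminate `fd`.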
    • M
      Use LRU_CLOCK() instead of function getLRUClock() · 19ecf7cd
      Matt Stancliff committed
      lookupKey() uses LRU_CLOCK(), so it seems object
      creation should use LRU_CLOCK() too.
      19ecf7cd
    • M
      Sentinel: Notify user when config can't be saved · 73c2bcca
      Matt Stancliff committed
      73c2bcca
    • M
      Fix data loss when save AOF/RDB with no free space · 65e04528
      Matt Stancliff committed
      Previously, the (!fp) would only catch lack of free space
      under OS X.  Linux waits to discover it can't write until
      it actually writes contents to disk.
      
      (fwrite() returns success even if the underlying file
      has no free space to write into.  All the errors
      only show up at flush/sync/close time.)
      
      Fixes antirez/redis#1604
      65e04528
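Because buffered writes can appear to succeed, a save routine has to check every later step too. The following is a minimal sketch of that idea, not the actual AOF/RDB code; `save_buffer()` is a hypothetical helper and the use of fsync() is an assumption about what a durable save would want.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path, returning 0 only if the data
 * was written, flushed, synced and the file closed without error. */
int save_buffer(const char *path, const void *buf, size_t len) {
    FILE *fp = fopen(path, "w");
    if (!fp) return -1;
    if (fwrite(buf, 1, len, fp) != len) goto werr;
    if (fflush(fp) == EOF) goto werr;        /* buffered data hits the OS here */
    if (fsync(fileno(fp)) == -1) goto werr;  /* and the disk here */
    if (fclose(fp) == EOF) return -1;        /* fp is unusable after fclose */
    return 0;

werr:
    fclose(fp);
    return -1;
}
```

On Linux, pointing this at /dev/full should exercise the error path: the fwrite() call is expected to succeed and the failure to surface at fflush().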
    • M
      Cluster: Restore proper trib master iteration · 4af72d32
      Matt Stancliff committed
      This got removed in 2e5c394f during a new feature addition.
      
      The prior commit had "break if masters.length == masters_count"
      but we are guaranteed to already have that condition met since
      otherwise we wouldn't have gotten this far.
      
      Without this break statement, it's possible some masters may
      be forgotten and have zero replicas while other masters have
      more than their requested number of replicas.
      
      Thanks to carlos for pointing out this regression at:
      https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
      4af72d32
    • M
      Cluster: Fix trib create when masters==replicas · ee18135a
      Matt Stancliff committed
      This bug was introduced in 2e5c394f during a refactor.
      
      It took me a while to understand what was going on with
      the code, so I've refactored it further by:
        - Replacing boolean values with meaningful symbols
        - Replacing 'i' with a meaningful variable name
        - Adding the proper abort check
        - Factoring out now duplicated conditionals
        - Adding optional verbose logging (we're inside *four*
          different looping constructs, so it takes a while to
          figure out where all the moving parts are)
        - Updating comment for the section
      
      This fixes a problem when the number of master instances
      equaled the number of replica instances.  Before, when
      there were equal numbers of both, nodes_count would go to
      zero, but the while loop would spin in i < @replicas because
      i would never be updated (because the nodes_list of each ip
      was length == 0, which triggered an endless loop of
      next -> i = 0 -> 0 < 1? -> true -> next -> i = 0 ...)
      
      Thanks to carlo who found this problem at:
      https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
      ee18135a
    • J
      Fixed a few typos. · 9fa96697
      Jan-Erik Rediger committed
      9fa96697
    • M
      Cluster: remove variable causing warning · a7478079
      Matt Stancliff committed
      GCC-4.9 warned about this, but clang didn't.
      
      This commit fixes warning:
      sentinel.c: In function 'sentinelReceiveHelloMessages':
      sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
           sentinelRedisInstance *ri = c->data, *master;
      a7478079
    • J
      Finally fix the `install_server.sh` script. · 46904ae2
      Jan-Erik Rediger committed
      Includes changes from a dozen bug reports and pull requests.
      Was tested on Ubuntu, Debian and CentOS.
      46904ae2
  2. 24 Mar, 2014 4 commits
    • A
      Sample and cache RSS in serverCron(). · 2dd8c462
      antirez committed
      Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
      This slowed down the generation of the INFO 'memory' section.
      
      Since the RSS does not need to be a real-time measurement, we
      now sample it with server.hz frequency (10 times per second by default)
      and use this value both to show the INFO rss field and to compute the
      fragmentation ratio.
      
      Practically this does not make any difference for memory profiling of
      Redis but speeds up the INFO call significantly.
      2dd8c462
    • A
      sdscatvprintf(): Try to use a static buffer. · ae211be9
      antirez committed
      For small content the function now tries to use a static buffer to avoid
      a malloc/free cycle that is too costly when the function is used in
      performance-critical code paths such as INFO output generation.
      
      This change was verified to have positive effects in the execution speed
      of the INFO command.
      ae211be9
    • A
      Cache uname() output across INFO calls. · 571d6b01
      antirez committed
      uname() was profiled to be a slow syscall. It always produces the same
      output within a single execution of Redis, so calling it at
      every INFO output generation does not make much sense.
      
      The utsname structure filled by uname() is now a static variable. At the
      same time a static integer was added to track whether uname still needs
      to be called for the first time.
      571d6b01
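The caching pattern described here, a static structure plus a static flag, might look roughly like this. `cached_uname()` is a hypothetical wrapper written for illustration and is not the code from the commit.

```c
#include <stddef.h>
#include <sys/utsname.h>

/* Hypothetical wrapper: calls uname() once and reuses the result.
 * Returns NULL if the first call fails, so a later call can retry. */
const struct utsname *cached_uname(void) {
    static struct utsname name;
    static int call_uname = 1;   /* 1 until uname() has succeeded once */

    if (call_uname) {
        if (uname(&name) == -1) return NULL;
        call_uname = 0;
    }
    return &name;
}
```

Every call after the first successful one returns the same pointer without entering the kernel.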
    • A
      sdscatvprintf(): guess buflen using format length. · 06b2a389
      antirez committed
      sdscatvprintf() uses a loop where it tries to output the formatted
      string in a buffer of the initial length. If there is not enough room,
      a buffer of doubled size is tried, and so forth.
      
      The initial guess for the buffer length was very poor: a hardcoded
      "16". This caused the printf to be processed multiple times without a
      good reason. Given that printf functions are already not fast, the
      overhead was significant.
      
      The new heuristic is to use a buffer 4 times the length of the format
      string, with 32 as the minimum size. This appears to be a good balance for
      typical uses of the function inside the Redis code base.
      
      This change improved INFO command performance by a factor of 3.
      06b2a389
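The sizing heuristic and the doubling loop can be sketched as below. This is a stand-in, not the sds implementation: `format_alloc()` returns a plain malloc()ed string, the 1024-byte stack buffer size is an assumption, and truncation is detected through the C99 vsnprintf() return value, which may differ from how sds.c detects it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: formats into a fixed stack buffer when the guess
 * fits, and falls back to malloc() with doubling when it does not. */
char *format_alloc_v(const char *fmt, va_list ap) {
    char staticbuf[1024];                /* assumed size */
    size_t buflen = strlen(fmt) * 4;     /* initial guess: 4x the format */
    if (buflen < 32) buflen = 32;        /* minimum size */

    for (;;) {
        char *buf = staticbuf;
        if (buflen > sizeof(staticbuf)) {
            buf = malloc(buflen);
            if (buf == NULL) return NULL;
        }
        va_list cpy;
        va_copy(cpy, ap);                /* ap must survive each retry */
        int n = vsnprintf(buf, buflen, fmt, cpy);
        va_end(cpy);

        if (n >= 0 && (size_t)n < buflen) {   /* output fit */
            char *out = malloc((size_t)n + 1);
            if (out) memcpy(out, buf, (size_t)n + 1);
            if (buf != staticbuf) free(buf);
            return out;
        }
        if (buf != staticbuf) free(buf);
        if (n < 0) return NULL;          /* encoding error */
        buflen *= 2;                     /* not enough room: double and retry */
    }
}

char *format_alloc(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    char *s = format_alloc_v(fmt, ap);
    va_end(ap);
    return s;
}
```

A better first guess matters because every retry runs the whole printf machinery again; with a typical format string the first attempt usually fits.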
  3. 21 Mar, 2014 22 commits
    • A
      Add test-lru.rb to utils. · fd70c68c
      antirez committed
      This is a program useful to evaluate the Redis LRU algorithm behavior.
      fd70c68c
    • A
      Use getLRUClock() instead of server.lruclock to create objects. · 24c35024
      antirez committed
      Thanks to Matt Stancliff for noticing this error. It was in the original
      code but somehow I managed to remove the change from the commit...
      24c35024
    • A
      The default maxmemory policy is now noeviction. · 7f274e3b
      antirez committed
      This is safer as by default maxmemory should just set a memory limit
      without any key to be deleted, unless the policy is set to something
      more relaxed.
      7f274e3b
    • A
      Use 24 bits for the lru object field and improve resolution. · a219339e
      antirez committed
      There were 2 spare bits inside the Redis object structure that are now
      used to enlarge the range of the LRU field by 4x.
      
      At the same time the resolution was improved from 10 to 1 second: this
      still provides 194 days before the LRU counter overflows (restarting from
      zero).
      
      This is not a problem since it only causes lack of eviction precision for
      objects not touched for a very long time, and the lack of precision is
      only temporary.
      a219339e
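The 194-day figure follows from the field width and the resolution. The small helper below is a hypothetical illustration of that arithmetic; the 22-bit, 10-second case is inferred from the commit message (2 fewer bits, old resolution of 10 seconds).

```c
#include <stdint.h>

/* Whole days before an LRU counter of `bits` bits wraps, given the
 * clock resolution in seconds per tick. */
uint64_t lru_wrap_days(unsigned bits, uint64_t resolution_seconds) {
    uint64_t ticks = (uint64_t)1 << bits;
    return ticks * resolution_seconds / 86400;  /* 86400 seconds per day */
}
```

`lru_wrap_days(24, 1)` gives 194, while the inferred previous layout, `lru_wrap_days(22, 10)`, gives 485: the new field trades some range for ten times finer resolution.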
    • A
      Default LRU samples is now 5. · 9928b531
      antirez committed
      9928b531
    • A
      Use new dictGetRandomKeys() API to get samples for eviction. · fd5e8c01
      antirez committed
      The eviction quality degrades a bit in my tests, but since the API is
      faster, it allows raising the number of samples, and overall it is a win.
      fd5e8c01
    • A
      struct dictEntry -> dictEntry. · 10c8d862
      antirez committed
      10c8d862
    • A
      Added dictGetRandomKeys() to dict.c: mass get random entries. · 26292670
      antirez committed
      This new function is useful to get a number of random entries from a
      hash table when we just need to do some sampling without particularly
      good distribution.
      
      It just jumps to a random place in the hash table and returns the first
      N items encountered by scanning linearly.
      
      The main usefulness of this function is to speed up Redis internal
      sampling of the key space, for example for key eviction or expiry.
      26292670
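The "jump to a random place, then scan linearly" idea can be sketched on a simplified chained hash table. The `entry` and `table` types and the `get_random_entries()` function are hypothetical and much simpler than the dict.c structures; rehashing, for example, is not modeled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical chained hash table; not the dict.c types. */
typedef struct entry {
    int key;
    struct entry *next;
} entry;

typedef struct {
    entry **buckets;
    size_t size;   /* number of buckets */
    size_t used;   /* number of stored entries */
} table;

/* Fill `out` with up to `count` entries found by scanning linearly from a
 * random bucket, wrapping around at the end. Returns how many were stored. */
size_t get_random_entries(const table *t, entry **out, size_t count) {
    if (t->size == 0) return 0;
    if (count > t->used) count = t->used;

    size_t stored = 0;
    size_t i = (size_t)rand() % t->size;     /* random starting bucket */
    for (size_t visited = 0; visited < t->size && stored < count; visited++) {
        for (entry *e = t->buckets[i]; e != NULL && stored < count; e = e->next)
            out[stored++] = e;
        i = (i + 1) % t->size;
    }
    return stored;
}
```

Only one random number is drawn per call, which is where the speedup comes from; the cost is that the returned entries are neighbours in the table rather than independent samples.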
    • A
      LRU eviction pool implementation. · c641074a
      antirez committed
      This is an improvement over the previous eviction algorithm: we now use
      an eviction pool that persists across key evictions and is populated
      with the best eviction candidates found so far.
      
      It approximates LRU eviction at a given number of samples better than
      the previous algorithm.
      c641074a
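One possible shape for such a pool is a small array kept sorted by idle time, so that the best candidate is always at the end. The sketch below is an assumption-laden illustration, not the commit's implementation: the pool size, the `candidate` and `pool` types, and both functions are hypothetical.

```c
#include <string.h>

#define POOL_SIZE 4   /* assumed size, not taken from the commit */

typedef struct { int key; unsigned long idle; } candidate;
typedef struct { candidate items[POOL_SIZE]; int len; } pool;

/* Offer a sampled key to the pool. Items stay sorted by ascending idle
 * time; when the pool is full the least idle item is dropped. */
void pool_offer(pool *p, int key, unsigned long idle) {
    int pos = 0;
    while (pos < p->len && p->items[pos].idle < idle) pos++;

    if (p->len == POOL_SIZE) {
        if (pos == 0) return;   /* less idle than everything already kept */
        /* Drop items[0] and shift the smaller items left by one. */
        memmove(&p->items[0], &p->items[1],
                (size_t)(pos - 1) * sizeof(candidate));
        pos--;
    } else {
        /* Make room at pos by shifting the larger items right by one. */
        memmove(&p->items[pos + 1], &p->items[pos],
                (size_t)(p->len - pos) * sizeof(candidate));
        p->len++;
    }
    p->items[pos].key = key;
    p->items[pos].idle = idle;
}

/* Remove the most idle candidate into *out. Returns 0 if the pool is empty. */
int pool_pop_best(pool *p, candidate *out) {
    if (p->len == 0) return 0;
    *out = p->items[--p->len];
    return 1;
}
```

Because the pool survives between evictions, a good candidate found in one sampling round is still available in the next, which is what lets a small sample size behave more like true LRU.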
    • A
      Fix OBJECT IDLETIME return value converting to seconds. · c9ac817c
      antirez committed
      estimateObjectIdleTime() returns a value in milliseconds now, so we need
      to scale the output of OBJECT IDLETIME to seconds.
      c9ac817c
    • A
      Obtain LRU clock in a resolution dependent way. · 205c2ccc
      antirez committed
      For testing purposes it is handy to have a very high resolution of the
      LRU clock, so that scripts running for just a few seconds can be used to
      experiment with how the eviction algorithm works.
      
      This commit allows Redis to use the cached LRU clock, or a value
      computed on demand, depending on the resolution. So normally we have the
      good performance of a precomputed value, and a clock that wraps in many
      days using the normal resolution, but if needed, changing a define will
      switch behavior to a high-resolution LRU clock.
      205c2ccc
    • A
      Specify lruclock in redisServer structure via REDIS_LRU_BITS. · 561e7934
      antirez committed
      The padding field was totally useless: removed.
      561e7934
    • A
      Specify LRU resolution in milliseconds. · 8f0b7491
      antirez committed
      8f0b7491
    • A
      Set LRU parameters via REDIS_LRU_BITS define. · 63aacbe8
      antirez committed
      63aacbe8
    • A
      Unify stats reset for CONFIG RESETSTAT / initServer(). · 8b6a674a
      antirez committed
      Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
      future it will be simpler to avoid missing new fields.
      8b6a674a
    • A
      Sentinel: sentinelRefreshInstanceInfo() minor refactoring. · 128dcee4
      antirez committed
      Test the sentinel.tilt condition at the top and return if it is true.
      This allows removing the check for the tilt condition in the remaining
      code paths of the function.
      128dcee4
    • A
      Sentinel test: 02 unit better coverage + refactoring. · b104015f
      antirez committed
      b104015f
    • A
      Sentinel test: foreach_instance_id implements 'break'. · f5f281f9
      antirez committed
      f5f281f9
    • A
      Sentinel: instance_is_killed proc added to sentinel.tcl. · f308f677
      antirez committed
      f308f677
    • A
      0774d492
    • A
      Sentinel: down-after-milliseconds is not master-specific. · a86e24de
      antirez committed
      addReplySentinelRedisInstance() modified so that this field is displayed
      for all kinds of instances: Sentinels, Masters, Slaves.
      a86e24de
    • A
      Sentinel failure detection implementation improved. · 9997b51f
      antirez committed
      Failure detection in Sentinel is ping-pong based. It used to work by
      remembering the last time a valid PONG reply was received, and checking
      if the reception time was too old compared to the current time.
      
      PINGs were sent at a fixed interval of 1 second.
      
      This works in a decent way, but does not scale well when we want to set
      very small values of "down-after-milliseconds" (this is the node
      timeout basically).
      
      This commit reimplements the failure detection, making a number of
      changes. Some changes are inspired by the Redis Cluster failure detection
      code:
      
      * A new last_ping_time field is added to the representation of instances.
        If non zero, we have an active ping that was sent at the specified
        time. When a valid reply to ping is received, the field is zeroed
        again.
      * last_ping_time is not reset when we reconnect the link or send a new
        ping, so from our point of view it represents the time we started
        waiting for the instance to reply to our pings without receiving a
        reply.
      * last_ping_time is now used in order to check if the instance is
        timed out. This means that we can have a node timeout of 100
        milliseconds and yet the system will work well since the new check is
        not bound to the period used to send pings.
      * Pings are now sent every second, or more often if the value of
        down-after-milliseconds is less than one second, with a lower limit of
        10 HZ ping frequency.
      * Link reconnection code was improved. This is used in order to try to
        reconnect the link when we are at 50% of the node timeout without a
        valid reply received yet. However the old code triggered unnecessary
        reconnections when the node timeout was very small. Now that should be
        ok.
      
      The new code passes the tests, but more testing is needed, along with
      more unit tests stressing the failure detector, so currently this is
      merged only in the unstable branch.
      9997b51f
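The role of last_ping_time in the bullets above can be sketched with a tiny state machine. The `instance` type and the three functions are hypothetical and cover only the timeout check; the ping period and link reconnection logic are not modeled. Timestamps are assumed to be non-zero milliseconds, since zero means "no ping pending".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of an instance for failure detection. */
typedef struct {
    int64_t last_ping_time;   /* 0 = no ping is waiting for a reply */
    int64_t down_after_ms;    /* node timeout (down-after-milliseconds) */
} instance;

/* Record a sent ping, keeping the time of the oldest unanswered one. */
void on_ping_sent(instance *in, int64_t now) {
    if (in->last_ping_time == 0) in->last_ping_time = now;
}

/* A valid reply clears the pending ping. */
void on_valid_reply(instance *in) {
    in->last_ping_time = 0;
}

/* Timed out if we have been waiting for a reply longer than the timeout. */
bool is_timed_out(const instance *in, int64_t now) {
    return in->last_ping_time != 0 &&
           now - in->last_ping_time > in->down_after_ms;
}
```

Since the check measures how long the oldest unanswered ping has been pending, it does not depend on how often pings are sent, which is why small timeouts such as 100 milliseconds become workable.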