diff --git a/TODO b/TODO
index ca2784be348d9ef5821084d0ee507c9a9154ab7d..0c48289eacc2337d0916d970cddfaa744d2a9cfd 100644
--- a/TODO
+++ b/TODO
@@ -1,28 +1,28 @@
 BEFORE REDIS 1.0.0-rc1
-- SDIFF, SDIFFSTORE
-- Add number of keys for every DB in INFO
-- maxmemory support
-- maxclients support
-- Resize the expires and Sets hash tables if needed as well? For Sets the right moment to check for this is probably in SREM
-- TTL command that returns -1 if a key is not volatile otherwise the time to live of a volatile key in seconds.
-- What happens if the saving child gets killed or segfaults instead of ending normally? Handle this.
-- Make sinterstore / unionstore / sdiffstore returning the cardinality of the resulting set.
-- check 'server.dirty' everywere
-- Shutdown must kill other background savings before to start saving. Otherwise the DB can get replaced by the child that rename(2) after the parent for some reason. Child should trap the signal and remove the temp file name.
-- Objects sharing configuration, add the directive "objectsharingpool "
-- Make sure to convert all the fstat() calls to 64bit versions.
-- SINTERCOUNT, SUNIONCOUNT, SDIFFCOUNT
+ * SDIFF, SDIFFSTORE
+ * Add number of keys for every DB in INFO
+ * maxmemory support
+ * maxclients support
+ * Resize the expires and Sets hash tables if needed as well? For Sets the right moment to check for this is probably in SREM
+ * TTL command that returns -1 if a key is not volatile otherwise the time to live of a volatile key in seconds.
+ * What happens if the saving child gets killed or segfaults instead of ending normally? Handle this.
+ * Make sinterstore / unionstore / sdiffstore returning the cardinality of the resulting set.
+ * check 'server.dirty' everywere
+ * Shutdown must kill other background savings before to start saving. Otherwise the DB can get replaced by the child that rename(2) after the parent for some reason. Child should trap the signal and remove the temp file name.
+ * Objects sharing configuration, add the directive `objectsharingpool `
+ * Make sure to convert all the fstat() calls to 64bit versions.
+ * Cover most of the source code with test-redis.tcl
 AFTER 1.0 stable release
-- Use partial qsort for SORT + LIMIT. Don't copy the list into a vector when BY argument is constant.
-- Locking primitives
-- MDEL (or vararg DEL)
-- Write the hash table size of every db in the dump, so that Redis can resize the hash table just one time when loading a big DB.
-- Elapsed time in logs for SAVE when saving is going to take more than 2 seconds
-- replication automated tests
-- LOCK / TRYLOCK / UNLOCK as described many times in the google group
+ * Consistent hashing implemented in all the client libraries having an user base
+ * Use partial qsort for SORT + LIMIT. Don't copy the list into a vector when BY argument is constant.
+ * Profiling and optimization in order to limit the CPU usage at minimum
+ * Write the hash table size of every db in the dump, so that Redis can resize the hash table just one time when loading a big DB.
+ * Elapsed time in logs for SAVE when saving is going to take more than 2 seconds
+ * LOCK / TRYLOCK / UNLOCK as described many times in the google group
+ * Replication automated tests
 FUTURE HINTS
diff --git a/doc/FAQ.html b/doc/FAQ.html
index 348351738909a61c71b4464e6534c2b387157d7a..ee487314955d2c68595db450eff4123f2ba4d277 100644
--- a/doc/FAQ.html
+++ b/doc/FAQ.html
@@ -16,7 +16,7 @@
-FAQ: Contents
  Why I need Redis if there is already memcachedb, Tokyo Cabinet, ...?
  Isn't this key-value thing just hype?
  Can I backup a Redis DB while the server is working?
  What's the Redis memory footprint?
  I like Redis high level operations and features, but I don't like it takes everything in memory and I can't have a dataset larger the memory. Plans to change this?
  Ok but I absolutely need to have a DB larger than memory, still I need the Redis features
  I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!
  What happens if Redis runs out of memory?
  How much time it takes to load a big database at server startup?
  Redis is single threaded, how can I exploit multiple CPU / cores?
  I'm using some form of key hashing for partitioning, but what about SORT BY?
  What is the maximum number of keys a single Redis instance can hold?
  What Redis means actually?
  Why did you started the Redis project? +FAQ: Contents
  Why I need Redis if there is already memcachedb, Tokyo Cabinet, ...?
  Isn't this key-value thing just hype?
  Can I backup a Redis DB while the server is working?
  What's the Redis memory footprint?
  I like Redis high level operations and features, but I don't like it takes everything in memory and I can't have a dataset larger the memory. Plans to change this?
  Ok but I absolutely need to have a DB larger than memory, still I need the Redis features
  I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!
  What happens if Redis runs out of memory?
  How much time it takes to load a big database at server startup?
  Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!
  Redis is single threaded, how can I exploit multiple CPU / cores?
  I'm using some form of key hashing for partitioning, but what about SORT BY?
  What is the maximum number of keys a single Redis instance can hold?
  What Redis means actually?
  Why did you started the Redis project?

FAQ

@@ -34,10 +34,9 @@ So Redis offers more features:

  • Keys can store different data t
    • We wrote a simple Twitter Clone using just Redis as database. Download the source code from the download section and imagine to write it with a plain key-value DB without support for lists and sets... it's much harder.
    • Multiple DBs. Using the SELECT command the client can select different datasets. This is useful because Redis provides a MOVE atomic primitive that moves a key form a DB to another one, if the target DB already contains such a key it returns an error: this basically means a way to perform locking in distributed processing.
    • So what is Redis really about? The User interface with the programmer. Redis aims to export to the programmer the right tools to model a wide range of problems. Sets, Lists with O(1) push operation, lrange and ltrim, server-side fast intersection between sets, are primitives that allow to model complex problems with a key value database.
    -

    Isn't this key-value thing just hype?

    I imagine key-value DBs, in the short term future, to be used like you use memory in a program, with lists, hashes, and so on. With Redis it's like this, but this special kind of memory containing your data structures is shared, atomic, persistent.

    When we write code it is obvious, when we take data in memory, to use the most sensible data structure for the work, right? Incredibly when data is put inside a relational DB this is no longer true, and we create an absurd data model even if our need is to put data and get this data back in the same order we put it inside (an ORDER BY is required when the data should be already sorted. Strange, dont' you think?).

    Key-value DBs bring this back at home, to create sensible data models and use the right data structures for the problem we are trying to solve.

    Can I backup a Redis DB while the server is working?

    Yes you can. When Redis saves the DB it actually creates a temp file, then rename(2) that temp file name to the destination file name. So even while the server is working it is safe to save the database file just with the cp unix command. Note that you can use master-slave replication in order to have redundancy of data, but if all you need is backups, cp or scp will do the work pretty well.

    What's the Redis memory footprint?

    Worst case scenario: 1 Million keys with the key being the natural numbers from 0 to 999999 and the string "Hello World" as value use 100MB on my Intel macbook (32bit). Note that the same data stored linearly in an unique string takes something like 16MB, this is the norm because with small keys and values there is a lot of overhead. Memcached will perform similarly.

    With large keys/values the ratio is much better of course.

    64 bit systems will use much more memory than 32 bit systems to store the same keys, especially if the keys and values are small, this is because pointers takes 8 bytes in 64 bit systems. But of course the advantage is that you can have a lot of memory in 64 bit systems, so to run large Redis servers a 64 bit system is more or less required.

    I like Redis high level operations and features, but I don't like it takes everything in memory and I can't have a dataset larger the memory. Plans to change this?

    The whole key-value hype started for a reason: performances. Redis takes the whole dataset in memory and writes asynchronously on disk in order to be very fast, you have the best of both worlds: hyper-speed and persistence of data, but the price to pay is exactly this, that the dataset must fit on your computers RAM.

    If the data is larger then memory, and this data is stored on disk, what happens is that the bottleneck of the disk I/O speed will start to ruin the performances. Maybe not in benchmarks, but once you have real load from multiple clients with distributed key accesses the data must come from disk, and the disk is damn slow. Not only, but Redis supports higher level data structures than the plain values. To implement this things on disk is even slower.

    Redis will always continue to hold the whole dataset in memory because this days scalability requires to use RAM as storage media, and RAM is getting cheaper and cheaper. Today it is common for an entry level server to have 16 GB of RAM! And in the 64-bit era there are no longer limits to the amount of RAM you can have in theory.

    Ok but I absolutely need to have a DB larger than memory, still I need the Redis features

    You may try to load a dataset larger than your memory in Redis and see what happens, basically if you are using a modern Operating System, and you have a lot of data in the DB that is rarely accessed, the OS's virtual memory implementation will try to swap rarely used pages of memory on the disk, to only recall this pages when they are needed. If you have many large values rarely used this will work. If your DB is big because you have tons of little values accessed at random without a specific pattern this will not work (at low level a page is usually 4096 bytes, and you can have different keys/values stored at a single page. The OS can't swap this page on disk if there are even few keys used frequently).

    Another possible solution is to use both MySQL and Redis at the same time, basically take the state on Redis, and all the things that get accessed very frequently: user auth tokens, Redis Lists with chronologically ordered IDs of the last N-comments, N-posts, and so on. Then use MySQL as a simple storage engine for larger data, that is just create a table with an auto-incrementing ID as primary key and a large BLOB field as data field. Access MySQL data only by primary key (the ID). The application will run the high traffic queries against Redis but when there is to take the big data will ask MySQL for specific resources IDs.

    I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!

    This may happen and it's perfectly ok. Redis objects are small C structures allocated and freed a lot of times. This costs a lot of CPU so instead of being freed, released objects are put into a free list and reused when needed. This memory is taken exactly by these free objects ready to be reused.

    What happens if Redis runs out of memory?

    With modern operating systems malloc() returning NULL is not common, usually the server will start swapping and Redis performances will be disastrous so you'll know it's time to use more Redis servers or get more RAM.

    However it is planned to add a configuration directive to tell Redis to stop accepting queries but instead to SAVE the latest data and quit if it is using more than a given amount of memory. Also the new INFO command (work in progress in this days) will report the amount of memory Redis is using so you can write scripts that monitor your Redis servers checking for critical conditions.

    Update: redis SVN is able to know how much memory it is using and report it via the INFO command.

    How much time it takes to load a big database at server startup?

    Just an example on normal hardware: It takes about 45 seconds to restore a 2 GB database on a fairly standard system, no RAID. This can give you some kind of feeling about the order of magnitude of the time needed to load data when you restart the server.

    Redis is single threaded, how can I exploit multiple CPU / cores?

    Simply start multiple instances of Redis on different ports in the same box and treat them as different servers! Given that Redis is a distributed database anyway, in order to scale you need to think in terms of multiple computational units. At some point a single box may not be enough anyway.

    In general key-value databases are very scalable because of the property that different keys can stay on different servers independently.

    In Redis there are client libraries such Redis-rb (the Ruby client) that are able to handle multiple servers automatically using consistent hashing. We are going to implement consistent hashing in all the other major client libraries. If you use a different language you can implement it yourself otherwise just hash the key before to SET / GET it from a given server. For example imagine to have N Redis servers, server-0, server-1, ..., server-N. You want to store the key "foo", what's the right server where to put "foo" in order to distribute keys evenly among different servers? Just perform the crc = CRC32("foo"), then servernum = crc % N (the rest of the division for N). This will give a number between 0 and N-1 for every key. Connect to this server and store the key. The same for gets.

    This is a basic way of performing key partitioning, consistent hashing is much better and this is why after Redis 1.0 will be released we'll try to implement this in every widely used client library starting from Python and PHP (Ruby already implements this support).

    I'm using some form of key hashing for partitioning, but what about SORT BY?

    With SORT BY you need that all the weight keys are in the same Redis instance of the list/set you are trying to sort. In order to make this possible we developed a concept called key tags. A key tag is a special pattern inside a key that, if preset, is the only part of the key hashed in order to select the server for this key. For example in order to hash the key "foo" I simply perform the CRC32 checksum of the whole string, but if this key has a pattern in the form of the characters {...} I only hash this substring. So for example for the key "foo{bared}" the key hashing code will simply perform the CRC32 of "bared". This way using key tags you can ensure that related keys will be stored on the same Redis instance just using the same key tag for all this keys. Redis-rb already implements key tags.

    What is the maximum number of keys a single Redis instance can hold?

    The latest versions of Redis in the Git repository are able to handle at least 150 million of keys per instance. We are working in order to experiment with larger values.

    What Redis means actually?

    Redis means two things: +

    Isn't this key-value thing just hype?

    I imagine key-value DBs, in the short term future, to be used like you use memory in a program, with lists, hashes, and so on. With Redis it's like this, but this special kind of memory containing your data structures is shared, atomic, persistent.

    When we write code it is obvious, when we keep data in memory, to use the most sensible data structure for the work, right? Incredibly, when data is put inside a relational DB this is no longer true, and we create an absurd data model even if our need is to put data and get this data back in the same order we put it inside (an ORDER BY is required when the data should be already sorted. Strange, don't you think?).

    Key-value DBs bring this back at home, to create sensible data models and use the right data structures for the problem we are trying to solve.

    Can I backup a Redis DB while the server is working?

    Yes you can. When Redis saves the DB it actually creates a temp file, then rename(2) that temp file name to the destination file name. So even while the server is working it is safe to save the database file just with the cp unix command. Note that you can use master-slave replication in order to have redundancy of data, but if all you need is backups, cp or scp will do the work pretty well.

    What's the Redis memory footprint?

    Worst case scenario: 1 Million keys with the key being the natural numbers from 0 to 999999 and the string "Hello World" as value use 100MB on my Intel macbook (32bit). Note that the same data stored linearly in an unique string takes something like 16MB, this is the norm because with small keys and values there is a lot of overhead. Memcached will perform similarly.

    With large keys/values the ratio is much better of course.

    64 bit systems will use much more memory than 32 bit systems to store the same keys, especially if the keys and values are small. This is because pointers take 8 bytes in 64 bit systems. But of course the advantage is that you can have a lot of memory in 64 bit systems, so to run large Redis servers a 64 bit system is more or less required.

    I like Redis high level operations and features, but I don't like it takes everything in memory and I can't have a dataset larger the memory. Plans to change this?

    The whole key-value hype started for a reason: performance. Redis keeps the whole dataset in memory and writes to disk asynchronously in order to be very fast. You get the best of both worlds, hyper-speed and persistence of data, but the price to pay is exactly this: the dataset must fit in your computer's RAM.

    If the data is larger than memory, and this data is stored on disk, the disk I/O speed becomes the bottleneck and starts to ruin performance. Maybe not in benchmarks, but once you have real load from multiple clients with distributed key accesses the data must come from disk, and the disk is damn slow. Not only that, but Redis supports higher level data structures than plain values. Implementing these things on disk is even slower.

    Redis will always continue to hold the whole dataset in memory because these days scalability requires using RAM as the storage medium, and RAM is getting cheaper and cheaper. Today it is common for an entry level server to have 16 GB of RAM! And in the 64-bit era there are, in theory, no longer limits to the amount of RAM you can have.

    Ok but I absolutely need to have a DB larger than memory, still I need the Redis features

    You may try to load a dataset larger than your memory in Redis and see what happens. Basically, if you are using a modern operating system and you have a lot of data in the DB that is rarely accessed, the OS's virtual memory implementation will try to swap rarely used pages of memory to disk, recalling these pages only when they are needed. If you have many large values that are rarely used this will work. If your DB is big because you have tons of little values accessed at random without a specific pattern this will not work (at a low level a page is usually 4096 bytes, and different keys/values can be stored in a single page. The OS can't swap this page to disk if even a few of its keys are used frequently).

    Another possible solution is to use both MySQL and Redis at the same time. Basically keep the state in Redis, along with all the things that get accessed very frequently: user auth tokens, Redis Lists with chronologically ordered IDs of the last N comments, N posts, and so on. Then use MySQL as a simple storage engine for larger data, that is, just create a table with an auto-incrementing ID as primary key and a large BLOB field as data field. Access MySQL data only by primary key (the ID). The application will run the high traffic queries against Redis, but when it needs the big data it will ask MySQL for specific resource IDs.

    I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!

    This may happen and it's perfectly ok. Redis objects are small C structures allocated and freed a lot of times. This costs a lot of CPU so instead of being freed, released objects are put into a free list and reused when needed. This memory is taken exactly by these free objects ready to be reused.

    What happens if Redis runs out of memory?

    With modern operating systems malloc() returning NULL is not common, usually the server will start swapping and Redis performances will be disastrous so you'll know it's time to use more Redis servers or get more RAM.

    However it is planned to add a configuration directive to tell Redis to stop accepting queries, SAVE the latest data and quit if it is using more than a given amount of memory. Also the new INFO command (work in progress these days) will report the amount of memory Redis is using so you can write scripts that monitor your Redis servers checking for critical conditions.

    Update: redis SVN is able to know how much memory it is using and report it via the INFO command.

    How much time it takes to load a big database at server startup?

    Just an example on normal hardware: It takes about 45 seconds to restore a 2 GB database on a fairly standard system, no RAID. This can give you some kind of feeling about the order of magnitude of the time needed to load data when you restart the server.

    Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!

    Short answer: echo 1 > /proc/sys/vm/overcommit_memory :)

    And now the long one:

    The Redis background saving scheme relies on the copy-on-write semantics of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB to disk and finally exits. In theory the child should use as much memory as the parent, being a copy, but thanks to the copy-on-write semantics implemented by most modern operating systems the parent and child process share the common memory pages. A page is duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages. As a result, if you have a Redis dataset of 3 GB and just 2 GB of free memory, it will fail.

    Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
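
A monitoring script could check the setting before relying on background saves. The following is only a minimal sketch, assuming a Linux box where the value is exposed at /proc/sys/vm/overcommit_memory; the function names and the mode descriptions are illustrative and are not part of Redis.

```python
from pathlib import Path

# Meaning of the three values the Linux kernel accepts for this setting.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, never overcommit",
}


def read_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> int:
    """Read the current overcommit mode as an integer (Linux only)."""
    return int(Path(path).read_text().strip())


def is_fork_friendly(mode: int) -> bool:
    """True when the mode matches the value the FAQ recommends for Redis."""
    return mode == 1


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print(mode, OVERCOMMIT_MODES.get(mode, "unknown"))
    if not is_fork_friendly(mode):
        print("background saving may fail with a fork() error on large datasets")
```

The path is a parameter so the reader function can be pointed at a different file when testing.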

    Redis is single threaded, how can I exploit multiple CPU / cores?

    Simply start multiple instances of Redis on different ports in the same box and treat them as different servers! Given that Redis is a distributed database anyway, in order to scale you need to think in terms of multiple computational units. At some point a single box may not be enough anyway.

    In general key-value databases are very scalable because of the property that different keys can stay on different servers independently.

    In Redis there are client libraries such as Redis-rb (the Ruby client) that are able to handle multiple servers automatically using consistent hashing. We are going to implement consistent hashing in all the other major client libraries. If you use a different language you can implement it yourself, or otherwise just hash the key before you SET / GET it from a given server. For example imagine you have N Redis servers, server-0, server-1, ..., server-(N-1). You want to store the key "foo": what's the right server for "foo" in order to distribute keys evenly among the servers? Just compute crc = CRC32("foo"), then servernum = crc % N (the remainder of the division by N). This will give a number between 0 and N-1 for every key. Connect to this server and store the key. The same goes for gets.

    This is a basic way of performing key partitioning. Consistent hashing is much better, and this is why once Redis 1.0 is released we'll try to implement it in every widely used client library, starting from Python and PHP (Ruby already implements this support).
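
The basic crc % N scheme described above can be sketched in a few lines. This is only an illustration, not code from any Redis client library: the function name and the server list are hypothetical, and Python's zlib.crc32 stands in for whatever CRC32 routine a client uses.

```python
import zlib


def server_for_key(key: str, num_servers: int) -> int:
    """Return the index (0 .. num_servers - 1) of the server for this key."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    # Mask to get an unsigned 32-bit checksum regardless of platform.
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return crc % num_servers


if __name__ == "__main__":
    # Hypothetical server names, indexed from 0 to N-1.
    servers = ["server-0", "server-1", "server-2"]
    print(servers[server_for_key("foo", len(servers))])
```

Note that with this scheme changing N remaps most keys to a different server, which is the main reason consistent hashing is preferable.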

    I'm using some form of key hashing for partitioning, but what about SORT BY?

    With SORT BY all the weight keys need to be in the same Redis instance as the list/set you are trying to sort. In order to make this possible we developed a concept called key tags. A key tag is a special pattern inside a key that, if present, is the only part of the key hashed in order to select the server for this key. For example in order to hash the key "foo" I simply compute the CRC32 checksum of the whole string, but if this key contains a pattern in the form of the characters {...} I only hash this substring. So for example for the key "foo{bared}" the key hashing code will simply compute the CRC32 of "bared". This way, using key tags you can ensure that related keys will be stored on the same Redis instance just by using the same key tag for all these keys. Redis-rb already implements key tags.
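
Key tags can be sketched as a small step in front of the hashing code. This is an illustration under assumptions, not the Redis-rb implementation: the function names are hypothetical, only the first {...} pair is considered, and an empty tag "{}" simply hashes the empty string.

```python
import zlib


def hashed_part(key: str) -> str:
    """Return the key tag between the first '{' and the next '}', if any.

    Keys without a complete {...} pattern are hashed as a whole.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1:
            return key[start + 1:end]
    return key


def server_for_key(key: str, num_servers: int) -> int:
    """Pick a server index from the CRC32 of the key tag (or the whole key)."""
    crc = zlib.crc32(hashed_part(key).encode("utf-8")) & 0xFFFFFFFF
    return crc % num_servers


if __name__ == "__main__":
    # Both keys share the tag "bared", so they map to the same server.
    print(server_for_key("foo{bared}", 4), server_for_key("weight{bared}", 4))
```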

    What is the maximum number of keys a single Redis instance can hold?

    The latest versions of Redis in the Git repository are able to handle at least 150 million keys per instance. We are working on experiments with larger values.

    What Redis means actually?

    Redis means two things:
    • it's a joke on the word Redistribute (instead of using just a relational DB, redistribute your workload among Redis servers)
    • it means REmote DIctionary Server

    Why did you started the Redis project?

    In order to scale LLOOGG. But after I got the basic server working I liked the idea to share the work with other guys, and Redis was turned into an open source project. -
diff --git a/doc/README.html b/doc/README.html
index ed6d56ac5b807199fcb93cf304102bab0ddd4eef..090196deacafd544d802f1eeafd98083687ab6e6 100644
--- a/doc/README.html
+++ b/doc/README.html
@@ -28,23 +28,23 @@

Introduction

Redis is a database. To be more specific redis is a very simple database implementing a dictionary where keys are associated with values. For example -I can set the key "surname_1992" to the string "Smith".

Redis takes the whole dataset in memory, but the dataset is persistent -since from time to time Redis writes a dump of the dataset on disk asynchronously. The dump is loaded every time the server is restarted. This means that if a system crash occurs the last few queries can get lost (that is acceptable in many applications). Redis supports master-slave replication from the early days in order to improve performances and reliability.

Beyond key-value databases

In most key-value databases keys and values are simple strings. In Redis keys are just strings too, but the associated values can be Strings, Lists and Sets, and there are commands to perform complex atomic operations against this data types, so you can think at Redis as a data structures server.

For example you can append elements to a list stored at the key "mylist" using the LPUSH or RPUSH operation in O(1). Later you'll be able to get a range of elements with LRANGE or trim the list with LTRIM. Sets are very flexible too, it is possible to add and remove elements from Sets (unsorted collections of strings), and then ask for server-side intersection, union, difference of Sets.

All this features, the support for sorting Lists and Sets, allow to use Redis as the sole DB for your scalable application without the need of any relational database. We wrote a simple Twitter clone in PHP + Redis to show a real world example, the link points to an article explaining the design and internals in very simple words.

What are the differences between Redis and Memcached?

In the following ways:

  • Memcached is not persistent, it just holds everything in memory without saving since its main goal is to be used as a cache. Redis instead can be used as the main DB for the application. We wrote a simple Twitter clone using only Redis as database.
+I can set the key "surname_1992" to the string "Smith". The interesting thing about Redis is that values associated with keys are not limited to simple strings: they can also be lists and sets, with a number of server-side atomic operations associated with these data types.

Redis takes the whole dataset in memory, but the dataset is persistent +since from time to time Redis writes a dump of the dataset on disk asynchronously. The dump is loaded every time the server is restarted.

Redis can be configured to save the dataset after a given number of seconds have elapsed and a given number of changes have been made to the dataset. For example you can tell Redis to save after 1000 changes and at least 60 seconds since the last save. You can specify a number of these combinations.
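
The rule can be read as a list of (seconds, changes) pairs where a save triggers as soon as any pair is fully satisfied. The following is a hedged sketch of that reading, not the server's actual code; the function name and the tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def should_save(changes: int,
                seconds_since_last_save: float,
                save_points: Iterable[Tuple[int, int]]) -> bool:
    """Return True if any (min_seconds, min_changes) save point is reached."""
    return any(
        seconds_since_last_save >= min_seconds and changes >= min_changes
        for min_seconds, min_changes in save_points
    )


if __name__ == "__main__":
    # Hypothetical configuration: 1000 changes and 60 seconds, or 1 change and 900 seconds.
    points = [(60, 1000), (900, 1)]
    print(should_save(changes=1500, seconds_since_last_save=75, save_points=points))
```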

Because data is written asynchronously, if a system crash occurs the last few queries can get lost (that is acceptable in many applications). Redis has supported master-slave replication from the early days in order to make this a non-issue if your application is of the kind where even a few lost records are not acceptable.

Beyond key-value databases

In most key-value databases keys and values are simple strings. In Redis keys are just strings too, but the associated values can be Strings, Lists and Sets, and there are commands to perform complex atomic operations against this data types, so you can think at Redis as a data structures server.

For example you can append elements to a list stored at the key "mylist" using the LPUSH or RPUSH operation in O(1). Later you'll be able to get a range of elements with LRANGE or trim the list with LTRIM. Sets are very flexible too, it is possible to add and remove elements from Sets (unsorted collections of strings), and then ask for server-side intersection, union, difference of Sets.

All this features, the support for sorting Lists and Sets, allow to use Redis as the sole DB for your scalable application without the need of any relational database. We wrote a simple Twitter clone in PHP + Redis to show a real world example, the link points to an article explaining the design and internals in very simple words.

What are the differences between Redis and Memcached?

In the following ways:

  • Memcached is not persistent, it just holds everything in memory without saving since its main goal is to be used as a cache. Redis instead can be used as the main DB for the application. We wrote a simple Twitter clone using only Redis as database.
  • Like memcached Redis uses a key-value model, but while keys can just be strings, values in Redis can be lists and sets, and complex operations like intersections, set/get n-th element of lists, pop/push of elements, can be performed against sets and lists. It is possible to use lists as message queues.
-

What are the differences between Redis and Tokyo Cabinet / Tyrant?

Redis and Tokyo Cabinet can be used for the same applications, but actually they are very different beasts:

  • Tokyo Cabinet writes synchronously on disk, Redis takes the whole dataset on memory and writes on disk asynchronously. Tokyo Cabinet is safer, Redis faster (but note that Redis supports master-slave replication that is trivial to setup, so you are safe anyway if you want a setup where data can't be lost even after a disaster).
-
  • Redis supports higher level operations and data structures. While Tokyo Cabinet supports a kind of database that is able to organize data into rows with named fields (in a way very similar to Berkeley DB) can't do things like server side List and Set operations Redis is able to do: pushing or popping from Lists in an atomic way, in O(1) time complexity, server side Set intersections, SortCommand ing of schema free data in complex ways (Btw TC supports sorting in the table-based database format).
+

What are the differences between Redis and Tokyo Cabinet / Tyrant?

Redis and Tokyo Cabinet can be used for the same applications, but actually they are very different beasts. If you read Twitter messages of people involved in scalable things both products are reported to work well, but surely there are times where one or the other can be the best choice. Some of the differences are the following (I may be biased, so make sure to check both products yourself).

  • Tokyo Cabinet writes synchronously to disk, while Redis keeps the whole dataset in memory and writes to disk asynchronously. Tokyo Cabinet is safer and probably a better idea if your dataset is going to be bigger than RAM, but Redis is faster (note that Redis supports master-slave replication that is trivial to set up, so you are safe anyway if you want a setup where data can't be lost even after a disaster).
+
  • Redis supports higher level operations and data structures. Tokyo Cabinet supports a kind of database that is able to organize data into rows with named fields (in a way very similar to Berkeley DB) but can't do the server side List and Set operations Redis is able to do: pushing or popping from Lists in an atomic way, in O(1) time complexity, server side Set intersections, sorting of schema free data in complex ways (Btw TC supports sorting in the table-based database format). Redis on the other hand does not support the abstraction of tables with fields: the idea is that you can build this in software easily if you really need a table-like approach.
  • Tokyo Cabinet does not implement a networking layer. You have to use a networking layer called Tokyo Tyrant that interfaces to Tokyo Cabinet so you can talk to Tokyo Cabinet in a client-server fashion. In Redis the networking support is built-in inside the server, and is basically the only interface between the external world and the dataset.
-
  • Redis is reported to be much faster, especially if you plan to access Tokyo Cabinet via Tokyo Tyrant. From the informal numbers I saw around on the net you can expect Redis to be 10 times faster than Tokyo Cabinet + Tyrant.
-
  • Redis is not an on-disk DB engine like Tokyo: the latter can be used as a fast DB engine in your C project without the networking overhead just linking to the library. Still remember that in many scalable applications you need multiple servers talking with multiple clients, so the client-server model is almost always needed.
+
  • Redis is reported to be much faster, especially if you plan to access Tokyo Cabinet via Tokyo Tyrant. Here I can only say that with Redis you can expect 100,000 operations per second with a normal Linux box and 50 concurrent clients. You should test Redis, Tokyo, and the other alternatives with your specific workload to get a feel for the performance in your application.
+
  • Redis is not an on-disk DB engine like Tokyo: the latter can be used as a fast DB engine in your C project without the networking overhead, just by linking to the library. Still, in many scalable applications you need multiple servers talking with multiple clients, so the client-server model is almost always needed. This is why it is built into Redis.

Does Redis support locking?

No, the idea is to provide atomic primitives so that the programmer can use Redis with lock-free algorithms. For example imagine you have
-10 computers and 1 redis server. You want to count words in a very large text.
+10 computers and one Redis server. You want to count words in a very large text.
This large text is split among the 10 computers, every computer will process its part and use Redis's INCR command to atomically increment a counter for every occurrence of the word found.
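The word-count scenario can be sketched without a running server. The `CounterServer` class below is a hypothetical in-process stand-in that models only INCR. Its internal lock plays the role of the server executing each command atomically, the ten threads play the role of the ten computers, and the `word:` key prefix is an assumption made for the example.

```python
import threading
from collections import defaultdict

class CounterServer:
    """Hypothetical in-process stand-in for a Redis server (INCR only).

    The internal lock models the server executing each command
    atomically. Callers never take a lock themselves."""

    def __init__(self):
        self._data = defaultdict(int)
        self._guard = threading.Lock()

    def incr(self, key):
        # Increment the counter stored at key and return the new value.
        with self._guard:
            self._data[key] += 1
            return self._data[key]

    def get(self, key):
        # Return the counter stored at key, or 0 if it was never set.
        with self._guard:
            return self._data.get(key, 0)

def count_words(server, chunk):
    # Each worker processes its own part of the text and issues one
    # INCR per word occurrence. No coordination between workers is needed.
    for word in chunk.split():
        server.incr("word:" + word)

def run(chunks):
    # One thread per chunk stands in for one computer per part of the text.
    server = CounterServer()
    workers = [threading.Thread(target=count_words, args=(server, c))
               for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return server

server = run(["to be or not to be"] * 10)
```

Because every increment is a single atomic command, the final counters do not depend on how the workers interleave.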

INCR/DECR are not the only atomic primitives, there are others like PUSH/POP on lists, POP RANDOM KEY operations, UPDATE and so on. For example you can use Redis like a Tuple Space (http://en.wikipedia.org/wiki/Tuple_space) in
-order to implement distributed algorithms.

(News: locking with key-granularity is now planned)

Multiple databases support

Another synchronization primitive is the support for multiple DBs. By default DB 0 is selected for every new connection, but using the SELECT command it is possible to select a different database. The MOVE operation can move an item from one DB to another atomically. This can be used as a base for locking free algorithms together with the 'RANDOMKEY' or 'POPRANDOMKEY' commands.

Redis Data Types

Redis supports the following three data types as values:

  • Strings: just any sequence of bytes. Redis strings are binary safe so they can not just hold text, but images, compressed data and everything else.
  • Lists: lists of strings, with support for operations like append a new string on head, on tail, list length, obtain a range of elements, truncate the list to a given length, sort the list, and so on.
  • Sets: an unsorted set of strings. It is possible to add or delete elements from a set, to perform set intersection, union, subtraction, and so on.
+order to implement distributed algorithms.

(News: locking with key-granularity is now planned)

Multiple databases support

Another synchronization primitive is the support for multiple DBs. By default DB 0 is selected for every new connection, but using the SELECT command it is possible to select a different database. The MOVE operation can move an item from one DB to another atomically. This can be used as a base for lock-free algorithms together with the 'RANDOMKEY' command.
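One way to read the paragraph above is that an atomic MOVE lets several workers compete for the same item, with only one of them winning. The sketch below models that idea in-process. `MultiDB` is a hypothetical stand-in, not a Redis client, and its `move` method takes the source DB as an explicit argument where a real client would first choose it with SELECT. The sketch assumes MOVE reports 1 when the key was moved and 0 when the key is missing from the source DB or already present in the target DB.

```python
import threading

class MultiDB:
    """Hypothetical in-process model of a server with numbered DBs.

    The internal lock models the server running each command atomically."""

    def __init__(self, num_dbs=16):
        self._dbs = [dict() for _ in range(num_dbs)]
        self._guard = threading.Lock()

    def set(self, db, key, value):
        with self._guard:
            self._dbs[db][key] = value

    def exists(self, db, key):
        with self._guard:
            return key in self._dbs[db]

    def move(self, key, src, dst):
        # Return 1 if the key was moved, 0 if it is missing from src
        # or already present in dst (assumed MOVE semantics).
        with self._guard:
            if key not in self._dbs[src] or key in self._dbs[dst]:
                return 0
            self._dbs[dst][key] = self._dbs[src].pop(key)
            return 1

def try_claim(server, key, results, worker_id):
    # A worker claims a job by moving it from DB 0 (pending) to DB 1
    # (claimed). At most one worker can see the move succeed.
    results[worker_id] = server.move(key, 0, 1)

server = MultiDB()
server.set(0, "job:1", "payload")

results = [0] * 8
workers = [threading.Thread(target=try_claim, args=(server, "job:1", results, i))
           for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Exactly one entry of `results` ends up as 1, so the workers agree on an owner for the job without any explicit lock on the client side.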

Redis Data Types

Redis supports the following three data types as values:

  • Strings: just any sequence of bytes. Redis strings are binary safe, so they can hold not just text but also images, compressed data and everything else.
  • Lists: lists of strings, with support for operations like append a new string on head, on tail, list length, obtain a range of elements, truncate the list to a given length, sort the list, and so on.
  • Sets: an unsorted set of strings. It is possible to add or delete elements from a set, to perform set intersection, union, subtraction, and so on.
Values can be Strings, Lists or Sets. Keys can be a subset of strings not containing newlines ("\n") and spaces (" ").

Note that sometimes strings may hold numeric values that must be parsed by Redis. An example is the INCR command that atomically increments the number stored at the specified key. In this case Redis is able to handle integers