To achieve the purpose of high performance data writing and querying, TDengine employs a lot of caching technologies in both server side and client side.
## Write Cache
The cache management policy in TDengine is First-In-First-Out (FIFO). FIFO is also known as insert driven cache management policy and it is different from read driven cache management, which is more commonly known as Least-Recently-Used (LRU). FIFO simply stores the latest data in cache and flushes the oldest data in cache to disk, when the cache usage reaches a threshold. In IoT use cases, it is the current state i.e. the latest or most recent data that is important. The cache policy in TDengine, like much of the design and architecture of TDengine, is based on the nature of IoT data.
Caching the latest data provides the capability of retrieving data in milliseconds. With this capability, TDengine can be configured properly to be used as a caching system without deploying another separate caching system. This simplifies the system architecture and minimizes operational costs. The cache is emptied after TDengine is restarted. TDengine does not reload data from disk into cache, like a key-value caching system.
The memory space used by each vnode as write cache is determined when creating a database. Parameter `vgroups` and `buffer` can be used to specify the number of vnode and the size of write cache for each vnode when creating the database. Then, the total size of write cache for this database is `vgroups * buffer`.
```sql
createdatabasedb0vgroups100buffer16MB
```
The above statement creates a database of 100 vnodes while each vnode has a write cache of 16MB.
Even though in theory it's always better to have a larger cache, the extra effect would be very minor once the size of cache grows beyond a threshold. So normally it's enough to use the default value of `buffer` parameter.
## Read Cache
The memory space used by the TDengine cache is fixed in size and configurable. It should be allocated based on application requirements and system resources. An independent memory pool is allocated for and managed by each vnode (virtual node) in TDengine. There is no sharing of memory pools between vnodes. All the tables belonging to a vnode share all the cache memory of the vnode.
When creating a database, it's also possible to specify whether to cache the latest data of each sub table, using parameter `cachelast`. There are 3 cases:
- 0: No cache for latest data
- 1: The last row of each table is cached, `last_row` function can benefit significantly from it
- 2: The latest non-NULL value of each column for each table is cached, `last` function can benefit very much when there is no `where`, `group by`, `order by` or `interval` clause
- 3: Bot hthe last row and the latest non-NULL value of each column for each table are cached, identical to the behavior of both 1 and 2 are set together
The memory pool is divided into blocks and data is stored in row format in memory and each block follows FIFO policy. The size of each block is determined by configuration parameter `cache` and the number of blocks for each vnode is determined by the parameter `blocks`. For each vnode, the total cache size is `cache * blocks`. A cache block needs to ensure that each table can store at least dozens of records, to be efficient.
`last_row` function can be used to retrieve the last row of a table or a STable to quickly show the current state of devices on monitoring screen. For example the below SQL statement retrieves the latest voltage of all meters in San Francisco, California.
## Meta Cache
To process data writing and querying efficiently, each vnode caches the metadata that's already retrieved. Parameters `pages` and `pagesize` are used to specify the size of metadata cache for each vnode.
The above statement will create a database db0 each of whose vnode is allocated a meta cache of `128 * 16 KB = 2 MB` .
## File System Cache
TDengine utilizes WAL to provide basic reliability. The essential of WAL is to append data in a disk file, so the file system cache also plays an important role in the writing performance. Parameter `wal` can be used to specify the policy of writing WAL, there are 2 cases:
- 1: Write data to WAL without calling fsync, the data is actually written to the file system cache without flushing immediately, in this way you can get better write performance
- 2: Write data to WAL and invoke fsync, the data is immediately flushed to disk, in this way you can get higher reliability
## Client Cache
To improve the overall efficiency of processing data, besides the above caches, the core library `libtaos.so` (also referred to as `taosc`) which all client programs depend on also has its own cache. `taosc` caches the metadata of the databases, super tables, tables that the invoking client has accessed, plus other critical metadata such as the cluster topology.
When multiple client programs are accessing a TDengine cluster, if one of the clients modifies some metadata, the cache may become invalid in other clients. If this case happens, the client programs need to "reset query cache" to invalidate the whole cache so that `taosc` is enforced to repull the metadata it needs to rebuild the cache.
sql create table tb1 using st2 tags (1,1.0,"1",1.0,1,1,"1");
sql create table tb2 using st2 tags (2,2.0,"2",2.0,2,2,"2");
sql create table tb3 using st2 tags (3,3.0,"3",3.0,3,3,"3");
...
...
@@ -25,132 +19,102 @@ sql insert into tb1 values (now-100s,2,2.0,2.0,2,2,2,true,"2","2")
sql insert into tb1 values (now,3,3.0,3.0,3,3,3,true,"3","3")
sql insert into tb1 values (now+100s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb1 values (now+200s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb1 values (now+300s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb1 values (now+400s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb1 values (now+500s,4,4.0,4.0,4,4,4,true,"4","4")
sql select tbname,id1 from st2;
sql insert into tb2 values (now+300s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb3 values (now+400s,4,4.0,4.0,4,4,4,true,"4","4")
sql insert into tb4 values (now+500s,4,4.0,4.0,4,4,4,true,"4","4")
sql select distinct(tbname), id1 from st2;
if $rows != 4 then
return -1
endi
sql select * from st2;
if $rows != 8 then
return -1
endi
sql select * from st2 where ts between now-50s and now+450s
if $rows != 5 then
return -1
endi
sql select tbname,id1 from st2 where id1 between 2 and 3;
sql select tbname, id1 from st2 where id1 between 2 and 3;
if $rows != 2 then
return -1
endi
if $data00 != tb2 then
return -1
endi
if $data01 != 2 then
return -1
endi
if $data10 != tb3 then
return -1
endi
if $data11 != 3 then
return -1
endi
sql select tbname,id2 from st2 where id2 between 2.0 and 3.0;
if $rows != 2 then
sql select tbname, id2 from st2 where id2 between 0.0 and 3.0;
if $rows != 7 then
return -1
endi
if $data00 != tb2 then
if $data(tb2)[0] != tb2 then
return -1
endi
if $data01 != 2.00000 then
if $data(tb2)[1] != 2.00000 then
return -1
endi
if $data10 != tb3 then
if $data(tb3)[0] != tb3 then
return -1
endi
if $data11 != 3.00000 then
if $data(tb3)[1] != 3.00000 then
return -1
endi
sql select tbname,id4 from st2 where id4 between 2.0 and 3.0;
sql select tbname, id4 from st2 where id2 between 2.0 and 3.0;
if $rows != 2 then
return -1
endi
if $data00 != tb2 then
if $data(tb2)[0] != tb2 then
return -1
endi
if $data01 != 2.000000000 then
if $data(tb2)[1] != 2.000000000 then
return -1
endi
if $data10 != tb3 then
if $data(tb3)[0] != tb3 then
return -1
endi
if $data11 != 3.000000000 then
if $data(tb3)[1] != 3.000000000 then
return -1
endi
sql select tbname,id5 from st2 where id5 between 2.0 and 3.0;
sql select tbname, id5 from st2 where id5 between 2.0 and 3.0;
if $rows != 2 then
return -1
endi
if $data00 != tb2 then
if $data(tb2)[0] != tb2 then
return -1
endi
if $data01 != 2 then
if $data(tb2)[1] != 2 then
return -1
endi
if $data10 != tb3 then
if $data(tb3)[0] != tb3 then
return -1
endi
if $data11 != 3 then
if $data(tb3)[1] != 3 then
return -1
endi
sql select tbname,id6 from st2 where id6 between 2.0 and 3.0;
if $rows != 2 then
return -1
endi
if $data00 != tb2 then
if $data(tb2)[0] != tb2 then
return -1
endi
if $data01 != 2 then
if $data(tb2)[1] != 2 then
return -1
endi
if $data10 != tb3 then
if $data(tb3)[0] != tb3 then
return -1
endi
if $data11 != 3 then
if $data(tb3)[1] != 3 then
return -1
endi
sql select * from st2 where f1 between 2 and 3 and f2 between 2.0 and 3.0 and f3 between 2.0 and 3.0 and f4 between 2.0 and 3.0 and f5 between 2.0 and 3.0 and f6 between 2.0 and 3.0;
if $rows != 2 then
return -1
endi
if $data01 != 2 then
return -1
endi
...
...
@@ -158,8 +122,8 @@ if $data11 != 3 then
return -1
endi
sql_error select * from st2 where f7 between 2.0 and 3.0;
sql_error select * from st2 where f8 between 2.0 and 3.0;
sql_error select * from st2 where f9 between 2.0 and 3.0;
sql select * from st2 where f7 between 2.0 and 3.0;
sql select * from st2 where f8 between 2.0 and 3.0;
sql select * from st2 where f9 between 2.0 and 3.0;