After the TDengine server is running, execute `taosBenchmark` (previously named `taosdemo`):
```bash
taosBenchmark
```
This command will create a super table "meters" under database "test". Under "meters", 10000 tables are created with names from "d0" to "d9999". Each table has 10000 rows and each row has four columns (ts, current, voltage, phase). Timestamps range from "2017-07-14 10:40:00 000" to "2017-07-14 10:40:09 999". Each table has tags "location" and "groupId". groupId is set randomly from 1 to 10, and location is set to "California.SanFrancisco" or "California.SanDiego".
This command inserts 100 million rows into the database quickly. The time taken depends on the hardware configuration; it takes only a dozen seconds on a regular PC server.
In some use cases, built-in functions are not adequate for the query capability required by application programs. With UDF, the functions developed by users can be utilized by the query framework to meet business and application requirements. A UDF normally takes one column of data as input, but can also support the result of a sub-query as input.
From version 2.2.0.0, UDFs written in C/C++ are supported by TDengine.
## Types of UDF
Two kinds of functions can be implemented by UDF: scalar functions and aggregate functions.
Scalar functions return a result for each row of input data, while aggregate functions return either 0 or 1 row.
In the case of a scalar function, you only have to implement the "normal" function template.
In the case of an aggregate function, in addition to the "normal" function, you also need to implement the "merge" and "finalize" function templates, even if the implementation is empty. This will become clear in the sections below.
### Scalar Function
As mentioned earlier, a scalar UDF only has to implement the "normal" function template. The function template below can be used to define your own scalar function.
`void udfNormalFunc(char* data, short itype, short ibytes, int numOfRows, long long* ts, char* dataOutput, char* interBuf, char* tsOutput, int* numOfOutput, short otype, short obytes, SUdfInit* buf)`
`udfNormalFunc` is the placeholder for a function name. A function implemented based on the above template can be used to perform scalar computation on data rows. The parameters are fixed to control the data exchange between a UDF and TDengine.
Definitions of the parameters:
- numOfRows: the number of rows in the input data
- ts: the column of timestamps corresponding to the input data
- dataOutput: the buffer for output data, total size is `oBytes * numberOfRows`
- interBuf: the buffer for an intermediate result. Its size is specified by the `BUFSIZE` parameter when creating a UDF. It's normally used when the intermediate result is not the same as the final result. This buffer is allocated and freed by TDengine.
- tsOutput: the column of timestamps corresponding to the output data; it can be used to output timestamps together with the output data if it's not NULL
- numOfOutput: the number of rows in the output data
- buf: for the state exchange between UDF and TDengine
[add_one.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c) is an example of a very simple UDF implementation, i.e. one instance of the above `udfNormalFunc` template. It adds one to each value of a passed-in column, which can be filtered using the `where` clause, and outputs the result.
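For illustration, the sketch below follows the logic of add_one.c in a simplified form. It assumes INT input and output and skips the NULL handling a production UDF needs; the `SUdfInit` definition is only a placeholder for the type supplied by TDengine's UDF framework.

```c
#include <stdint.h>

/* placeholder: the real SUdfInit is supplied by TDengine's UDF framework */
typedef struct SUdfInit {
  int32_t maybe_null;
} SUdfInit;

/* "normal" function: add one to every INT value of the input column */
void add_one(char* data, short itype, short ibytes, int numOfRows,
             long long* ts, char* dataOutput, char* interBuf,
             char* tsOutput, int* numOfOutput, short otype, short obytes,
             SUdfInit* buf) {
  int32_t* in  = (int32_t*)data;
  int32_t* out = (int32_t*)dataOutput;
  for (int i = 0; i < numOfRows; ++i) {
    out[i] = in[i] + 1;       /* one output value per input row */
  }
  *numOfOutput = numOfRows;   /* scalar UDF: rows out == rows in */
}
```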
### Aggregate Function
For an aggregate UDF, as mentioned earlier, you must implement the "normal" function template (described above) and also implement the "merge" and "finalize" templates.

The function template below can be used to define your own merge function for an aggregate UDF.

`void udfMergeFunc(char* data, int32_t numOfRows, char* dataOutput, int32_t* numOfOutput, SUdfInit* buf)`

`udfMergeFunc` is the placeholder for a function name. The function implemented with the above template is used to aggregate intermediate results and can only be used in an aggregate query for STable.
Definitions of the parameters:
- numOfOutput: number of rows in the output data
- buf: for the state exchange between UDF and TDengine
### Finalize
The function template below can be used to finalize the result of your own UDF. It is normally used when interBuf is used to hold the intermediate result.

`void udfFinalizeFunc(char* dataOutput, char* interBuf, int* numOfOutput, SUdfInit* buf)`
`udfFinalizeFunc` is the placeholder for a function name. Definitions of the parameters are as below:
- numOfOutput: the number of output rows, can only be 0 or 1 for an aggregate function
- buf: for state exchange between UDF and TDengine
### Example abs_max.c
[abs_max.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c) is an example of a user-defined aggregate function that gets the maximum of the absolute values of a column.
The internal processing happens as follows. The results of the select statement are divided into multiple row blocks and `udfNormalFunc`, i.e. `abs_max` in this case, is performed on each row block to generate the intermediate results for each sub table. Then `udfMergeFunc`, i.e. `abs_max_merge` in this case, is performed on the intermediate result of sub tables to aggregate and generate the final or intermediate result of STable. The intermediate result of STable is finally processed by `udfFinalizeFunc`, i.e. `abs_max_finalize` in this example, to generate the final result, which contains either 0 or 1 row.
Other typical aggregate functions, such as covariance, can also be implemented using an aggregate UDF. A simplified sketch of the merge and finalize stages is shown below.
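The pair below mirrors the structure of abs_max.c in a heavily simplified form. It assumes BIGINT (int64) intermediate and final values and non-negative intermediate maxima, and omits the NULL and empty-input handling a real implementation needs; `SUdfInit` is again only a placeholder.

```c
#include <stdint.h>

typedef struct SUdfInit { int32_t maybe_null; } SUdfInit;  /* placeholder */

/* merge: fold the per-block intermediate maxima (one int64 each,
   stored back to back in `data`) into a single running maximum */
void abs_max_merge(char* data, int32_t numOfRows, char* dataOutput,
                   int32_t* numOfOutput, SUdfInit* buf) {
  int64_t* in  = (int64_t*)data;
  int64_t* out = (int64_t*)dataOutput;
  int64_t  max = 0;                 /* absolute values are non-negative */
  for (int32_t i = 0; i < numOfRows; ++i) {
    if (in[i] > max) max = in[i];
  }
  *out = max;
  *numOfOutput = 1;                 /* a single intermediate row */
}

/* finalize: publish the value accumulated in interBuf as the result */
void abs_max_finalize(char* dataOutput, char* interBuf, int* numOfOutput,
                      SUdfInit* buf) {
  *(int64_t*)dataOutput = *(int64_t*)interBuf;
  *numOfOutput = 1;                 /* aggregate returns 0 or 1 row */
}
```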
## UDF Naming Conventions
The naming convention for the 3 kinds of function templates required by UDF is as follows:
- udfNormalFunc, udfMergeFunc, and udfFinalizeFunc are required to have the same prefix, i.e. the actual name of udfNormalFunc. The udfNormalFunc doesn't need a suffix following the function name.
- udfMergeFunc should be udfNormalFunc followed by `_merge`.
- udfFinalizeFunc should be udfNormalFunc followed by `_finalize`.
The naming convention is part of TDengine's UDF framework. TDengine follows this convention to invoke the corresponding actual functions.
Depending on whether you are creating a scalar UDF or aggregate UDF, the functions that you need to implement are different.
- Scalar function: udfNormalFunc is required.
- Aggregate function: udfNormalFunc, udfMergeFunc (if querying on an STable) and udfFinalizeFunc are required.
For clarity, assuming we want to implement a UDF named "foo":
- If the function is a scalar function, we only need to implement the "normal" function template and it should be named simply `foo`.
- If the function is an aggregate function, we need to implement `foo`, `foo_merge`, and `foo_finalize`. Note that for an aggregate UDF, even if one of the three functions is not strictly needed, it must still be implemented, albeit with an empty body.
## Compile UDF
The source code of a UDF in C can't be utilized by TDengine directly. A UDF can only be loaded into TDengine after being compiled into a dynamically linked library (DLL).
For example, the example UDF `add_one.c` mentioned earlier can be compiled into a DLL using the command below in a Linux shell.
```bash
gcc -g -O0 -fPIC -shared add_one.c -o add_one.so
```
The generated DLL file `add_one.so` can be used later when creating a UDF. It's recommended to use GCC not older than 7.5.
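After compiling, you can sanity-check that the shared object actually exports the expected symbol. This is plain binutils usage, assuming the `add_one.so` produced above:

```bash
nm -D add_one.so | grep add_one
```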
## Create and Use UDF
When a UDF is created in a TDengine instance, it is available across the databases in that instance.
### Create UDF
The SQL command to load a UDF DLL into TDengine must be executed on the host where the generated UDF DLL resides. This operation cannot be done through the REST interface or web console. Once created, any client of the current TDengine can use these UDF functions in their SQL commands. UDFs are stored in the management node of TDengine. UDFs loaded in TDengine remain available after TDengine is restarted.
When creating a UDF, its type, i.e. scalar function or aggregate function, must be specified. If the wrong type is specified, SQL statements using the function will fail with errors. The input type and output type don't need to be the same in a UDF, but the input data type and output data type must be consistent with the UDF definition.
- userDefinedFunctionName: the function name to be used in SQL statements, which must be consistent with the function name defined by `udfNormalFunc` and is also the name of the compiled DLL (.so file).
- path: the absolute path of the DLL file including the name of the shared object file (.so). The path must be quoted with single or double quotes.
- outputtype: the output data type, the value is the literal string of the supported TDengine data type.
- B: the size of the intermediate buffer, in bytes; it is an optional parameter and the range is [0,512].
For example, the SQL statement below can be used to create a UDF from `add_one.so`.
```sql
CREATE FUNCTION add_one AS "/home/taos/udf_example/add_one.so" OUTPUTTYPE INT;
```

An aggregate UDF is created in the same way, using `CREATE AGGREGATE FUNCTION` instead. Its parameters are as follows:
- userDefinedFunctionName: the function name to be used in SQL statements, which must be consistent with the function name defined by `udfNormalFunc` and is also the name of the compiled DLL (.so file).
- path: the absolute path of the DLL file including the name of the shared object file (.so). The path needs to be quoted by single or double quotes.
- OUTPUTTYPE: the output data type, the value is the literal string of the type.
- B: the size of the intermediate buffer, in bytes; it's an optional parameter and the range is [0,512].
For details about how to use the intermediate result, please refer to the example program [demo.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c).
For example, a SQL statement like the one below can be used to create a UDF from `demo.so`. (The BUFSIZE value shown is illustrative; it must match the intermediate buffer size that demo.c actually uses.)
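```sql
CREATE AGGREGATE FUNCTION demo AS "/home/taos/udf_example/demo.so" OUTPUTTYPE DOUBLE BUFSIZE 14;
```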
In the current version there are some restrictions for UDF:
1. Only Linux is supported when creating and invoking UDF, for both client side and server side.
2. UDF can't be mixed with built-in functions.
3. Only one UDF can be used in a SQL statement.
4. Only a single column is supported as input for UDF.
5. Once created successfully, a UDF is persisted in the MNode of TDengine.
6. UDF can't be created through the REST interface.
7. The function name used when creating a UDF in SQL must be consistent with the function name defined in the DLL, i.e. the name defined by `udfNormalFunc`.
8. The name of a UDF should not conflict with any of TDengine's built-in functions.
The previous section, [Deployment](/cluster/deploy), showed you how to deploy and start a cluster from scratch. Once a cluster is ready, the status of dnode(s) in the cluster can be shown at any time. Dnodes can be managed from the TDengine CLI. New dnode(s) can be added to scale out the cluster, an existing dnode can be removed, and you can even perform load balancing manually, if necessary.
:::note
All the commands introduced in this chapter must be run in the TDengine CLI - `taos`. Note that sometimes it is necessary to use root privilege.
:::
## Show DNODEs
The below command can be executed in TDengine CLI `taos` to list all dnodes in the cluster, including ID, end point (fqdn:port), status (ready, offline), number of vnodes, number of free vnodes and so on. We recommend executing this command after adding or removing a dnode.
```sql
SHOW DNODES;
```
## Show VGROUPs
To utilize system resources efficiently and provide scalability, data sharding is required. The data of each database is divided into multiple shards and stored in multiple vnodes. These vnodes may be located on different dnodes. One way of scaling out is to add more vnodes on dnodes. Each vnode can only be used for a single DB, but one DB can have multiple vnodes. The allocation of vnode is scheduled automatically by mnode based on system resources of the dnodes.
Launch TDengine CLI `taos` and execute the command below:
```
taos> show dnodes;
Query OK, 2 row(s) in set (0.001017s)
```
It can be seen that the status of the new dnode is "offline". Once the dnode is started and connects to the firstEp of the cluster, you can execute `show dnodes` again; both dnodes will then be shown in "ready" status.
If `drop dnode 2` is then executed, you can execute `show dnodes` once more and see that only the dnode with ID 1 remains in the cluster.
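For reference, the drop statement used in the example has the following shape (dnode ID 2, as in the example):

```sql
DROP DNODE 2;
```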
:::note
- Once a dnode is dropped, it can't rejoin the cluster. To rejoin, the dnode needs to be deployed again after cleaning up the data directory. Before dropping a dnode, the data belonging to the dnode MUST be migrated/backed up according to your data retention, data security or other SOPs.
- Please note that `drop dnode` is different from stopping the `taosd` process. `drop dnode` just removes the dnode from the TDengine cluster. Only after a dnode is dropped can the corresponding `taosd` process be stopped.
- Once a dnode is dropped, other dnodes in the cluster will be notified of the drop and will not accept requests from the dropped dnode.
- dnodeID is allocated automatically and can't be manually modified. dnodeID is generated in ascending order without duplication.
# High Availability and Load Balancing
High availability of vnode and mnode can be achieved through replicas in TDengine.
A TDengine cluster can have multiple databases. Each database has a number of vnodes associated with it. A different number of replicas can be configured for each DB. When creating a database, the parameter `replica` is used to specify the number of replicas. The default value for `replica` is 1. Naturally, a single replica cannot guarantee high availability since if one node is down, the data service is unavailable. Note that the number of dnodes in the cluster must NOT be lower than the number of replicas set for any DB, otherwise the `create table` operation will fail with error "more dnodes are needed". The SQL statement below is used to create a database named "demo" with 3 replicas.
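```sql
CREATE DATABASE demo replica 3;
```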
The data in a DB is divided into multiple shards and stored in multiple vgroups. The number of vnodes in each vgroup is determined by the number of replicas set for the DB. The vnodes in each vgroup store exactly the same data. For the purpose of high availability, the vnodes in a vgroup must be located in different dnodes on different hosts. As long as over half of the vnodes in a vgroup are in an online state, the vgroup is able to provide data access. Otherwise the vgroup can't provide data access for reading or inserting data.
There may be data for multiple DBs in a dnode. When a dnode is down, multiple DBs may be affected. While in theory, the cluster will provide data access for reading or inserting data if over half the vnodes in vgroups are online, because of the possibly complex mapping between vnodes and dnodes, it is difficult to guarantee that the cluster will work properly if over half of the dnodes are online.
## High Availability of Mnode
Each TDengine cluster is managed by `mnode`, which is a module of `taosd`. For the high availability of mnode, multiple mnodes can be configured using system parameter `numOfMNodes`. The valid range for `numOfMnodes` is [1,3]. To ensure data consistency between mnodes, data replication between mnodes is performed synchronously.
There may be multiple dnodes in a cluster, but only one mnode can be started in each dnode. Which one or ones of the dnodes will be designated as mnodes is automatically determined by TDengine according to the cluster configuration and system resources. The command `show mnodes` can be executed in TDengine `taos` to show the mnodes in the cluster.
```sql
SHOW MNODES;
```
The end point and role/status (master, slave, unsynced, or offline) of all mnodes can be shown by the above command. When the first dnode is started in a cluster, there must be one mnode in this dnode. Without at least one mnode, the cluster cannot work. If `numOfMNodes` is configured to 2, another mnode will be started when the second dnode is launched.
For the high availability of mnode, `numOfMnodes` needs to be configured to 2 or a higher value. Because the data consistency between mnodes must be guaranteed, the replica confirmation parameter `quorum` is set to 2 automatically if `numOfMNodes` is set to 2 or higher.
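In `taos.cfg` this is a single line; the value below is illustrative, and the spelling follows the parameter name used above:

```
numOfMnodes 2
```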
## Load Balancing
Load balancing will be triggered in 3 cases without manual intervention.
- When a new dnode joins the cluster, automatic load balancing may be triggered. Some data from other dnodes may be transferred to the new dnode automatically.
- When a dnode is removed from the cluster, the data from this dnode will be transferred to other dnodes automatically.
- When a dnode is too hot, i.e. too much data has been stored in it, automatic load balancing may be triggered to migrate some vnodes from this dnode to other dnodes.
:::tip
Automatic load balancing is controlled by the parameter `balance`: 0 means disabled and 1 means enabled. This is set in the file [taos.cfg](https://docs.tdengine.com/reference/config/#balance).
:::
When a dnode is offline, it can be detected by the TDengine cluster. There are two cases:
- The dnode comes online before the threshold configured in `offlineThreshold` is reached. The dnode is still in the cluster and data replication is started automatically. The dnode can work properly after the data sync is finished.
- If the dnode has been offline over the threshold configured in `offlineThreshold` in `taos.cfg`, the dnode will be removed from the cluster automatically. A system alert will be generated and automatic load balancing will be triggered if `balance` is set to 1. When the removed dnode is restarted and becomes online, it will not join the cluster automatically. The system administrator has to manually join the dnode to the cluster (see the configuration sketch below).
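A `taos.cfg` excerpt covering the two parameters mentioned above might look like this (values are illustrative; 864000 seconds is 10 days):

```
balance          1
offlineThreshold 864000
```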
:::note
If all the vnodes in a vgroup (or mnodes in the mnode group) are in offline or unsynced status, the master node can only be voted on after all the vnodes or mnodes in the group become online and can exchange status. Following this, the vgroup (or mnode group) is able to provide service.
:::
## Arbitrator
The "arbitrator" component is used to address the special case when the number of replicas is set to an even number like 2,4 etc. If half of the vnodes in a vgroup don't work, it is impossible to vote and select a master node. This situation also applies to mnodes if the number of mnodes is set to an even number like 2,4 etc.
To resolve this problem, a new arbitrator component named `tarbitrator`, an abbreviation of TDengine Arbitrator, was introduced. The `tarbitrator` simulates a vnode or mnode but it's only responsible for network communication and doesn't handle any actual data access. As long as more than half of the vnode or mnode, including Arbitrator, are available the vnode group or mnode group can provide data insertion or query services normally.
Normally, it's prudent to configure the replica number for each DB or system parameter `numOfMNodes` to be an odd number. However, if a user is very sensitive to storage space, a replica number of 2 plus arbitrator component can be used to achieve both lower cost of storage space and high availability.
The Arbitrator component is installed with the server package. For details about how to install, please refer to [Install](/operation/pkg-install). The `-p` parameter of `tarbitrator` can be used to specify the port on which it provides its service, as sketched below.
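For example (the port number is illustrative; without `-p`, `tarbitrator` listens on its default port):

```bash
tarbitrator -p 6042
```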
When using TDengine to store and query data, the most important part of the data is the timestamp. A timestamp must be specified when creating and inserting data rows. A timestamp must follow the rules below:
- The format must be `YYYY-MM-DD HH:mm:ss.MS`, the default time precision is millisecond (ms), for example `2017-08-12 18:25:58.128`
- The internal function `now` can be used to get the current timestamp on the client side
- The current timestamp of the client side is applied when `now` is used to insert data
- Epoch Time: a timestamp can also be a long integer number, which means the number of seconds, milliseconds or nanoseconds, depending on the time precision, since 1970-01-01 00:00:00.000 (UTC/GMT)
- Add/subtract operations can be carried out on timestamps. For example `now-2h` means 2 hours prior to the time at which the query is executed. The units of time in operations can be b(nanosecond), u(microsecond), a(millisecond), s(second), m(minute), h(hour), d(day), or w(week). So `select * from t1 where ts > now-2w and ts <= now-1w` means the data between two weeks ago and one week ago. The time unit can also be n (calendar month) or y (calendar year) when specifying the time window for down sampling operations.
Time precision in TDengine can be set by the `PRECISION` parameter when executing `CREATE DATABASE`. The default time precision is millisecond. In the statement below, the precision is set to nanoseconds.
```sql
CREATE DATABASE db_name PRECISION 'ns';
```

In TDengine, the data types below can be used when specifying a column or tag.
| # | Type | Bytes | Description |
| --- | --- | --- | --- |
| 7 | SMALLINT | 2 | Short integer, the value range is [-32767, 32767], while -32768 is treated as NULL |
| 8 | TINYINT | 1 | Single-byte integer, the value range is [-127, 127], while -128 is treated as NULL |
| 9 | BOOL | 1 | Bool, the value range is {true, false} |
| 10 | NCHAR | User Defined | Multi-byte string that can include multi-byte characters like Chinese characters. Each character of NCHAR type consumes 4 bytes of storage. The string value should be quoted with single quotes. A literal single quote inside the string must be preceded with a backslash, like `\’`. The length must be specified when defining a column or tag of NCHAR type, for example nchar(10) means it can store at most 10 characters of nchar type and will consume fixed storage of 40 bytes. An error will be reported if the string value exceeds the length defined. |
| 11 | JSON | | JSON type can only be used on tags. A tag of JSON type is excluded with any other tags of any other type |
:::tip
TDengine is case insensitive and treats any characters in a SQL command as lower case by default; case sensitive strings must be quoted with single quotes.
:::
:::note
Only ASCII visible characters are suggested to be used in a column or tag of BINARY type. Multi-byte characters must be stored in NCHAR type.
1. KEEP specifies the number of days for which the data in the database will be retained. The default value is 3650 days, i.e. 10 years. The data will be deleted automatically once its age exceeds this threshold.
2. UPDATE specifies whether the data can be updated and how the data can be updated.
   1. UPDATE set to 0 means update operation is not allowed. The update for data with an existing timestamp will be discarded silently and the original record in the database will be preserved as is.
   2. UPDATE set to 1 means the whole row will be updated. The columns for which no value is specified will be set to NULL.
   3. UPDATE set to 2 means updating a subset of columns for a row is allowed. The columns for which no value is specified will be kept unchanged.
3. The maximum length of a database name is 33 bytes.
4. The maximum length of a SQL statement is 65,480 bytes.
5. Below are the parameters that can be used when creating a database.
6. Please note that all of the parameters mentioned in this section are configured in the configuration file `taos.cfg` on the TDengine server. If not specified in the `create database` statement, the values from taos.cfg are used by default. To override default parameters, they must be specified in the `create database` statement.
:::
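As a quick illustration of the KEEP and UPDATE parameters described above, the hypothetical statement below creates a database that retains data for one year and allows updating a subset of columns:

```sql
CREATE DATABASE power KEEP 365 UPDATE 2;
```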
```sql
USE db_name;
```
:::note
This way is not applicable when using a REST connection. In a REST connection the database name must be specified before a table or STable name. For example, to query the STable "meters" in database "test", the query would be "SELECT count(*) FROM test.meters".
:::
```sql
DROP DATABASE [IF EXISTS] db_name;
```
:::note
All data in the database will be deleted too. This command must be used with extreme caution. Please follow your organization's data integrity, data backup, data security or any other applicable SOPs before using this command.
:::
## Change Database Configuration
Some examples are shown below to demonstrate how to change the configuration of a database. Please note that some configuration parameters can be changed after the database is created, but some cannot. For details of the configuration parameters of database please refer to [Configuration Parameters](/reference/config/).
```
ALTER DATABASE db_name COMP 2;
```

The COMP parameter specifies whether the data is compressed and how the data is compressed.
```
ALTER DATABASE db_name REPLICA 2;
```
The REPLICA parameter specifies the number of replicas of the database.
```
ALTER DATABASE db_name KEEP 365;
```

The KEEP parameter specifies the number of days for which the data in the database will be retained.
```
SHOW CREATE DATABASE db_name;
```
This command is useful when migrating the data from one TDengine cluster to another. This command can be used to get the CREATE statement, which can be used in another TDengine instance to create the exact same database.
1. The first column of a table MUST be of type TIMESTAMP. It is automatically set as the primary key.
2. The maximum length of the table name is 192 bytes.
3. The maximum length of each row is 16k bytes. Please note that the extra 2 bytes used by each BINARY/NCHAR column are also counted.
4. The name of the subtable can only consist of characters from the English alphabet, digits and underscore. Table names can't start with a digit. Table names are case insensitive.
5. The maximum length in bytes must be specified when using BINARY or NCHAR types.
6. Escape character "\`" can be used to avoid conflicts between table names and reserved keywords. The above rules will be bypassed when using an escape character on table names, but the upper limit for the name length is still valid. Table names specified using the escape character are case sensitive. Only ASCII visible characters can be used with the escape character.
   For example \`aBc\` and \`abc\` are different table names but `abc` and `aBc` are same table names because they are both converted to `abc` internally.
The tags for which no value is specified will be set to NULL.
```sql
CREATE TABLE [IF NOT EXISTS] tb_name1 USING stb_name TAGS (tag_value1, ...) [IF NOT EXISTS] tb_name2 USING stb_name TAGS (tag_value2, ...) ...;
```
This can be used to create a lot of tables in a single SQL statement, which makes table creation much faster. A hypothetical example follows below.
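Using the "meters" STable from earlier sections, a batch creation could look like this (table names and tag values are illustrative):

```sql
CREATE TABLE IF NOT EXISTS d1001 USING meters TAGS ("California.SanFrancisco", 2) IF NOT EXISTS d1002 USING meters TAGS ("California.SanDiego", 3);
```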
```sql
ALTER TABLE tb_name MODIFY COLUMN field_name data_type(length);
```
If the type of a column is variable length, like BINARY or NCHAR, this command can be used to change the length of the column.
:::note
If a table is created using a super table as template, the table definition can only be changed on the corresponding super table, and the change will be automatically applied to all the subtables created using this super table as template. For tables created in the normal way, the table definition can be changed directly on the table.

:::
This section explains the syntax of SQL to perform operations on databases, tables and STables, insert data, select data and use functions. We also provide some tips that can be used in TDengine SQL. If you have previous experience with SQL this section will be fairly easy to understand. If you do not have previous experience with SQL, you'll come to appreciate the simplicity and power of SQL.
TDengine SQL is the major interface for users to write data into or query from TDengine. For ease of use, the syntax is similar to that of standard SQL. However, please note that TDengine SQL is not standard SQL. For instance, TDengine doesn't provide a delete function for time series data and so corresponding statements are not provided in TDengine SQL.
TDengine SQL doesn't support abbreviation for keywords. For example "SELECT" cannot be abbreviated as "SEL".
Syntax Specifications used in this chapter:
- | means one of a few options, excluding | itself.
- … means the item prior to it can be repeated multiple times.
To better demonstrate the syntax, usage and rules of TAOS SQL, hereinafter it's assumed that there is a data set of data from electric meters. Each meter collects 3 data measurements: current, voltage, phase. The data model is shown below:
```sql
taos> DESCRIBE meters;
 groupid | INT | 4 | TAG |
```
The data set includes the data collected by 4 meters; based on the data model of TDengine, the corresponding table names are d1001, d1002, d1003 and d1004.
One difference from the native connector is that the REST interface is stateless.
## Installation
The REST interface does not rely on any TDengine native library, so the client application does not need to install any TDengine libraries. The client application's development language only needs to support the HTTP protocol, as the sketch below illustrates.
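For example, any HTTP client can issue a query. The sketch below assumes a default installation on localhost, with the REST service listening on port 6041 and the default root/taosdata credentials:

```bash
curl -u root:taosdata -d "select server_version()" http://localhost:6041/rest/sql
```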
Earlier TDengine client software includes the Python connector.
:::
#### To install `taospy`
<Tabs>
<TabItem label="Install from PyPI" value="pypi">
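The connector is published on PyPI under the name `taospy`, so a plain `pip` install is enough (shown here for `pip3`; a virtual environment works the same way):

```bash
pip3 install taospy
```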
...
### About nanoseconds
Due to the current imperfection of Python's nanosecond support (see the link below), the current implementation returns integers at nanosecond precision instead of the `datetime` type produced for `ms` and `us` precision, and application developers will need to handle these values on their own. Using pandas' `to_datetime()` is recommended. The Python connector may modify the interface in the future if Python officially supports nanoseconds in full.
...
### Install taosAdapter
taosAdapter has been part of the TDengine server software since TDengine v2.4.0.0. If you use the TDengine server, you don't need any additional steps to install taosAdapter. You can download the TDengine server installation package (which includes taosAdapter in v2.4.0.0 and later versions) from the [TDengine official website](https://tdengine.com/all-downloads/). If you need to deploy taosAdapter separately, on a server other than the TDengine server, you should install the full TDengine package on that server to install taosAdapter. If you need to build taosAdapter from source code, you can refer to the [Building taosAdapter](https://github.com/taosdata/taosadapter/blob/develop/BUILD.md) documentation.
### Start/Stop taosAdapter
On Linux systems, the taosAdapter service is managed by `systemd` by default. You can use the command `systemctl start taosadapter` to start the taosAdapter service and use the command `systemctl stop taosadapter` to stop the taosAdapter service.
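For example (on a systemd-managed Linux installation; `systemctl status` is included here only as a convenient way to verify the result):

```bash
systemctl start taosadapter    # start the service
systemctl status taosadapter   # verify that it is running
systemctl stop taosadapter     # stop the service
```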
...
## Feature List
- Compatible with RESTful interfaces ([REST API](/reference/rest-api/))
- Compatible with OpenTSDB JSON and telnet format writes
...
### InfluxDB
You can use any client that supports the http protocol to access the RESTful interface address `http://<fqdn>:6041/<APIEndPoint>` to write data in InfluxDB compatible format to TDengine. The EndPoint is as follows:
```text
/influxdb/v1/write
```
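A minimal write might look like the sketch below. It assumes Basic authorization with the default credentials and a `db` query parameter naming the target database (`test` here is a placeholder); the payload is one row in InfluxDB line protocol with a nanosecond timestamp:

```bash
curl -u root:taosdata \
  "http://localhost:6041/influxdb/v1/write?db=test" \
  --data-binary "meters,location=California.LosAngeles,groupid=2 current=11.3,voltage=221,phase=0.29 1648432611249000000"
```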
...
### OpenTSDB
You can use any client that supports the http protocol to access the RESTful interface address `http://<fqdn>:6041/<APIEndPoint>` to write data in OpenTSDB compatible format to TDengine.
...
If the specified location already has data files, taosdump will prompt the user and exit immediately to avoid overwriting the data, which means the same path can only be used for one backup.
Please be careful if you see a prompt for this.
Users should not use taosdump to back up raw data, environment settings, hardware information, server configuration, or cluster topology. taosdump uses [Apache AVRO](https://avro.apache.org/) as the data file format to store backup data.
## Installation
There are two ways to install taosdump:
- Install the official taosTools installer. Please find taosTools on the [All download links](https://www.tdengine.com/all-downloads) page, then download and install it.
- Compile taos-tools separately and install it. Please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository for details.
...
### taosdump backup data
1. Backing up all databases: specify the `-A` or `--all-databases` parameter.
2. Backing up multiple specified databases: use the `-D db1,db2,...` parameter.
3. Backing up some super or normal tables in the specified database: use the `-dbname stbname1 stbname2 tbname1 tbname2 ...` parameters. Note that the first parameter of this input sequence is the database name, and only one database is supported. The second and subsequent parameters are the names of super or normal tables in that database, separated by spaces.
4. Backing up the system log database: TDengine clusters usually contain a system database named `log`. The data in this database is generated by TDengine's own operation, and taosdump will not back up the log database by default. If users need to back up the log database, they can use the `-a` or `--allow-sys` command-line parameter.
5. Loose mode backup: taosdump version 1.4.1 onwards provides the `-n` and `-L` parameters for backing up data without escape characters, i.e. in "loose" mode. This can reduce backup time and the backup data footprint if table names, column names, and tag names do not use escape characters. If you are unsure whether the `-n` and `-L` conditions apply, please use the default parameters for a "strict" mode backup. See the [official documentation](/taos-sql/escape) for a description of escape characters. A sketch of typical invocations follows this list.
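For illustration, the first two scenarios might look like the following sketch (`db1`, `db2`, and the output directory are placeholders; `-o` is taosdump's output-path parameter):

```bash
# Back up all databases into ./dump
taosdump -A -o ./dump
# Back up only db1 and db2
taosdump -D db1,db2 -o ./dump
```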
:::tip
- taosdump versions after 1.4.1 provide the `-I` argument for parsing Avro file schema and data. If users specify the `-s` argument, taosdump will parse the schema only.
- Backups after taosdump 1.4.2 use the batch count specified by the `-B` parameter. The default value is 16384. If, in some environments, low network speed or disk performance causes "Error actual dump ... batch ...", then try changing the `-B` parameter to a smaller value.
:::
...
Restore the data files in the specified path: use the `-i` parameter plus the path to the data files. You should not use the same directory to back up different data sets, and you should not back up the same data set multiple times in the same path. Otherwise, the backup data will be overwritten or backed up multiple times.
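For example, restoring the backup produced above might look like this sketch (`-i` names the input path; the host is a placeholder for your environment):

```bash
# Restore the backup in ./dump to the TDengine instance on localhost
taosdump -i ./dump -h localhost
```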
:::tip
taosdump internally uses the TDengine stmt binding API for writing restored data, with a default batch size of 16384 for better data recovery performance. If there are many columns in the backup data, it may cause a "WAL size exceeds limit" error. You can try to adjust the batch size to a smaller value by using the `-B` parameter.
:::
---
description: Instructions and tips for using the TDengine CLI
---
The TDengine command-line application (hereafter referred to as `TDengine CLI`) is the simplest way for users to manipulate and interact with TDengine instances.
## Installation
If executed on the TDengine server-side, there is no need for additional installation steps to install TDengine CLI as it is already included and installed automatically. To run TDengine CLI in an environment where no TDengine server is running, the TDengine client installation package needs to be installed first. For details, please refer to [connector](/reference/connector/).
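Once the client package is installed, connecting to a remote instance is a matter of pointing the CLI at its FQDN; a sketch (the host name is a placeholder, and 6030 is the default server port):

```bash
taos -h tdengine.example.com -P 6030
```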
- The `VERSION` environment variable is used to set the tdengine image tag
- `TAOS_FIRST_EP` must be set on the newly created instance so that it can join the TDengine cluster; if there is a high availability requirement, `TAOS_SECOND_EP` needs to be used at the same time
- `TAOS_REPLICA` is used to set the default number of database replicas. Its value range is [1,3]
We recommend setting it together with `TAOS_ARBITRATOR` to use an arbitrator in a two-node environment.
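Putting these together, a container launch might look like the following sketch (the image tag, container name, and endpoints are hypothetical placeholders; 6042 is the default arbitrator port):

```bash
VERSION=2.4.0.0
docker run -d --name tdengine-node2 \
  -e TAOS_FIRST_EP="tdengine-node1:6030" \
  -e TAOS_REPLICA=2 \
  -e TAOS_ARBITRATOR="tdengine-arb:6042" \
  tdengine/tdengine:${VERSION}
```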
| Meaning | The FQDN of the host where `taosd` will be started. It can be an IP address |
| Default Value | The first hostname configured for the host |
| Note | It should be within 96 bytes |
### serverPort
...
| Note | The REST service is provided by `taosd` before 2.4.0.0 and by `taosAdapter` since 2.4.0.0; the default port of the REST service is 6041 |
:::note
TDengine uses 13 consecutive ports, both TCP and UDP, starting from the port specified by `serverPort`. These ports need to be kept open if a firewall is enabled. The table below describes the ports used by TDengine in detail.
:::
...
| Meaning | The maximum number of distinct rows returned |
| Value Range | [100,000 - 100,000,000] |
| Default Value | 100,000 |
| Note | After version 2.3.0.0 |
## Locale Parameters
...
| Default Value | Locale configured in host |
:::info
A specific type "nchar" is provided in TDengine to store non-ASCII characters such as Chinese, Japanese, and Korean. The characters to be stored in nchar type are first encoded in UCS4-LE before being sent to the server side. To store non-ASCII characters correctly, the encoding format of the client side needs to be set properly.
The characters input on the client side are encoded using the default system encoding, which is UTF-8 on Linux, GB18030 or GBK on some Chinese-language systems, POSIX in Docker, and CP936 on Chinese-language Windows. The encoding of the operating system in use must be set correctly so that the characters in nchar type can be converted to UCS4-LE.
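For instance, on a Linux client the locale can be set explicitly in `taos.cfg` (a sketch; the default configuration path `/etc/taos/taos.cfg` and the UTF-8 locale are assumptions to adapt to your system):

```bash
# Append client-side locale and charset settings to taos.cfg
echo "locale  en_US.UTF-8" >> /etc/taos/taos.cfg
echo "charset UTF-8"       >> /etc/taos/taos.cfg
```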
...
:::note
The HTTP server was provided by `taosd` prior to version 2.4.0.0; since version 2.4.0.0 it is provided by `taosAdapter`.
The parameters described in this section are only applicable to versions prior to 2.4.0.0. If you are using version 2.4.0.0 or above, please refer to [taosAdapter](/reference/taosadapter/).
:::
In IoT applications, many data items are often collected for intelligent control, business analysis, device monitoring, and other purposes. Due to version upgrades of the application logic, or hardware adjustments of the devices themselves, the data collection items may change frequently. To facilitate data logging in such cases, TDengine provides, starting from version 2.2.0.0, a series of schemaless writing interfaces, which eliminate the need to create super tables and subtables in advance by automatically creating the storage structure corresponding to the data as it is written. When necessary, schemaless writing will also automatically add required columns to ensure that the data written by the user is stored correctly.
The super tables and their corresponding subtables created by schemaless writing are completely indistinguishable from those created directly via SQL, and you can write data to them directly with SQL statements. Note that the names of tables created by schemaless writing are generated from fixed mapping rules on tag values, so they are not explicitly meaningful and lack readability.
## Schemaless Writing Line Protocol
TDengine's schemaless writing line protocol supports InfluxDB's Line Protocol, OpenTSDB's telnet line protocol, and OpenTSDB's JSON format protocol. However, when using these three protocols, you need to specify in the API the standard of the parsing protocol to be used for the input content.
For the standard writing protocols of InfluxDB and OpenTSDB, please refer to the documentation of each protocol. The following is a description of TDengine's extended protocol, which is based on InfluxDB's line protocol. It allows users to control the (super table) schema at a more granular level.
With the following formatting conventions, schemaless writing uses a single string to express a data row (multiple rows can be passed into the writing API at once to enable bulk writing).
```json
measurement,tag_set field_set timestamp
```
where:
- measurement will be used as the data table name. It will be separated from tag_set by a comma.
- tag_set will be used as tag data in the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`, i.e. multiple tags' data can be separated by a comma. It is separated from field_set by a space.
- field_set will be used as normal column data in the format of `<field_key>=<field_value>,<field_key>=<field_value>`, again using a comma to separate multiple normal columns of data. It is separated from the timestamp by a space.
- The timestamp is the primary key corresponding to the data in this row.
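For example, the following line (a sketch reusing the meters schema from earlier; the `i32` suffix marking an INT field is part of TDengine's extended protocol) writes one row with two tags, three fields, and a nanosecond timestamp:

```json
meters,location=California.LosAngeles,groupid=2 current=13.4,voltage=223i32,phase=0.29 1648432611249000000
```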
All data in tag_set is automatically converted to the NCHAR data type and does not require double quotes (").
In the schemaless writing line protocol, each data item in the field_set needs to have its data type described:
- If there are English double quotes on both sides, it indicates the BINARY(32) type. For example, `"abc"`.
- If there are double quotes on both sides and an L prefix, it means NCHAR(32) type. For example, `L"error message"`.
- Spaces, equal signs (=), commas (,), and double quotes (") need to be escaped with a backslash (\\) in front. (All refer to the ASCII character)
- Numeric types are distinguished by their suffix.
...
Schemaless writes process row data according to the following principles:
1. You can use the following rules to generate the subtable names: first, combine the measurement name and the keys and values of the tags into a single string of the form `measurement,tag_key1=tag_value1,tag_key2=tag_value2` (see the sketch after this list).
Note that tag_key1, tag_key2 here are not in the original order in which the user entered the tags, but are the tag names sorted in ascending string order. Therefore, tag_key1 is not necessarily the first tag entered in the line protocol.
The string's MD5 hash value "md5_val" is calculated after the sorting is completed. The result is then combined with the prefix to generate the table name: "t_md5_val". "t_" is a fixed prefix that every table generated by this mapping relationship has.
2. If the super table obtained by parsing the line protocol does not exist, this super table is created.
3. If the subtable obtained by parsing the line protocol does not exist, schemaless writing creates the subtable according to the subtable name determined in steps 1 or 2.
4. If the specified tag or regular column in the data row does not exist, the corresponding tag or regular column is added to the super table (only incremental).
5. If there are some tag columns or regular columns in the super table that are not specified to take values in a data row, then the values of these columns are set to NULL.
6. For BINARY or NCHAR columns, if the length of the value provided in a data row exceeds the column type limit, the maximum length of characters allowed to be stored in the column is automatically increased (only incremented and not decremented) to ensure complete preservation of the data.
7. If the specified data subtable already exists, and the specified tag column takes a value different from the saved value this time, the value in the latest data row overwrites the old tag column value.
8. Errors encountered throughout the processing will interrupt the writing process and return an error code.
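As a sketch of rule 1 (the tag names and values are hypothetical, and the string layout follows the template above), the generated subtable name can be reproduced with standard tools:

```bash
# Tags sorted by name: groupid sorts before location; the subtable name is "t_" plus the MD5 hex digest
printf 'meters,groupid=2,location=California.LosAngeles' | md5sum | awk '{print "t_" $1}'
```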