TDengine supports multiple protocols of inserting data, including SQL, InfluxDB Line protocol, OpenTSDB Telnet protocol, and OpenTSDB JSON protocol. Data can be inserted row by row, or in batches. Data from one or more collection points can be inserted simultaneously. Data can be inserted with multiple threads, and out of order data and historical data can be inserted as well. InfluxDB Line protocol, OpenTSDB Telnet protocol and OpenTSDB JSON protocol are the 3 kinds of schemaless insert protocols supported by TDengine. It's not necessary to create STables and tables in advance if using schemaless protocols, and the schemas can be adjusted automatically based on the data being inserted.
description: "Lightweight service for data subscription and publishing. Time series data inserted into TDengine continuously can be pushed automatically to subscribing clients."
@@ -22,8 +22,8 @@ SELECT COUNT([*|field_name]) FROM tb_name [WHERE clause];
**More explanation**:
- Wildcard (\*) can be used to represent all columns, it's used to get the number of all rows
- The number of non-NULL values will be returned if this function is used on a specific column
- Wildcard (\*) is used to represent all columns. The `COUNT` function is used to get the total number of all rows.
- The number of non-NULL values will be returned if this function is used on a specific column.
**Examples**:
...
...
@@ -87,7 +87,7 @@ SELECT TWA(field_name) FROM tb_name WHERE clause;
**More explanations**:
-From version 2.1.3.0, function TWA can be used on stable with `GROUP BY`, i.e. timelines generated by `GROUP BY tbname` on a STable.
-Since version 2.1.3.0, function TWA can be used on stable with `GROUP BY`, i.e. timelines generated by `GROUP BY tbname` on a STable.
### IRATE
...
...
@@ -105,7 +105,7 @@ SELECT IRATE(field_name) FROM tb_name WHERE clause;
**More explanations**:
-From version 2.1.3.0, function IRATE can be used on stble with `GROUP BY`, i.e. timelines generated by `GROUP BY tbname` on a STable.
-Since version 2.1.3.0, function IRATE can be used on stble with `GROUP BY`, i.e. timelines generated by `GROUP BY tbname` on a STable.
### SUM
...
...
@@ -149,7 +149,7 @@ SELECT STDDEV(field_name) FROM tb_name [WHERE clause];
**Applicable column types**: Data types except for timestamp, binary, nchar and bool
**Applicable table types**: table, STable (starting from version 2.0.15.1)
**Applicable table types**: table, STable (since version 2.0.15.1)
**Examples**:
...
...
@@ -193,13 +193,13 @@ SELECT MODE(field_name) FROM tb_name [WHERE clause];
**Description**:The value which has the highest frequency of occurrence. NULL is returned if there are multiple values which have highest frequency of occurrence. It can't be used on timestamp column or tags.
**Return value type**:Same as the data type of the column being operated
**Return value type**:Same as the data type of the column being operated upon
**Applicable column types**:Data types except for timestamp
**More explanations**:Considering the number of returned result set is unpredictable, it's suggested to limit the number of unique values to 100,000, otherwise error will be returned.
**More explanations**: The benefit of using hyperloglog algorithm is that the memory usage is under control when the data volume is huge. However, when the data volume is very small, the result may be not accurate, it's recommented to use `select count(data) from (select unique(col) as data from table)` in this case.
**Description**: The greatest _k_ values of a specific column in a table or STable. If a value has multiple occurrences in the column but counting all of them in will exceed the upper limit _k_, then a part of them will be returned randomly.
**Return value type**: Same as the column being operated
**Return value type**: Same as the column being operated upon
**Applicable column types**: Data types except for timestamp, binary, nchar and bool
**Description**: The least _k_ values of a specific column in a table or STable. If a value has multiple occurrences in the column but counting all of them in will exceed the upper limit _k_, then a part of them will be returned randomly.
**Return value type**: Same as the column being operated
**Return value type**: Same as the column being operated upon
**Applicable column types**: Data types except for timestamp, binary, nchar and bool
-`INTERP` is used to get the value that matches the specified time slice from a column. If no such value exists an interpolation value will be returned based on `FILL` parameter.
- The input data of `INTERP` is the value of the specified column, `where` can be used to filter the original data. If no `where` condition is specified then all original data is the input.
- The input data of `INTERP` is the value of the specified column and a `where` clause can be used to filter the original data. If no `where` condition is specified then all original data is the input.
- The output time range of `INTERP` is specified by `RANGE(timestamp1,timestamp2)` parameter, with timestamp1<=timestamp2. timestamp1 is the starting point of the output time range and must be specified. timestamp2 is the ending point of the output time range and must be specified. If `RANGE` is not specified, then the timestamp of the first row that matches the filter condition is treated as timestamp1, the timestamp of the last row that matches the filter condition is treated as timestamp2.
- The number of rows in the result set of `INTERP` is determined by the parameter `EVERY`. Starting from timestamp1, one interpolation is performed for every time interval specified `EVERY` parameter. If `EVERY` parameter is not used, the time windows will be considered as no ending timestamp, i.e. there is only one time window from timestamp1.
- Interpolation is performed based on `FILL` parameter. No interpolation is performed if `FILL` is not used, that means either the original data that matches is returned or nothing is returned.
taos> SELECT INTERP(current) FROM t1 where ts >= '2017-07-14 17:00:00' and ts <= '2017-07-14 20:00:00' RANGE('2017-7-14 18:00:00','2017-7-14 19:00:00') EVERY(5s) FILL(LINEAR);
```
### INTERP [Prior to version 2.3.1]
### INTERP [Since version 2.0.15.0]
```
SELECT INTERP(field_name) FROM { tb_name | stb_name } WHERE ts='timestamp' [FILL ({ VALUE | PREV | NULL | LINEAR | NEXT})];
...
...
@@ -640,7 +640,7 @@ SELECT INTERP(field_name) FROM { tb_name | stb_name } WHERE ts='timestamp' [FILL
**Description**: The value of a specific column that matches the specified time slice
**Return value type**: Same as the column being operated
**Return value type**: Same as the column being operated upon
**Applicable column types**: Numeric data type
...
...
@@ -648,7 +648,6 @@ SELECT INTERP(field_name) FROM { tb_name | stb_name } WHERE ts='timestamp' [FILL
**More explanations**:
- It can be used from version 2.0.15.0
- Time slice must be specified. If there is no data matching the specified time slice, interpolation is performed based on `FILL` parameter. Conditions such as tags or `tbname` can be used `Where` clause can be used to filter data.
- The timestamp specified must be within the time range of the data rows of the table or STable. If it is beyond the valid time range, nothing is returned even with `FILL` parameter.
-`INTERP` can be used to query only single time point once. `INTERP` can be used with `EVERY` to get the interpolation value every time interval.
**Description**: The values that occur the first time in the specified column. The effect is similar to `distinct` keyword, but it can also be used to match tags or timestamp.
**Return value type**: Same as the column or tag being operated
**Return value type**: Same as the column or tag being operated upon
**Applicable column types**: Any data types except for timestamp
**Description**: The different of each row with its previous row for a specific column. `ignore_negative` can be specified as 0 or 1, the default value is 1 if it's not specified. `1` means negative values are ignored.
**Return value type**: Same as the column being operated
**Return value type**: Same as the column being operated upon
**Applicable column types**: Data types except for timestamp, binary, nchar and bool
Aggregate by time window is supported in TDengine. For example, each temperature sensor reports the temperature every second, the average temperature every 10 minutes can be retrieved by query with time window.
Window related clauses are used to divide the data set to be queried into subsets and then aggregate. There are three kinds of windows, time window, status window, and session window. There are two kinds of time windows, sliding window and flip time window.
Aggregation by time window is supported in TDengine. For example, in the case where temperature sensors report the temperature every seconds, the average temperature for every 10 minutes can be retrieved by performing a query with a time window.
Window related clauses are used to divide the data set to be queried into subsets and then aggregation is performed across the subsets. There are three kinds of windows: time window, status window, and session window. There are two kinds of time windows: sliding window and flip time/tumbling window.
## Time Window
`INTERVAL` clause is used to generate time windows of the same time interval, `SLIDING` is used to specify the time step for which the time window moves forward. The query is performed on one time window each time, and the time window moves forward with time. When defining continuous query both the size of time window and the step of forward sliding time need to be specified. As shown in the figure blow, [t0s, t0e] ,[t1s , t1e], [t2s, t2e] are respectively the time ranges of three time windows on which continuous queries are executed. The time step for which time window moves forward is marked by `sliding time`. Query, filter and aggregate operations are executed on each time window respectively. When the time step specified by `SLIDING` is same as the time interval specified by `INTERVAL`, the sliding time window is actually a flip time window.
The `INTERVAL` clause is used to generate time windows of the same time interval. The `SLIDING` parameter is used to specify the time step for which the time window moves forward. The query is performed on one time window each time, and the time window moves forward with time. When defining a continuous query, both the size of the time window and the step of forward sliding time need to be specified. As shown in the figure blow, [t0s, t0e] ,[t1s , t1e], [t2s, t2e] are respectively the time ranges of three time windows on which continuous queries are executed. The time step for which time window moves forward is marked by `sliding time`. Query, filter and aggregate operations are executed on each time window respectively. When the time step specified by `SLIDING` is same as the time interval specified by `INTERVAL`, the sliding time window is actually a flip time/tumbling window.

`INTERVAL` and `SLIDING` should be used with aggregate functions and select functions. Below SQL statement is illegal because no aggregate or selection function is used with `INTERVAL`.
`INTERVAL` and `SLIDING` should be used with aggregate functions and select functions. The SQL statement below is illegal because no aggregate or selection function is used with `INTERVAL`.
```
SELECT * FROM temp_tb_1 INTERVAL(1m);
```
The time step specified by `SLIDING` can't exceed the time interval specified by `INTERVAL`. Below SQL statement is illegal because the time length specified by `SLIDING` exceeds that specified by `INTERVAL`.
The time step specified by `SLIDING` cannot exceed the time interval specified by `INTERVAL`. The SQL statement below is illegal because the time length specified by `SLIDING` exceeds that specified by `INTERVAL`.
```
SELECT COUNT(*) FROM temp_tb_1 INTERVAL(1m) SLIDING(2m);
```
When the time length specified by `SLIDING` is the same as that specified by `INTERVAL`, the sliding window is actually a flip window. The minimum time range specified by `INTERVAL` is 10 milliseconds (10a) prior to version 2.1.5.0. From version 2.1.5.0, the minimum time range by `INTERVAL` can be 1 microsecond (1u). However, if the DB precision is millisecond, the minimum time range is 1 millisecond (1a). Please note that the `timezone` parameter should be configured to be the same value in the `taos.cfg` configuration file on client side and server side.
When the time length specified by `SLIDING` is the same as that specified by `INTERVAL`, the sliding window is actually a flip/tumbling window. The minimum time range specified by `INTERVAL` is 10 milliseconds (10a) prior to version 2.1.5.0. Since version 2.1.5.0, the minimum time range by `INTERVAL` can be 1 microsecond (1u). However, if the DB precision is millisecond, the minimum time range is 1 millisecond (1a). Please note that the `timezone` parameter should be configured to be the same value in the `taos.cfg` configuration file on client side and server side.
## Status Window
In case of using integer, bool, or string to represent the device status at a moment, the continuous rows with same status belong to same status window. Once the status changes, the status window closes. As shown in the following figure, there are two status windows according to status, [2019-04-28 14:22:07,2019-04-28 14:22:10] and [2019-04-28 14:22:11,2019-04-28 14:22:12]. Status window is not applicable to STable for now.
In case of using integer, bool, or string to represent the status of a device at any given moment, continuous rows with the same status belong to a status window. Once the status changes, the status window closes. As shown in the following figure, there are two status windows according to status, [2019-04-28 14:22:07,2019-04-28 14:22:10] and [2019-04-28 14:22:11,2019-04-28 14:22:12]. Status window is not applicable to STable for now.

`STATE_WINDOW` is used to specify the column based on which to define status window, for example:
`STATE_WINDOW` is used to specify the column on which the status window will be based. For example:
```
SELECT COUNT(*), FIRST(ts), status FROM temp_tb_1 STATE_WINDOW(status);
...
...
@@ -44,7 +44,7 @@ SELECT COUNT(*), FIRST(ts), status FROM temp_tb_1 STATE_WINDOW(status);
The primary key, i.e. timestamp, is used to determine which session window the row belongs to. If the time interval between two adjacent rows is within the time range specified by `tol_val`, they belong to the same session window; otherwise they belong to two different time windows. As shown in the figure below, if the limit of time interval for the session window is specified as 12 seconds, then the 6 rows in the figure constitutes 2 time windows, [2019-04-28 14:22:10,2019-04-28 14:22:30] and [2019-04-28 14:23:10,2019-04-28 14:23:30], because the time difference between 2019-04-28 14:22:30 and 2019-04-28 14:23:10 is 40 seconds, which exceeds the time interval limit of 12 seconds.
The primary key, i.e. timestamp, is used to determine which session window a row belongs to. If the time interval between two adjacent rows is within the time range specified by `tol_val`, they belong to the same session window; otherwise they belong to two different session windows. As shown in the figure below, if the limit of time interval for the session window is specified as 12 seconds, then the 6 rows in the figure constitutes 2 time windows, [2019-04-28 14:22:10,2019-04-28 14:22:30] and [2019-04-28 14:23:10,2019-04-28 14:23:30], because the time difference between 2019-04-28 14:22:30 and 2019-04-28 14:23:10 is 40 seconds, which exceeds the time interval limit of 12 seconds.
@@ -73,7 +73,7 @@ SELECT function_list FROM stb_name
### Restrictions
- Aggregate functions and select functions can be used in `function_list`, with each function having only one output, for example COUNT, AVG, SUM, STDDEV, LEASTSQUARES, PERCENTILE, MIN, MAX, FIRST, LAST. Functions having multiple output can't be used, for example DIFF or arithmetic operations.
- Aggregate functions and select functions can be used in `function_list`, with each function having only one output. For example COUNT, AVG, SUM, STDDEV, LEASTSQUARES, PERCENTILE, MIN, MAX, FIRST, LAST. Functions having multiple outputs, such as DIFF or arithmetic operations can't be used.
-`LAST_ROW` can't be used together with window aggregate.
- Scalar functions, like CEIL/FLOOR, can't be used with window aggregate.
-`WHERE` clause can be used to specify the starting and ending time and other filter conditions
...
...
@@ -87,8 +87,8 @@ SELECT function_list FROM stb_name
:::info
1.Huge volume of interpolation output may be returned using `FILL`, so it's recommended to specify the time range when using `FILL`. The maximum interpolation values that can be returned in single query is 10,000,000.
2. The result set is in ascending order of timestamp in aggregate by time window aggregate.
1.A huge volume of interpolation output may be returned using `FILL`, so it's recommended to specify the time range when using `FILL`. The maximum number of interpolation values that can be returned in a single query is 10,000,000.
2. The result set is in ascending order of timestamp when you aggregate by time window.
3. If aggregate by window is used on STable, the aggregate function is performed on all the rows matching the filter conditions. If `GROUP BY` is not used in the query, the result set will be returned in ascending order of timestamp; otherwise the result set is not exactly in the order of ascending timestamp in each group.
:::
...
...
@@ -97,13 +97,13 @@ Aggregate by time window is also used in continuous query, please refer to [Cont
## Examples
The table of intelligent meters can be created by the SQL statement below:
A table of intelligent meters can be created by the SQL statement below:
The average current, maximum current and median of current in every 10 minutes for the past 24 hours can be calculated using the below SQL statement, with missing values filled with the previous non-NULL values.
The average current, maximum current and median of current in every 10 minutes for the past 24 hours can be calculated using the SQL statement below, with missing values filled with the previous non-NULL values.
```
SELECT AVG(current), MAX(current), APERCENTILE(current, 50) FROM meters
1. Only English characters, digits and underscore are allowed
2.Can't start with a digit
1. Only characters from the English alphabet, digits and underscore are allowed
2.Names cannot start with a digit
3. Case insensitive without escape character "\`"
4. Identifier with escape character "\`"
To support more flexible table or column names, a new escape character "\`" is introduced. For more details please refer to [escape](/taos-sql/escape).
...
...
@@ -16,38 +16,38 @@ The legal character set is `[a-zA-Z0-9!?$%^&*()_–+={[}]:;@~#|<,>.?/]`.
## General Limits
- Maximum length of database name is 32 bytes
- Maximum length of table name is 192 bytes, excluding the database name prefix and the separator
- Maximum length of each data row is 48K bytes from version 2.1.7.0 , before which the limit is 16K bytes. Please note that the upper limit includes the extra 2 bytes consumed by each column of BINARY/NCHAR type.
- Maximum of column name is 64.
- Maximum length of database name is 32 bytes.
- Maximum length of table name is 192 bytes, excluding the database name prefix and the separator.
- Maximum length of each data row is 48K bytes since version 2.1.7.0 , before which the limit was 16K bytes. Please note that the upper limit includes the extra 2 bytes consumed by each column of BINARY/NCHAR type.
- Maximum length of column name is 64.
- Maximum number of columns is 4096. There must be at least 2 columns, and the first column must be timestamp.
- Maximum length of tag name is 64.
- Maximum number of tags is 128. There must be at least 1 tag. The total length of tag values should not exceed 16K bytes.
- Maximum length of singe SQL statement is 1048576, i.e. 1 MB bytes. It can be configured in the parameter `maxSQLLength` in the client side, the applicable range is [65480, 1048576].
- At most 4096 columns (or 1024 prior to 2.1.7.0) can be returned by `SELECT`, functions in the query statement may constitute columns. Error will be returned if the limit is exceeded.
- Maximum numbers of databases, STables, tables are only depending on the system resources.
- Maximum length of singe SQL statement is 1048576, i.e. 1 MB. It can be configured in the parameter `maxSQLLength` in the client side, the applicable range is [65480, 1048576].
- At most 4096 columns (or 1024 prior to 2.1.7.0) can be returned by `SELECT`. Functions in the query statement constitute columns. An error is returned if the limit is exceeded.
- Maximum numbers of databases, STables, tables are dependent only on the system resources.
- Maximum of database name is 32 bytes, and it can't include "." or special characters.
- Maximum replica number of database is 3
- Maximum length of user name is 23 bytes
- Maximum length of password is 15 bytes
- Maximum number of rows depends on the storage space only.
- Maximum number of tables depends on the number of nodes only.
- Maximum number of databases depends on the number of nodes only.
- Maximum number of vnodes for single database is 64.
- Maximum number of replicas for a database is 3.
- Maximum length of user name is 23 bytes.
- Maximum length of password is 15 bytes.
- Maximum number of rows depends only on the storage space.
- Maximum number of tables depends only on the number of nodes.
- Maximum number of databases depends only on the number of nodes.
- Maximum number of vnodes for a single database is 64.
## Restrictions of `GROUP BY`
`GROUP BY` can be performed on tags and `TBNAME`. It can be performed on data columns too, with one restriction that only one column and the number of unique values on that column is lower than 100,000. Please note that `GROUP BY` can't be performed on float or double types.
`GROUP BY` can be performed on tags and `TBNAME`. It can be performed on data columns too, with the only restriction being it can only be performed on one data column and the number of unique values in that column is lower than 100,000. Please note that `GROUP BY` cannot be performed on float or double types.
## Restrictions of `IS NOT NULL`
`IS NOT NULL` can be used on any data type of columns. The non-empty string evaluation expression, i.e. `<\>""` can only be used on non-numeric data types.
`IS NOT NULL` can be used on any data type of columns. The non-empty string evaluation expression, i.e. `< > ""` can only be used on non-numeric data types.
## Restrictions of `ORDER BY`
- Only one `order by` is allowed for normal table and subtable.
- At most two `order by` are allowed for STable, and the second one must be `ts`.
-`order by tag` must be used with `group by tag` on same tag, this rule is also applicable to `tbname`.
-`order by tag` must be used with `group by tag` on same tag. This rule is also applicable to `tbname`.
-`order by column` must be used with `group by column` or `top/bottom` on same column. This rule is applicable to table and STable.
-`order by ts` is applicable to table and STable.
- If `order by ts` is used with `group by`, the result set is sorted using `ts` in each group.
...
...
@@ -56,7 +56,7 @@ The legal character set is `[a-zA-Z0-9!?$%^&*()_–+={[}]:;@~#|<,>.?/]`.
### Name Restrictions of Table/Column
The name of a table or column can only be composed of ASCII characters, digits and underscore, while it can't start with a digit. The maximum length is 192 bytes. Names are case insensitive. The name mentioned in this rule doesn't include the database name prefix and the separator.
The name of a table or column can only be composed of ASCII characters, digits and underscore and it cannot start with a digit. The maximum length is 192 bytes. Names are case insensitive. The name mentioned in this rule doesn't include the database name prefix and the separator.
TDengine community version provides dev and rpm packages for users to choose based on the system environment. deb supports Debian, Ubuntu and systems derived from them. rpm supports CentOS, RHEL, SUSE and systems derived from them. Furthermore, tar.gz package is provided for enterprise customers.
TDengine community version provides deb and rpm packages for users to choose from, based on their system environment. The deb package supports Debian, Ubuntu and derivative systems. The rpm package supports CentOS, RHEL, SUSE and derivative systems. Furthermore, a tar.gz package is provided for TDengine Enterprise customers.
## Install
...
...
@@ -124,7 +124,7 @@ taoskeeper is installed, enable it by `systemctl enable taoskeeper`
```
:::info
Some configuration will be prompted for users to provide when install.sh is executing, the interactive mode can be disabled by executing `./install.sh -e no`. `./install -h` can show all parameters and detailed explanation.
Users will be prompted to enter some configuration information when install.sh is executing. The interactive mode can be disabled by executing `./install.sh -e no`. `./install.sh -h` can show all parameters with detailed explanation.
:::
...
...
@@ -132,7 +132,7 @@ Some configuration will be prompted for users to provide when install.sh is exec
</Tabs>
:::note
When installing on the first node in the cluster, when "Enter FQDN:" is prompted, nothing needs to be provided. When installing on following nodes, when "Enter FQDN:" is prompted, the end point of the first dnode in the cluster can be input if it is already up; or just ignore it and configure later after installation is done.
When installing on the first node in the cluster, at the "Enter FQDN:" prompt, nothing needs to be provided. When installing on subsequent nodes, at the "Enter FQDN:" prompt, you must enter the end point of the first dnode in the cluster if it is already up. You can also just ignore it and configure it later after installation is finished.
:::
...
...
@@ -181,14 +181,14 @@ taosKeeper is removed successfully!
:::note
-It's strongly suggested not to use multiple kinds of installation packages on a single host TDengine
- After deb package is installed, if the installation directory is removed manually so that uninstall or reinstall can't succeed, it can be resolved by cleaning up TDengine package information as in the command below and then reinstalling.
-We strongly recommend not to use multiple kinds of installation packages on a single host TDengine.
- After deb package is installed, if the installation directory is removed manually, uninstall or reinstall will not work. This issue can be resolved by using the command below which cleans up TDengine package information. You can then reinstall if needed.
```bash
$ sudo rm-f /var/lib/dpkg/info/tdengine*
```
- After rpm package is installed, if the installation directory is removed manually so that uninstall or reinstall can't succeed, it can be resolved by cleaning up TDengine package information as in the command below and then reinstalling.
- After rpm package is installed, if the installation directory is removed manually, uninstall or reinstall will not work. This issue can be resolved by using the command below which cleans up TDengine package information. You can then reinstall if needed.
- Configuration directory, data directory, and log directory are created automatically if they don't exist
- The default configuration file is located at /etc/taos/taos.cfg, which is a copy of /usr/local/taos/cfg/taos.cfg if not existing
- The default configuration file is located at /etc/taos/taos.cfg, which is a copy of /usr/local/taos/cfg/taos.cfg
- The default data directory is /var/lib/taos, which is a soft link to /usr/local/taos/data
- The default log directory is /var/log/taos, which is a soft link to /usr/local/taos/log
- The executables at /usr/local/taos/bin are linked to /usr/bin
...
...
@@ -228,7 +228,7 @@ During the installation process:
:::note
- When TDengine is uninstalled, the configuration /etc/taos/taos.cfg, data directory /var/lib/taos, log directory /var/log/taos are kept. They can be deleted manually with caution because data can't be recovered
- When TDengine is uninstalled, the configuration /etc/taos/taos.cfg, data directory /var/lib/taos, log directory /var/log/taos are kept. They can be deleted manually with caution, because data can't be recovered. Please follow data integrity, security, backup or relevant SOPs before deleting any data.
- When reinstalling TDengine, if the default configuration file /etc/taos/taos.cfg exists, it will be kept and the configuration file in the installation package will be renamed to taos.cfg.orig and stored at /usr/local/taos/cfg to be used as configuration sample. Otherwise the configuration file in the installation package will be installed to /etc/taos/taos.cfg and used.
## Start and Stop
...
...
@@ -263,18 +263,19 @@ Active: inactive (dead)
There are two aspects in upgrade operation: upgrade installation package and upgrade a running server.
Upgrading package should follow the steps mentioned previously to first uninstall the old version then install the new version.
To upgrade a package, follow the steps mentioned previously to first uninstall the old version then install the new version.
Upgrading a running server is much more complex. First please check the version number of the old version and the new version. The version number of TDengine consists of 4 sections, only if the first 3 section match can the old version be upgraded to the new version. The steps of upgrading a running server are as below:
Upgrading a running server is much more complex. First please check the version number of the old version and the new version. The version number of TDengine consists of 4 sections, only if the first 3 sections match can the old version be upgraded to the new version. The steps of upgrading a running server are as below:
- Stop inserting data
- Make sure all data are persisted into disk
- Make sure all data is persisted to disk
- Make some simple queries (Such as total rows in stables, tables and so on. Note down the values. Follow best practices and relevant SOPs.)
- Stop the cluster of TDengine
- Uninstall old version and install new version
- Start the cluster of TDengine
-Make some simple queries to make sure no data loss
-Make some simple data insertion to make sure the cluster works well
- Restore business data
-Execute simple queries, such as the ones executed prior to installing the new package, to make sure there is no data loss
-Run some simple data insertion statements to make sure the cluster works well
The computing and storage resources need to be planned if using TDengine to build an IoT platform. How to plan the CPU, memory and disk required will be described in this chapter.
It is important to plan computing and storage resources if using TDengine to build an IoT, time-series or Big Data platform. How to plan the CPU, memory and disk resources required, will be described in this chapter.
## Memory Requirement of Server Side
The number of vgroups created for each database is the same as the number of CPU cores by default and can be configured by parameter `maxVgroupsPerDb`, each vnode in a vgroup stores one replica. Each vnode consumes a fixed size of memory, i.e. `blocks` \* `cache`. Besides, some memory is required for tag values associated with each table. A fixed amount of memory is required for each cluster. So, the memory required for each DB can be calculated using the formula below:
By default, the number of vgroups created for each database is the same as the number of CPU cores. This can be configured by the parameter `maxVgroupsPerDb`. Each vnode in a vgroup stores one replica. Each vnode consumes a fixed amount of memory, i.e. `blocks` \* `cache`. In addition, some memory is required for tag values associated with each table. A fixed amount of memory is required for each cluster. So, the memory required for each DB can be calculated using the formula below:
For example, assuming the default value of `maxVgroupPerDB` is 64, the default value of `cache` 16M, the default value of `blocks` is 6, there are 100,000 tables in a DB, the replica number is 1, total length of tag values is 256 bytes, the total memory required for this DB is: 64 \* 1 \* (16 \* 6 + 10) + 100000 \* (0.25 + 0.5) / 1000 = 6792M.
For example, assuming the default value of `maxVgroupPerDB` is 64, the default value of `cache` is 16M, the default value of `blocks` is 6, there are 100,000 tables in a DB, the replica number is 1, total length of tag values is 256 bytes, the total memory required for this DB is: 64 \* 1 \* (16 \* 6 + 10) + 100000 \* (0.25 + 0.5) / 1000 = 6792M.
In the real operation of TDengine, we are more concerned about the memory used by each TDengine server process `taosd`.
...
...
@@ -22,10 +22,10 @@ In the real operation of TDengine, we are more concerned about the memory used b
In the above formula:
1. "vnode_memory" of a `taosd` process is the memory used by all vnodes hosted by this `taosd` process. It can be roughly calculated by firstly adding up the total memory of all DBs whose memory usage can be derived according to the formula mentioned previously then dividing by number of dnodes and multiplying the number of replicas.
1. "vnode_memory" of a `taosd` process is the memory used by all vnodes hosted by this `taosd` process. It can be roughly calculated by firstly adding up the total memory of all DBs whose memory usage can be derived according to the formula for Database Memory Size, mentioned above, then dividing by number of dnodes and multiplying the number of replicas.
2. "mnode_memory" of a `taosd` process is the memory consumed by a mnode. If there is one (and only one) mnode hosted in a `taosd` process, the memory consumed by "mnode" is "0.2KB \* the total number of tables in the cluster".
...
...
@@ -56,8 +56,8 @@ So, at least 3GB needs to be reserved for such a client.
The CPU resources required depend on two aspects:
- **Data Insertion** Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The computing resource consumed between inserting 1 row one time and inserting 10 rows one time is very small. So, the more the rows to insert one time, the higher the efficiency. Inserting in bach also exposes requirements for the client side which needs to cache rows and insert in batch once the cached rows reaches a threshold.
- **Data Query** High efficiency query is provided in TDengine, but it's hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, etc provided by user.
- **Data Insertion** Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The difference in computing resource consumed, between inserting 1 row at a time, and inserting 10 rows at a time is very small. So, the more the number of rows that can be inserted one time, the higher the efficiency. Inserting in batch also imposes requirements on the client side which needs to cache rows to insert in batch once the number of cached rows reaches a threshold.
- **Data Query** High efficiency query is provided in TDengine, but it's hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, and other requirements provided by users.
In short, the CPU resource required for data insertion can be estimated but it's hard to do so for query use cases. In real operation, it's suggested to control CPU usage below 50%. If this threshold is exceeded, it's a reminder for system operator to add more nodes in the cluster to expand resources.
For example, there are 10,000,000 meters, while each meter collects data every 15 minutes and the data size of each collection is 128 bytes, so the raw data size of one year is: 10000000 \* 128 \* 24 \* 60 / 15 \* 365 = 44.8512(TB). Assuming compression ratio is 5, the actual disk size is: 44.851 / 5 = 8.97024(TB).
Parameter `keep` can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device, and this is transparent to application programs.
Parameter `keep` can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device. This is completely transparent to application programs.
To increase the performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability.
To increase performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability.
## Number of Hosts
A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulas mentioned previously. Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily.
A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulae mentioned previously. Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily.
**Quick Estimation for CPU, Memory and Disk** Please refer to [Resource Estimate](https://www.taosdata.com/config/config.html).
TDengine uses **WAL**, i.e. Write Ahead Log, to achieve fault tolerance and high reliability.
When a data block is received by TDengine, the original data block is first written into WAL. The log in WAL will be deleted only after the data has been written into data files in the database. Data can be recovered from WAL in case the server is stopped abnormally due to any reason and then restarted.
When a data block is received by TDengine, the original data block is first written into WAL. The log in WAL will be deleted only after the data has been written into data files in the database. Data can be recovered from WAL in case the server is stopped abnormally for any reason and then restarted.
There are 2 configuration parameters related to WAL:
- walLevel:
- 0:wal is disabled;
- 1:wal is enabled without fsync;
- 2:wal is enabled with fsync.
- fsync:only valid when walLevel is set to 2, it specifies the interval of invoking fsync. If set to 0, it means fsync is invoked immediately once WAL is written.
- 0:wal is disabled
- 1:wal is enabled without fsync
- 2:wal is enabled with fsync
- fsync:This parameter is only valid when walLevel is set to 2. It specifies the interval, in milliseconds, of invoking fsync. If set to 0, it means fsync is invoked immediately once WAL is written.
To achieve absolutely no data loss, walLevel needs to be set to 2 and fsync needs to be set to 1. The penalty is the performance of data ingestion downgrades. However, if the concurrent threads of data insertion on the client side can reach a big enough number, for example 50, the data ingestion performance would be still good enough, our verification shows that the drop is only 30% compared to fsync is set to 3,000 milliseconds.
To achieve absolutely no data loss, walLevel should be set to 2 and fsync should be set to 1. There is a performance penalty to the data ingestion rate. However, if the concurrent data insertion threads on the client side can reach a big enough number, for example 50, the data ingestion performance will be still good enough. Our verification shows that the drop is only 30% when fsync is set to 3,000 milliseconds.
## Disaster Recovery
TDengine uses replications to provide high availability and disaster recovery capability.
TDengine uses replication to provide high availability and disaster recovery capability.
TDengine cluster is managed by mnode. To make sure the high availability of mnode, multiple replicas can be configured by the system parameter `numOfMnodes`. The data replication between mnode replicas is performed in a synchronous way to guarantee the metadata consistency.
A TDengine cluster is managed by mnode. To ensure the high availability of mnode, multiple replicas can be configured by the system parameter `numOfMnodes`. The data replication between mnode replicas is performed in a synchronous way to guarantee metadata consistency.
The number of replicas for the time series data in TDengine is associated with each database, there can be a lot of databases in a cluster while each database can be configured with a different number of replicas. When creating a database, parameter `replica` is used to configure the number of replications. To achieve high availability, `replica` needs to be higher than 1.
The number of replicas for time series data in TDengine is associated with each database. There can be many databases in a cluster and each database can be configured with a different number of replicas. When creating a database, parameter `replica` is used to configure the number of replications. To achieve high availability, `replica` needs to be higher than 1.
The number of dnodes in a TDengine cluster must NOT be lower than the number of replicas for any database, otherwise it would fail when trying to create a table.
As long as the dnodes of a TDengine cluster are deployed on different physical machines and the replica number is set to bigger than 1, high availability can be achieved without any other assistance. If dnodes of TDengine cluster are deployed in geographically different data centers, disaster recovery can be achieved too.
As long as the dnodes of a TDengine cluster are deployed on different physical machines and the replica number is higher than 1, high availability can be achieved without any other assistance. For disaster recovery, dnodes of a TDengine cluster should be deployed in geographically different data centers.
There are two ways of exporting data from a TDengine cluster, one is SQL statement in TDengine CLI, the other one is `taosdump`.
There are two ways of exporting data from a TDengine cluster:
- Using a SQL statement in TDengine CLI
- Using the `taosdump` tool
## Export Using SQL
If you want to export the data of a table or a STable, please execute below SQL statement in TDengine CLI.
If you want to export the data of a table or a STable, please execute the SQL statement below, in the TDengine CLI.
```sql
select*from<tb_name>>>data.csv;
...
...
@@ -16,4 +18,4 @@ The data of table or STable specified by `tb_name` will be exported into a file
## Export Using taosdump
With `taosdump`, you can choose to export the data of all databases, a database, a table or a STable, you can also choose export the data within a time range, or even only export the schema definition of a table. For the details of using `taosdump` please refer to [Tool for exporting and importing data: taosdump](/reference/taosdump).
With `taosdump`, you can choose to export the data of all databases, a database, a table or a STable, you can also choose to export the data within a time range, or even only export the schema definition of a table. For the details of using `taosdump` please refer to [Tool for exporting and importing data: taosdump](/reference/taosdump).
A system operator can use TDengine CLI to show the connections, ongoing queries, stream computing, and can close connection or stop ongoing query task or stream computing.
A system operator can use the TDengine CLI to show connections, ongoing queries, stream computing, and can close connections or stop ongoing query tasks or stream computing.
## Show Connections
...
...
@@ -13,7 +13,7 @@ SHOW CONNECTIONS;
One column of the output of the above SQL command is "ip:port", which is the end point of the client.
## Close Connections Forcedly
## Force Close Connections
```sql
KILLCONNECTION<connection-id>;
...
...
@@ -27,9 +27,9 @@ In the above SQL command, `connection-id` is from the first column of the output
SHOWQUERIES;
```
The first column of the output is query ID, which is composed of the corresponding connection ID and the sequence number of the current query task started on this connection, in format of "connection-id:query-no".
The first column of the output is query ID, which is composed of the corresponding connection ID and the sequence number of the current query task started on this connection. The format is "connection-id:query-no".
## Close Queries Forcedly
## Force Close Queries
```sql
KILLQUERY<query-id>;
...
...
@@ -43,9 +43,9 @@ In the above SQL command, `query-id` is from the first column of the output of `
SHOWSTREAMS;
```
The first column of the output is stream ID, which is composed of the connection ID and the sequence number of the current stream started on this connection, in the format of "connection-id:stream-no".
The first column of the output is stream ID, which is composed of the connection ID and the sequence number of the current stream started on this connection. The format is "connection-id:stream-no".
After TDengine is started, a database named `log`for monitoring is created automatically. The information about CPU, memory, disk, bandwidth, number of requests, disk I/O speed, slow query is written into `log` database on the basis of a predefined interval. Additionally, some important system operations, like logon, create user, drop database, and alerts and warnings generated in TDengine are written into the `log` database too. A system operator can view the data in `log` database from TDengine CLI or from a web console.
After TDengine is started, a database named `log`is created automatically to help with monitoring. Information that includes CPU, memory and disk usage, bandwidth, number of requests, disk I/O speed, slow queries, is written into the `log` database at a predefined interval. Additionally, some important system operations, like logon, create user, drop database, and alerts and warnings generated in TDengine are written into the `log` database too. A system operator can view the data in `log` database from TDengine CLI or from a web console.
The collection of the monitoring information is enabled by default, but can be disabled by parameter `monitor` in the configuration file.
## TDinsight
TDinsight is a complete solution which uses the monitor database `log` mentioned previously and Grafana to monitor a TDengine cluster.
TDinsight is a complete solution which uses the monitoring database `log` mentioned previously, and Grafana, to monitor a TDengine cluster.
From version 2.3.3.0, more monitoring data has been added in the `log` database. Please refer to [TDinsight Grafana Dashboard](https://grafana.com/grafana/dashboards/15167) to learn more details about using TDinsight to monitor TDengine.
This chapter is mainly written for system administrators, covering download, install/uninstall, data import/export, system monitoring, user management, connection management, etc. Capacity planning and system optimization are also covered.
This chapter is mainly written for system administrators. It covers download, install/uninstall, data import/export, system monitoring, user management, connection management, capacity planning and system optimization.