TSZ 压缩算法压缩率比原来的要高,但压缩时间会更长,即开启 TSZ 压缩算法写入速度会有一些下降,通常情况下会有 20% 左右的下降。影响写入速度是因为需要更多的 CPU 计算,所以从原始数据到压缩好数据的交付时间变长,导致写入速度变慢。如果您的服务器 CPU 配置很高的话,这个影响会变小甚至没有。
@@ -19,25 +19,24 @@ import InstallOnLinux from "../../14-reference/03-connector/\_windows_install.md
import VerifyLinux from "../../14-reference/03-connector/\_verify_linux.mdx";
import VerifyWindows from "../../14-reference/03-connector/\_verify_windows.mdx";
Any application programs running on any kind of platforms can access TDengine through the REST API provided by TDengine. For the details, please refer to [REST API](/reference/rest-api/). Besides, application programs can use the connectors of multiple programming languages to access TDengine, including C/C++, Java, Python, Go, Node.js, C#, and Rust. This chapter describes how to establish connection to TDengine and briefly introduce how to install and use connectors. For details about the connectors, please refer to [Connectors](/reference/connector/)
Any application programs running on any kind of platforms can access TDengine through the REST API provided by TDengine. For the details, please refer to [REST API](/reference/rest-api/). Besides, application programs can use the connectors of multiple programming languages to access TDengine, including C/C++, Java, Python, Go, Node.js, C#, and Rust. This chapter describes how to establish connection to TDengine and briefly introduces how to install and use connectors. For details about the connectors, please refer to [Connectors](/reference/connector/)
## Establish Connection
There are two ways for a connector to establish connections to TDengine:
1. Connection through the REST API provided by taosAdapter component, this way is called "REST connection" hereinafter.
1. Connection through the REST API provided by the taosAdapter component, this way is called "REST connection" hereinafter.
2. Connection through the TDengine client driver (taosc), this way is called "Native connection" hereinafter.
Either way, same or similar APIs are provided by connectors to access database or execute SQL statements, no obvious difference can be observed.
Key differences:
1. With REST connection, it's not necessary to install TDengine client driver (taosc), it's more friendly for cross-platform with the cost of 30% performance downgrade. When taosc has an upgrade, application does not need to make changes.
2. With native connection, full compatibility of TDengine can be utilized, like [Parameter Binding](/reference/connector/cpp#parameter-binding-api), [Subscription](/reference/connector/cpp#subscription-and-consumption-api), etc. But taosc has to be installed, some platforms may not be supported.
1. The TDengine client driver (taosc) has the highest performance with all the features of TDengine like [Parameter Binding](/reference/connector/cpp#parameter-binding-api), [Subscription](/reference/connector/cpp#subscription-and-consumption-api), etc.
2. The TDengine client driver (taosc) is not supported across all platforms, and applications built on taosc may need to be modified when updating taosc to newere versions.
3. The REST connection is more accessible with cross-platform support, however it results in a 30% performance downgrade.
## Install Client Driver taosc
If choosing to use native connection and the application is not on the same host as TDengine server, TDengine client driver taosc needs to be installed on the host where the application is. If choosing to use REST connection or the application is on the same host as server side, this step can be skipped. It's better to use same version of taosc as the server.
If you are choosing to use native connection and the application is not on the same host as TDengine server, the TDengine client driver taosc needs to be installed on the application host. If choosing to use the REST connection or the application is on the same host as TDengine server, this step can be skipped. It's better to use same version of taosc as the server.
The data model employed by TDengine is similar to relational database, you need to create databases and tables. For a specific application, the design of databases, STables (abbreviated for super table), and tables need to be considered. This chapter will explain the big picture without syntax details.
The data model employed by TDengine is similar to a relational database, you need to create databases and tables. Design the data model based on your own application scenarios and you should design the STable (abbreviation for super table) schema to fit your data. This chapter will explain the big picture without getting into syntax details.
## Create Database
The characteristics of data from different data collection points may be different, such as collection frequency, days to keep, number of replicas, data block size, whether it's allowed to update data, etc. For TDengine to operate with the best performance, it's strongly suggested to put the data with different characteristics into different databases because different storage policy can be set for each database. When creating a database, there are a lot of parameters that can be configured, such as the days to keep data, the number of replicas, the number of memory blocks, time precision, the minimum and maximum number of rows in each data block, compress or not, the time range of the data in single data file, etc. Below is an example of the SQL statement for creating a database.
The characteristics of data from different data collection points may be different, such as collection frequency, days to keep, number of replicas, data block size, whether it's allowed to update data, etc. For TDengine to operate with the best performance, it's strongly suggested to put the data with different characteristics into different databases because different storage policies can be set for each database. When creating a database, there are a lot of parameters that can be configured, such as the days to keep data, the number of replicas, the number of memory blocks, time precision, the minimum and maximum number of rows in each data block, compress or not, the time range of the data in single data file, etc. Below is an example of the SQL statement for creating a database.
```sql
CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
...
...
@@ -14,7 +14,7 @@ CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
In the above SQL statement, a database named "power" will be created, the data in it will be kept for 365 days, which means the data older than 365 days will be deleted automatically, a new data file will be created every 10 days, the number of memory blocks is 6, data is allowed to be updated. For more details please refer to [Database](/taos-sql/database).
After creating a database, the current database in use can be switched using SQL command `USE`, for example below SQL statement switches the current database to `power`. Without current database specified, table name must be preceded with the corresponding database name.
After creating a database, the current database in use can be switched using SQL command `USE`, for example below SQL statement switches the current database to `power`. Without the current database specified, table name must be preceded with the corresponding database name.
```sql
USE power;
...
...
@@ -23,14 +23,14 @@ USE power;
:::note
- Any table or STable must belong to a database. To create a table or STable, the database it belongs to must be ready.
- JOIN operation can't be performed tables from two different databases.
- JOIN operations can't be performed on tables from two different databases.
- Timestamp needs to be specified when inserting rows or querying historical rows.
:::
## Create STable
In a time-series application, there may be multiple kinds of data collection points. For example, in the electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation of multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](/tdinternal/arch#model_table1), below SQL statement can be used to create the super table.
In a time-series application, there may be multiple kinds of data collection points. For example, in the electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation of multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](/tdinternal/arch#model_table1), the below SQL statement can be used to create the super table.
```sql
CREATE STable meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
...
...
@@ -41,11 +41,11 @@ If you are using versions prior to 2.0.15, the `STable` keyword needs to be repl
:::
Similar to creating a regular table, when creating a STable, name and schema need to be provided too. In the STable schema, the first column must be timestamp (like ts in the example), and other columns (like current, voltage and phase in the example) are the data collected. The type of a column can be integer, float, double, string ,etc. Besides, the schema for tags need to be provided, like location and groupId in the example. The type of a tag can be integer, float, string, etc. The static properties of a data collection point can be defined as tags, like the location, device type, device group ID, manager ID, etc. Tags in the schema can be added, removed or updated. Please refer to [STable](/taos-sql/stable) for more details.
Similar to creating a regular table, when creating a STable, the name and schema need to be provided. In the STable schema, the first column must be timestamp (like ts in the example), and the other columns (like current, voltage and phase in the example) are the data collected. The column type can be integer, float, double, string ,etc. Besides, the schema for tags need to be provided, like location and groupId in the example. The tag type can be integer, float, string, etc. The static properties of a data collection point can be defined as tags, like the location, device type, device group ID, manager ID, etc. Tags in the schema can be added, removed or updated. Please refer to [STable](/taos-sql/stable) for more details.
For each kind of data collection points, a corresponding STable must be created. There may be many STables in an application. For electrical power system, we need to create a STable respectively for meters, transformers, busbars, switches. There may be multiple kinds of data collection points on a single device, for example there may be one data collection point for electrical data like current and voltage and another point for environmental data like temperature, humidity and wind direction, multiple STables are required for such kind of device.
For each kind of data collection point, a corresponding STable must be created. There may be many STables in an application. For electrical power system, we need to create a STable respectively for meters, transformers, busbars, switches. There may be multiple kinds of data collection points on a single device, for example there may be one data collection point for electrical data like current and voltage and another point for environmental data like temperature, humidity and wind direction, multiple STables are required for such kind of device.
At most 4096 (or 1024 prior to version 2.1.7.0) columns are allowed in a STable. If there are more than 4096 of metrics to bo collected for a data collection point, multiple STables are required for such kind of data collection point. There can be multiple databases in system, while one or more STables can exist in a database.
At most 4096 (or 1024 prior to version 2.1.7.0) columns are allowed in a STable. If there are more than 4096 of metrics to be collected for a data collection point, multiple STables are required. There can be multiple databases in a system, while one or more STables can exist in a database.
In the above SQL statement, "d1001" is the table name, "meters" is the STable name, followed by the value of tag "Location" and the value of tag "groupId", which are "Beijing.Chaoyang" and "2" respectively in the example. The tag values can be updated after the table is created. Please refer to [Tables](/taos-sql/table) for details.
In TDengine system, it's recommended to create a table for a data collection point via STable. Table created via STable is called subtable in some parts of TDengine document. All SQL commands applied on regular table can be applied on subtable.
In TDengine system, it's recommended to create a table for a data collection point via STable. A table created via STable is called subtable in some parts of the TDengine documentation. All SQL commands applied on regular tables can be applied on subtables.
:::warning
It's not recommended to create a table in a database while using a STable from another database as template.
...
...
@@ -67,7 +67,7 @@ It's suggested to use the global unique ID of a data collection point as the tab
## Create Table Automatically
In some circumstances, it's not sure whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exist.
In some circumstances, it's unknown whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exist.
```sql
INSERT INTO d1001 USING meters TAGS ("Beijng.Chaoyang", 2) VALUES (now, 10.2, 219, 0.32);
...
...
@@ -79,6 +79,6 @@ For more details please refer to [Create Table Automatically](/taos-sql/insert#a
## Single Column vs Multiple Column
Multiple columns data model is supported in TDengine. As long as multiple metrics are collected by same data collection point at same time, i.e. the timestamp are identical, these metrics can be put in single stable as columns. However, there is another kind of design, i.e. single column data model, a table is created for each metric, which means a STable is required for each kind of metric. For example, 3 STables are required for current, voltage and phase.
A multiple columns data model is supported in TDengine. As long as multiple metrics are collected by the same data collection point at the same time, i.e. the timestamp are identical, these metrics can be put in a single STable as columns. However, there is another kind of design, i.e. single column data model, a table is created for each metric, which means a STable is required for each kind of metric. For example, 3 STables are required for current, voltage and phase.
It's recommended to use multiple column data model as much as possible because it's better in the performance of inserting or querying rows. In some cases, however, the metrics to be collected vary frequently and correspondingly the STable schema needs to be changed frequently too. In such case, it's more convenient to use single column data model.
It's recommended to use a multiple column data model as much as possible because it's better in the performance of inserting or querying rows. In some cases, however, the metrics to be collected vary frequently and correspondingly the STable schema needs to be changed frequently too. In such case, it's more convenient to use single column data model.
After a TDengine cluster has been running for long enough time, because of updating data, deleting tables and deleting expired data, there may be fragments in data files and query performance may be impacted. To resolve the problem of fragments, from version 2.1.3.0 a new SQL command `COMPACT` can be used to defragment the data files.
```sql
COMPACTVNODESIN(vg_id1,vg_id2,...)
```
`COMPACT` can be used to defragment one or more vgroups. The defragmentation work will be put in task queue for scheduling execution by TDengine. `SHOW VGROUPS` command can be used to get the vgroup ids to be used in `COMPACT` command. There is a column `compacting` in the output of `SHOW GROUPS` to indicate the compacting status of the vgroup: 2 means the vgroup is waiting in task queue for compacting, 1 means compacting is in progress, and 0 means the vgroup has nothing to do with compacting.
Please be noted that a lot of disk I/O is required for defragementation operation, during which the performance may be impacted significantly for data insertion and query, data insertion may be blocked shortly in extreme cases.
## Optimize Storage Parameters
The data in different use cases may have different characteristics, such as the days to keep, number of replicas, collection interval, record size, number of collection points, compression or not, etc. To achieve best efficiency in storage, the parameters in below table can be used, all of them can be either configured in `taos.cfg` as default configuration or in the command `create database`. For detailed definition of these parameters please refer to [Configuration Parameters](/reference/config/).
| 1 | days | Day | The time range of the data stored in a single data file | 1-3650 | 10 |
| 2 | keep | Day | The number of days the data is kept in the database | 1-36500 | 3650 |
| 3 | cache | MB | The size of each memory block | 1-128 | 16 |
| 4 | blocks | None | The number of memory blocks used by each vnode | 3-10000 | 6 |
| 5 | quorum | None | The number of required confirmation in case of multiple replicas | 1-2 | 1 |
| 6 | minRows | None | The minimum number of rows in a data file | 10-1000 | 100 |
| 7 | maxRows | None | The maximum number of rows in a daa file | 200-10000 | 4096 |
| 8 | comp | None | Whether to compress the data | 0:uncompressed; 1: One Phase compression; 2: Two Phase compression | 2 |
| 9 | walLevel | None | wal sync level (named as "wal" in create database ) | 1:wal enabled without fsync; 2:wal enabled with fsync | 1 |
| 10 | fsync | ms | The time to wait for invoking fsync when walLevel is set to 2; 0 means no wait | 3000 |
| 11 | replica | none | The number of replications | 1-3 | 1 |
| 12 | precision | none | Time precision | ms: millisecond; us: microsecond;ns: nanosecond | ms |
| 13 | update | none | Whether to allow updating data | 0: not allowed; 1: a row must be updated as whole; 2: a part of columns in a row can be updated | 0 |
| 14 | cacheLast | none | Whether the latest data of a table is cached in memory | 0: not cached; 1: the last row is cached; 2: the latest non-NULL value of each column is cached | 0 |
For a specific use case, there may be multiple kinds of data with different characteristics, it's best to put data with same characteristics in same database. So there may be multiple databases in a system while each database can be configured with different storage parameters to achieve best performance. The above parameters can be used when creating a database to override the default setting in configuration file.
The above SQL statement creates a database named as `demo`, in which each data file stores data across 10 days, the size of each memory block is 32 MB and each vnode is allocated with 8 blocks, the replica is set to 3, update operation is allowed, and all other parameters not specified in the command follow the default configuration in `taos.cfg`.
Once a database is created, only some parameters can be changed and be effective immediately while others are can't.
**Explanation:** Prior to version 2.1.3.0, `taosd` server process needs to be restarted for these parameters to take in effect if they are changed using `ALTER DATABASE`.
When trying to join a new dnode into a running TDengine cluster, all the parameters related to cluster in the new dnode configuration must be consistent with the cluster, otherwise it can't join the cluster. The parameters that are checked when joining a dnode are as below. For detailed definition of these parameters please refer to [Configuration Parameters](/reference/config/).
- numOfMnodes
- mnodeEqualVnodeNum
- offlineThreshold
- statusInterval
- maxTablesPerVnode
- maxVgroupsPerDb
- arbitrator
- timezone
- balance
- flowctrl
- slaveQuery
- adjustMaster
For the convenience of debugging, the log setting of a dnode can be changed temporarily. The temporary change will be lost once the server is restarted.
```sql
ALTERDNODE<dnode_id><config>
```
- dnode_id: from output of "SHOW DNODES"
- config: the parameter to be changed, as below
- resetlog: close the old log file and create the new on