The data model employed by TDengine is similar to that of a relational database. You have to create databases and tables. You must design the data model based on your own business and application requirements. You should design the [STable](/concept/#super-table-stable) (an abbreviation for super table) schema to fit your data. This chapter will explain the big picture without getting into syntactical details.
The data model employed by TDengine is similar to that of a relational database. You have to create databases and tables. You must design the data model based on your own business and application requirements. You should design the [STable](/concept/#super-table-stable) (an abbreviation for super table) schema to fit your data. This chapter will explain the big picture without getting into syntactical details.
Note: before you read this chapter, please make sure you have already read through [Key Concepts](/concept/), since TDengine introduces new concepts like "one table for one [data collection point](/concept/#data-collection-point)" and "[super table](/concept/#super-table-stable)".
## Create Database
The characteristics of time-series data from different data collection points may be different. Characteristics include collection frequency, retention policy and others which determine how you create and configure the database. For e.g. days to keep, number of replicas, data block size, whether data updates are allowed and other configurable parameters would be determined by the characteristics of your data and your business requirements. For TDengine to operate with the best performance, we strongly recommend that you create and configure different databases for data with different characteristics. This allows you, for example, to set up different storage and retention policies. When creating a database, there are a lot of parameters that can be configured such as, the days to keep data, the number of replicas, the size of the cache, time precision, the minimum and maximum number of rows in each data block, whether compression is enabled, the time range of the data in single data file and so on. An example is shown as follows:
- `measurement` will be used as the name of the STable Enter a comma (,) between `measurement` and `tag_set`.
- `tag_set` will be used as tags, with format like `<tag_key>=<tag_value>,<tag_key>=<tag_value>` Enter a space between `tag_set` and `field_set`.
- `field_set`will be used as data columns, with format like `<field_key>=<field_value>,<field_key>=<field_value>` Enter a space between `field_set` and `timestamp`.
- `timestamp` is the primary key timestamp corresponding to this row of data
- `timestamp` is the primary key timestamp corresponding to this row of data
- All the data in `tag_set` will be converted to nchar type automatically .
- All the data in `tag_set` will be converted to NCHAR type automatically .
- Each data in `field_set` must be self-descriptive for its data type. For example 1.2f32 means a value 1.2 of float type. Without the "f" type suffix, it will be treated as type double.
- Multiple kinds of precision can be used for the `timestamp` field. Time precision can be from nanosecond (ns) to hour (h).
- You can configure smlChildTableName in taos.cfg to specify table names, for example, `smlChildTableName=tname`. You can insert `st,tname=cpul,t1=4 c1=3 1626006833639000000` and the cpu1 table will be automatically created. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.
- It is assumed that the order of field_set in a supertable is consistent, meaning that the first record contains all fields and subsequent records store fields in the same order. If the order is not consistent, set smlDataFormat in taos.cfg to false. Otherwise, data will be written out of order and a database error will occur.(smlDataFormat in taos.cfg default to false after version of 3.0.1.3)
:::
:::
For more details please refer to [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/) and [TDengine Schemaless](/reference/schemaless/#Schemaless-Line-Protocol)
...
...
@@ -67,5 +67,9 @@ For more details please refer to [InfluxDB Line Protocol](https://docs.influxdat
</Tabs>
## Query Examples
If you want query the data of `location=California.LosAngeles,groupid=2`,here is the query sql:
select * from `meters.voltage` where location="California.LosAngeles" and groupid=2
If you want query the data of `location=California.LosAngeles,groupid=2`,here is the query SQL:
```sql
SELECT * FROM meters WHERE location = "California.LosAngeles" AND groupid = 2;
@@ -24,15 +24,16 @@ A single line of text is used in OpenTSDB line protocol to represent one row of
- `metric` will be used as the STable name.
- `timestamp` is the timestamp of current row of data. The time precision will be determined automatically based on the length of the timestamp. Second and millisecond time precision are supported.
- `value` is a metric which must be a numeric value, The corresponding column name is "value".
- The last part is the tag set separated by spaces, all tags will be converted to nchar type automatically.
- The last part is the tag set separated by spaces, all tags will be converted to NCHAR type automatically.
- The defult child table name is generated by rules.You can configure smlChildTableName in taos.cfg to specify chile table names, for example, `smlChildTableName=tname`. You can insert `meters.current 1648432611250 11.3 tname=cpu1 location=California.LosAngeles groupid=3` and the cpu1 table will be automatically created. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.
Please refer to [OpenTSDB Telnet API](http://opentsdb.net/docs/build/html/api_telnet/put.html) for more details.
- The defult child table name is generated by rules.You can configure smlChildTableName in taos.cfg to specify child table names, for example, `smlChildTableName=tname`. You can insert `meters.current 1648432611250 11.3 tname=cpu1 location=California.LosAngeles groupid=3` and the cpu1 table will be automatically created. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.
Please refer to [OpenTSDB Telnet API](http://opentsdb.net/docs/build/html/api_telnet/put.html) for more details.
## Examples
...
...
@@ -79,6 +80,11 @@ taos> select tbname, * from `meters.current`;
@@ -46,10 +46,10 @@ Please refer to [OpenTSDB HTTP API](http://opentsdb.net/docs/build/html/api_http
:::note
- In JSON protocol, strings will be converted to nchar type and numeric values will be converted to double type.
- In JSON protocol, strings will be converted to NCHAR type and numeric values will be converted to double type.
- Only data in array format is accepted and so an array must be used even if there is only one row.
- The defult child table name is generated by rules.You can configure smlChildTableName in taos.cfg to specify chile table names, for example, `smlChildTableName=tname`. You can insert `"tags": { "host": "web02","dc": "lga","tname":"cpu1"}` and the cpu1 table will be automatically created. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.
:::
- The defult child table name is generated by rules.You can configure smlChildTableName in taos.cfg to specify child table names, for example, `smlChildTableName=tname`. You can insert `"tags": { "host": "web02","dc": "lga","tname":"cpu1"}` and the cpu1 table will be automatically created. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.
:::
## Examples
...
...
@@ -94,6 +94,11 @@ taos> select * from `meters.current`;
@@ -16,16 +16,16 @@ To achieve high performance writing, there are a few aspects to consider. In the
From the perspective of application program, you need to consider:
1. The data size of each single write, also known as batch size. Generally speaking, higher batch size generates better writing performance. However, once the batch size is over a specific value, you will not get any additional benefit anymore. When using SQL to write into TDengine, it's better to put as much as possible data in single SQL. The maximum SQL length supported by TDengine is 1,048,576 bytes, i.e. 1 MB.
1. The data size of each single write, also known as batch size. Generally speaking, higher batch size generates better writing performance. However, once the batch size is over a specific value, you will not get any additional benefit anymore. When using SQL to write into TDengine, it's better to put as much as possible data in single SQL. The maximum SQL length supported by TDengine is 1,048,576 bytes, i.e. 1 MB.
2. The number of concurrent connections. Normally more connections can get better result. However, once the number of connections exceeds the processing ability of the server side, the performance may downgrade.
3. The distribution of data to be written across tables or sub-tables. Writing to single table in one batch is more efficient than writing to multiple tables in one batch.
4. Data Writing Protocol.
- Prameter binding mode is more efficient than SQL because it doesn't have the cost of parsing SQL.
- Writing to known existing tables is more efficient than wirting to uncertain tables in automatic creating mode because the later needs to check whether the table exists or not before actually writing data into it
- Writing in SQL is more efficient than writing in schemaless mode because schemaless writing creats table automatically and may alter table schema
- Parameter binding mode is more efficient than SQL because it doesn't have the cost of parsing SQL.
- Writing to known existing tables is more efficient than writing to uncertain tables in automatic creating mode because the later needs to check whether the table exists or not before actually writing data into it.
- Writing in SQL is more efficient than writing in schemaless mode because schemaless writing creates table automatically and may alter table schema.
Application programs need to take care of the above factors and try to take advantage of them. The application progam should write to single table in each write batch. The batch size needs to be tuned to a proper value on a specific system. The number of concurrent connections needs to be tuned to a proper value too to achieve the best writing throughput.
...
...
@@ -37,7 +37,7 @@ Application programs need to read data from data source then write into TDengine
2. The speed of data generation from single data source is much higher than the speed of single writing thread. The purpose of message queue in this case is to provide buffer so that data is not lost and multiple writing threads can get data from the buffer.
3. The data for single table are from multiple data source. In this case the purpose of message queues is to combine the data for single table together to improve the write efficiency.
If the data source is Kafka, then the appication program is a consumer of Kafka, you can benefit from some kafka features to achieve high performance writing:
If the data source is Kafka, then the application program is a consumer of Kafka, you can benefit from some kafka features to achieve high performance writing:
1. Put the data for a table in single partition of single topic so that it's easier to put the data for each table together and write in batch
2. Subscribe multiple topics to accumulate data together.
...
...
@@ -56,7 +56,7 @@ This section will introduce the sample programs to demonstrate how to write into
### Scenario
Below are the scenario for the sample programs of high performance wrting.
Below are the scenario for the sample programs of high performance writing.
- Application program reads data from data source, the sample program simulates a data source by generating data
- The speed of single writing thread is much slower than the speed of generating data, so the program starts multiple writing threads while each thread establish a connection to TDengine and each thread has a message queue of fixed size.
...
...
@@ -80,7 +80,7 @@ The sample programs assume the source data is for all the different sub tables i
@@ -95,16 +95,16 @@ The main Program is responsible for:
1. Create message queues
2. Start writing threads
3. Start reading threads
4. Otuput writing speed every 10 seconds
4. Output writing speed every 10 seconds
The main program provides 4 parameters for tuning:
1. The number of reading threads, default value is 1
2. The number of writing threads, default alue is 2
2. The number of writing threads, default value is 2
3. The total number of tables in the generated data, default value is 1000. These tables are distributed evenly across all writing threads. If the number of tables is very big, it will cost much time to firstly create these tables.
4. The batch size of single write, default value is 3,000
The capacity of message queue also impacts performance and can be tuned by modifying program. Normally it's always better to have a larger message queue. A larger message queue means lower possibility of being blocked when enqueueing and higher throughput. But a larger message queue consumes more memory space. The default value used in the sample programs is already big enoug.
The capacity of message queue also impacts performance and can be tuned by modifying program. Normally it's always better to have a larger message queue. A larger message queue means lower possibility of being blocked when enqueueing and higher throughput. But a larger message queue consumes more memory space. The default value used in the sample programs is already big enough.
@@ -356,7 +356,7 @@ Writing process tries to read as much as possible data from message queue and wr
<details>
SQLWriter class encapsulates the logic of composing SQL and writing data. Please be noted that the tables have not been created before writing, but are created automatically when catching the exception of table doesn't exist. For other exceptions caught, the SQL which caused the exception are logged for you to debug. This class also checks the SQL length, and passes the maximum SQL length by parameter maxSQLLength according to actual TDengine limit.
SQLWriter class encapsulates the logic of composing SQL and writing data. Please be noted that the tables have not been created before writing, but are created automatically when catching the exception of table doesn't exist. For other exceptions caught, the SQL which caused the exception are logged for you to debug. This class also checks the SQL length, and passes the maximum SQL length by parameter maxSQLLength according to actual TDengine limit.
<summary>SQLWriter</summary>
...
...
@@ -372,7 +372,7 @@ SQLWriter class encapsulates the logic of composing SQL and writing data. Please
<summary>Launch Sample Program in Python</summary>
1. Prerequisities
1. Prerequisites
- TDengine client driver has been installed
- Python3 has been installed, the the version >= 3.8
Before creating an application to process time-series data with TDengine, consider the following:
1. Choose the method to connect to TDengine. TDengine offers a REST API that can be used with any programming language. It also has connectors for a variety of languages.
2. Design the data model based on your own use cases. Consider the main [concepts](/concept/) of TDengine, including "one table per data collection point" and the supertable. Learn about static labels, collected metrics, and subtables. Depending on the characteristics of your data and your requirements, you decide to create one or more databases and design a supertable schema that fit your data.
3. Decide how you will insert data. TDengine supports writing using standard SQL, but also supports schemaless writing, so that data can be written directly without creating tables manually.