---
title: Concepts
---

To explain the basic concepts and provide sample code, the TDengine documentation uses smart meters as a typical time-series data scenario. Assume there are multiple smart meters, each of which collects three metrics (current, voltage, and phase) and has static attributes such as location and group ID. The collected data then looks similar to the following table:

<div className="center-table">
<table>
<thead>
<tr>
<th rowSpan="2">Device ID</th>
<th rowSpan="2">Time Stamp</th>
<th colSpan="3">Collected Metrics</th>
<th colSpan="2">Tags</th>
</tr>
<tr>
<th>current</th>
<th>voltage</th>
<th>phase</th>
<th>location</th>
<th>groupId</th>
</tr>
</thead>
<tbody>
<tr>
<td>d1001</td>
<td>1538548685000</td>
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548684000</td>
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>Beijing.Chaoyang</td>
<td>3</td>
</tr>
<tr>
<td>d1003</td>
<td>1538548686500</td>
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>Beijing.Haidian</td>
<td>3</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548685500</td>
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>Beijing.Haidian</td>
<td>2</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548695000</td>
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548696600</td>
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>Beijing.Haidian</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548696650</td>
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>Beijing.Chaoyang</td>
<td>3</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548696800</td>
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
</tbody>
</table>
<a href="#model_table1">Table 1: Smart meter example data</a>
</div>

Each row contains the device ID, the time stamp, the collected metrics (current, voltage, and phase), and the static tags (location and groupId in Table 1) associated with the device. Each smart meter generates a row (measurement) at a predefined time interval or when triggered by an external event. Over time, each device therefore produces a sequence of measurements with associated time stamps.
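
As a minimal sketch of this structure, the rows of Table 1 can be grouped by device so that each device yields its own time-ordered sequence of measurements. The `Measurement` class, its field names, and the reading of the time stamps as epoch milliseconds are illustrative assumptions, not part of any TDengine API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One row of Table 1 (hypothetical in-memory representation)."""
    device_id: str
    ts_ms: int        # time stamp, assumed here to be epoch milliseconds
    current: float    # collected metric
    voltage: int      # collected metric
    phase: float      # collected metric
    location: str     # static tag
    group_id: int     # static tag


# A few rows copied from Table 1, in arrival order.
rows = [
    Measurement("d1001", 1538548685000, 10.3, 219, 0.31, "Beijing.Chaoyang", 2),
    Measurement("d1002", 1538548684000, 10.2, 220, 0.23, "Beijing.Chaoyang", 3),
    Measurement("d1001", 1538548695000, 12.6, 218, 0.33, "Beijing.Chaoyang", 2),
    Measurement("d1001", 1538548696800, 12.3, 221, 0.31, "Beijing.Chaoyang", 2),
]


def series_by_device(measurements):
    """Group rows by device and order each group by time stamp."""
    grouped = defaultdict(list)
    for m in measurements:
        grouped[m.device_id].append(m)
    for device_series in grouped.values():
        device_series.sort(key=lambda m: m.ts_ms)
    return dict(grouped)


series = series_by_device(rows)
```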

## Metric

A metric is a physical quantity collected by sensors, equipment, or other types of data collection devices, such as current, voltage, temperature, pressure, or GPS position. Metrics change over time, and their data type can be integer, float, Boolean, or string. As time goes by, the amount of stored metric data increases.

## Label/Tag

A label, or tag, is a static property of a sensor, piece of equipment, or other data collection device that does not change with time, such as the device model, color, or fixed location. A tag can be of any data type. Although tags are static, TDengine allows users to add, delete, or update tag values at any time. Unlike the collected metric data, the amount of stored tag data does not grow over time.

## Data Collection Point

A Data Collection Point (DCP) is hardware or software that collects metrics at preset time intervals or when triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and share the same time stamp. Complex equipment often has multiple data collection points, and the sampling rate of each one may be different and fully independent of the others. For example, a car could have one data collection point for GPS position metrics, one for engine status metrics, and one for environment metrics inside the car. In this example, the car has three data collection points.
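
The car example can be sketched roughly as follows. The `DataCollectionPoint` class, the metric names, and the sampling periods are hypothetical and chosen only to show that each DCP samples on its own schedule, with one time stamp shared by all metrics of a sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCollectionPoint:
    """Hypothetical model of a DCP that samples on a fixed period."""
    name: str
    metrics: tuple    # metric names collected together, sharing one time stamp
    period_ms: int    # sampling period, independent for each DCP

    def sample_times(self, start_ms, end_ms):
        """Time stamps at which this DCP samples within [start_ms, end_ms)."""
        return list(range(start_ms, end_ms, self.period_ms))


# One car, three data collection points with different sampling rates.
car = [
    DataCollectionPoint("gps", ("latitude", "longitude"), 1000),
    DataCollectionPoint("engine", ("rpm", "coolant_temp"), 200),
    DataCollectionPoint("cabin", ("temperature", "humidity"), 5000),
]

# Number of samples each DCP produces in a 10-second window.
counts = {dcp.name: len(dcp.sample_times(0, 10_000)) for dcp in car}
```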

## Table

Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process it, which keeps the learning curve short. You create a database, create tables, then insert data points and execute queries to explore the data.

To make full use of the characteristics of time-series data, TDengine adopts a strategy of "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store its collected time-series data. For example, if there are 10 million smart meters, 10 million tables should be created. For the table above, 4 tables should be created, one each for devices d1001, d1002, d1003, and d1004. This design has several benefits:

1. Since the metric data from different DCPs are fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
2. The metric data generated by a DCP is ordered by timestamp, so a write operation can be implemented as a simple append, which further improves the data writing speed.
3. The metric data from a DCP is stored contiguously, block by block. Reading data for a period of time therefore requires far fewer random read operations, which can improve read and query performance by orders of magnitude.
4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. The values of a single metric generally vary less over a time range than values taken across different metrics, which allows for a higher compression ratio.
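
The first three benefits can be illustrated with a toy model. This is a sketch under simplifying assumptions, not TDengine's storage engine: the `DcpTable` class is hypothetical, it keeps data in memory, and it simply rejects out-of-order time stamps.

```python
from bisect import bisect_left, bisect_right


class DcpTable:
    """Toy append-only table for a single DCP (illustrative only)."""

    def __init__(self):
        self.ts = []        # time stamp column, kept in ascending order
        self.current = []   # one list per metric, i.e. a columnar layout

    def append(self, ts_ms, current):
        # One writer producing ordered time stamps: a write is a plain append.
        if self.ts and ts_ms <= self.ts[-1]:
            raise ValueError("this sketch only accepts increasing time stamps")
        self.ts.append(ts_ms)
        self.current.append(current)

    def read_range(self, start_ms, end_ms):
        """Return (ts, current) pairs with start_ms <= ts <= end_ms."""
        lo = bisect_left(self.ts, start_ms)
        hi = bisect_right(self.ts, end_ms)
        # The result is one contiguous slice of each column.
        return list(zip(self.ts[lo:hi], self.current[lo:hi]))


# Route each incoming point to the table of its own DCP.
incoming = [
    ("d1001", 1538548685000, 10.3),
    ("d1004", 1538548685500, 13.4),
    ("d1001", 1538548695000, 12.6),
    ("d1001", 1538548696800, 12.3),
]
tables = {}
for device_id, ts_ms, current in incoming:
    tables.setdefault(device_id, DcpTable()).append(ts_ms, current)

recent = tables["d1001"].read_range(1538548690000, 1538548700000)
```

Because every table holds the data of exactly one device, a time-range read touches a single contiguous slice instead of filtering rows that belong to other devices.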

If the metric data of multiple DCPs is written into a single table, as is done traditionally, several problems arise. Because of uncontrollable network delays, the order in which data from different DCPs arrives at the server cannot be guaranteed, so write operations must be protected by locks. In addition, the metric data from one DCP cannot be guaranteed to be stored contiguously. **One table for one data collection point ensures the best possible insert and query performance for a single data collection point.**

TDengine suggests using the DCP ID as the table name (like d1001 in the table above). Each DCP may collect one or more metrics (like current, voltage, and phase above), and each metric has a corresponding column in the table. The data type of a column can be int, float, string, or others. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index and does not build an index on any stored metric. Columnar storage is used.
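
As a rough illustration of these rules, the helper below assembles a table definition for one smart meter and enforces that the first column is a timestamp. The `create_table_sql` function is hypothetical, and the generated statement is an assumption about the general shape of the SQL. Check the exact syntax and type names against the TDengine SQL reference for your version.

```python
def create_table_sql(table_name, columns):
    """Build a CREATE TABLE statement from (name, type) pairs.

    Hypothetical helper: it only enforces the rule that the first
    column must be a timestamp.
    """
    if not columns or columns[0][1].upper() != "TIMESTAMP":
        raise ValueError("the first column must be a timestamp")
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table_name} ({column_list})"


# Table named after the DCP ID, with one column per collected metric.
sql = create_table_sql(
    "d1001",
    [("ts", "TIMESTAMP"), ("current", "FLOAT"), ("voltage", "INT"), ("phase", "FLOAT")],
)
```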

## Super Table (STable)

The design of one table for one data collection point requires a huge number of tables, which are difficult to manage. Furthermore, applications often need to aggregate data across DCPs, and with so many tables these aggregations become complicated. To support aggregation over multiple tables efficiently, TDengine introduces the concept of the STable (Super Table).

A STable is a template for a type of data collection point. A STable contains a set of data collection points (tables) that have the same schema or data structure but different static attributes (tags). To describe a STable, it is necessary to define the schema of its tags in addition to the table structure of the metrics. The data type of a tag can be int, float, or string. There can be multiple tags, and tags can be added, deleted, or modified afterward. If the whole system has N different types of data collection points, N STables need to be created.

In the design of TDengine, **a table is used to represent a specific data collection point, and STable is used to represent a set of data collection points of the same type**. 

## Subtable

When creating a table for a specific data collection point, the user can use a STable as a template and specify the tag values of this specific DCP. **A table created by using a STable as the template is called a subtable** in TDengine. The differences between a regular table and a subtable are:
1. A subtable is a table, so all SQL commands that can be applied to a regular table can also be applied to a subtable.
2. A subtable is a table with extensions: it has static tags (labels), and these tags can be added, deleted, and updated after it is created. A regular table does not have tags.
3. A subtable belongs to only one STable, but a STable may have many subtables. Regular tables do not belong to a STable.
4. A regular table cannot be converted into a subtable, and vice versa.

The relationship between a STable and the subtables created based on this STable is as follows:

1. A STable contains multiple subtables with the same metric schema but with different tag values.
2. The schema of metrics or labels cannot be adjusted through subtables; it can only be changed via the STable. Changes to the schema of a STable take effect immediately for all associated subtables.
3. A STable defines only a template and does not store any data or label information by itself. Therefore, data cannot be written to a STable, only to subtables.
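
The relationship described above can be modeled with a small sketch. The `STable` and `Subtable` classes, the STable name `meters`, and the added `power` column are hypothetical and serve only to illustrate the template relationship. They are not how TDengine is implemented.

```python
class STable:
    """Template only: holds the metric and tag schemas, never any rows."""

    def __init__(self, name, columns, tag_names):
        self.name = name
        self.columns = list(columns)
        self.tag_names = list(tag_names)
        self.subtables = {}

    def create_subtable(self, table_name, **tag_values):
        if set(tag_values) != set(self.tag_names):
            raise ValueError("tag values must match the STable's tag schema")
        subtable = Subtable(table_name, self, tag_values)
        self.subtables[table_name] = subtable
        return subtable

    def add_column(self, column):
        # Subtables read the schema from their template, so a change here
        # is visible to every subtable immediately.
        self.columns.append(column)


class Subtable:
    """A table created from a STable, carrying its own tag values and rows."""

    def __init__(self, name, stable, tags):
        self.name = name
        self.stable = stable    # a subtable belongs to exactly one STable
        self.tags = dict(tags)
        self.rows = []

    @property
    def columns(self):
        return self.stable.columns

    def insert(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("row does not match the schema")
        self.rows.append(values)


meters = STable("meters", ["ts", "current", "voltage", "phase"], ["location", "groupId"])
d1001 = meters.create_subtable("d1001", location="Beijing.Chaoyang", groupId=2)
d1003 = meters.create_subtable("d1003", location="Beijing.Haidian", groupId=3)
d1001.insert(1538548685000, 10.3, 219, 0.31)

# Schema changes go through the STable and reach all subtables.
meters.add_column("power")
```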

Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine treats the data in all its subtables as a single data set. TDengine first finds the subtables that meet the tag filter conditions, then scans the time-series data of only these subtables to perform the aggregation. This greatly reduces the amount of data to be scanned and thus greatly improves the performance of data aggregation across multiple DCPs.
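
The two-step evaluation can be sketched as follows, using the voltage values and tags from Table 1. The dictionary layout and the `avg_voltage` function are illustrative assumptions, not TDengine's query engine.

```python
# Subtables of one STable: tag values plus a voltage column (from Table 1).
subtables = {
    "d1001": {"tags": {"location": "Beijing.Chaoyang", "groupId": 2}, "voltage": [219, 218, 221]},
    "d1002": {"tags": {"location": "Beijing.Chaoyang", "groupId": 3}, "voltage": [220, 218]},
    "d1003": {"tags": {"location": "Beijing.Haidian", "groupId": 3}, "voltage": [221]},
    "d1004": {"tags": {"location": "Beijing.Haidian", "groupId": 2}, "voltage": [223, 221]},
}


def avg_voltage(tables, **tag_filter):
    """Average voltage over all subtables whose tags match tag_filter."""
    # Step 1: select subtables by looking at tags only.
    matched = [
        name
        for name, table in tables.items()
        if all(table["tags"].get(key) == value for key, value in tag_filter.items())
    ]
    # Step 2: scan the time-series data of the matched subtables only.
    values = [v for name in matched for v in tables[name]["voltage"]]
    average = sum(values) / len(values) if values else None
    return matched, average


matched, average = avg_voltage(subtables, location="Beijing.Haidian")
```

In this example only d1003 and d1004 are scanned. The data of the other two meters is never read.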

In TDengine, it is recommended to use a subtable instead of a regular table for a DCP. 

## Database

A database is a collection of tables. TDengine allows a running instance to have multiple databases, and each database can be configured with different storage policies. Different types of DCPs often have different data characteristics, including the frequency of data collection, the data retention time, the number of replicas, the size of data blocks, and whether data is allowed to be updated. For TDengine to work with maximum efficiency in various scenarios, TDengine recommends that STables with different data characteristics be created in different databases.

In a database, there can be one or more STables, but a STable belongs to only one database. All tables owned by a STable are stored in only one database.

## FQDN & End Point

An FQDN (Fully Qualified Domain Name) is the full domain name of a specific computer or host on the Internet. An FQDN consists of two parts: the hostname and the domain name. For example, the FQDN of a mail server might be mail.tdengine.com, where the hostname is mail and the host is located in the domain tdengine.com. DNS (Domain Name System) is responsible for translating an FQDN into an IP address. On systems without DNS, name resolution can be handled by configuring the hosts file.

Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a port, such as h1.tdengine.com:6030. In this way, when the IP address of a node changes, the FQDN can still be used to find the node dynamically without changing any configuration of the cluster. In addition, the FQDN facilitates unified access to the same cluster from both the Intranet and the Internet.
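
A minimal sketch of how an End Point string breaks down into its parts is shown below. The `parse_end_point` helper is hypothetical and only performs string handling. It does not resolve the name or contact a cluster.

```python
def parse_end_point(end_point):
    """Split an End Point such as 'h1.tdengine.com:6030' into (fqdn, port)."""
    fqdn, separator, port = end_point.rpartition(":")
    if not separator or not fqdn or not port.isdigit():
        raise ValueError(f"not a valid End Point: {end_point!r}")
    return fqdn, int(port)


fqdn, port = parse_end_point("h1.tdengine.com:6030")

# The FQDN itself consists of a hostname followed by a domain name.
hostname, _, domain = fqdn.partition(".")
```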

TDengine does not recommend using an IP address to access the cluster. Using the FQDN is recommended for cluster management.