@@ -6,101 +6,100 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<divclassName="center-table">
<table>
<thead><tr>
<th>Device ID</th>
<th>Time Stamp</th>
<thcolSpan="3">Collected Metrics</th>
<thcolSpan="2">Tags</th>
<thead>
<tr>
<throwSpan="2">Device ID</th>
<throwSpan="2">Timestamp</th>
<thcolSpan="3">Collected Metrics</th>
<thcolSpan="2">Tags</th>
</tr>
<tr>
<th>Device ID</th>
<th>Time Stamp</th>
<th>current</th>
<th>voltage</th>
<th>phase</th>
<th>location</th>
<th>groupId</th>
</tr>
</thead>
<tbody>
<tr>
<td>d1001</td>
<td>1538548685000</td>
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548684000</td>
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1003</td>
<td>1538548686500</td>
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>California.LosAngeles</td>
<td>3</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548685500</td>
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548695000</td>
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548696600</td>
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548696650</td>
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548696800</td>
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
</tbody>
<tr>
<th>Current</th>
<th>Voltage</th>
<th>Phase</th>
<th>Location</th>
<th>Group ID</th>
</tr>
</thead>
<tbody>
<tr>
<td>d1001</td>
<td>1538548685000</td>
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548684000</td>
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1003</td>
<td>1538548686500</td>
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>California.LosAngeles</td>
<td>3</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548685500</td>
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548695000</td>
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548696600</td>
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548696650</td>
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548696800</td>
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
</tbody>
</table>
<ahref="#model_table1">Table 1: Smart meter example data</a>
</div>
Each row contains the device ID, time stamp, collected metrics (current, voltage, phase as above), and static tags (location and groupId in Table 1) associated with the devices. Each smart meter generates a row (measurement) in a pre-defined time interval or triggered by an external event. The device produces a sequence of measurements with associated time stamps.
Each row contains the device ID, timestamp, collected metrics (current, voltage, phase as above), and static tags (location and groupId in Table 1) associated with the devices. Each smart meter generates a row (measurement) in a pre-defined time interval or triggered by an external event. The device produces a sequence of measurements with associated timestamps.
## Metric
...
...
@@ -108,26 +107,26 @@ Metric refers to the physical quantity collected by sensors, equipment or other
## Label/Tag
Label/Tag refers to the static properties of sensors, equipment or other types of data collection devices, which do not change with time, such as device model, color, fixed location of the device, etc. The data type can be any type. Although static, TDengine allows users to add, delete or update tag values at any time. Unlike the collected metric data, the amount of tag data stored does not change over time. In the meters example, `location` and `groupid` are the tags.
Label/Tag refers to the static properties of sensors, equipment or other types of data collection devices, which do not change with time, such as device model, color, fixed location of the device, etc. The data type can be any type. Although static, TDengine allows users to add, delete or update tag values at any time. Unlike the collected metric data, the amount of tag data stored does not change over time. In the meters example, `Location` and `Group ID` are the tags.
## Data Collection Point
Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same timestamp. For some complex equipment, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there could be a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car. So in this example the car would have three data collection points. In the smart meters example, d1001, d1002, d1003, and d1004 are the data collection points.
Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same timestamp. For some complex equipment, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there could be a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car. So in this example the car would have three data collection points. In the smart meters example, d1001, d1002, d1003, and d1004 are the data collection points.
## Table
Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process them with a short learning curve. You need to create a database, create tables, then insert data points and execute queries to explore the data.
To make full use of time-series data characteristics, TDengine adopts a strategy of "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, it means 10 million tables should be created. For the table above, 4 tables should be created for devices D1001, D1002, D1003, and D1004 to store the data collected. This design has several benefits:
To make full use of time-series data characteristics, TDengine adopts a strategy of "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, it means 10 million tables should be created. For the table above, 4 tables should be created for devices d1001, d1002, d1003, and d1004 to store the data collected. This design has several benefits:
1. Since the metric data from different DCP are fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
2. For a DCP, the metric data generated by DCP is ordered by timestamp, so the write operation can be implemented by simple appending, which further greatly improves the data writing speed.
3. The metric data from a DCP is continuously stored, block by block. If you read data for a period of time, it can greatly reduce random read operations and improve read and query performance by orders of magnitude.
4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. Metrics generally don't vary as significantly between themselves over a time range as compared to other metrics, which allows for a higher compression rate.
If the metric data of multiple DCPs are traditionally written into a single table, due to uncontrollable network delays, the timing of the data from different DCPs arriving at the server cannot be guaranteed, write operations must be protected by locks, and metric data from one DCP cannot be guaranteed to be continuously stored together. **One table for one data collection point can ensure the best performance of insert and query of a single data collection point to the greatest possible extent.**
If the metric data of multiple DCPs are traditionally written into a single table, due to uncontrollable network delays, the timing of the data from different DCPs arriving at the server cannot be guaranteed, write operations must be protected by locks, and metric data from one DCP cannot be guaranteed to be continuously stored together. **One table for one data collection point can ensure the best performance of insert and query of a single data collection point to the greatest possible extent.**
TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index, and won’t build the index on any metrics stored. Column wise storage is used.
TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index, and won’t build the index on any metrics stored. Column wise storage is used.
Complex devices, such as connected cars, may have multiple DCPs. In this case, multiple tables are created for a single device, one table per DCP.
...
...
@@ -158,7 +157,14 @@ Queries can be executed on both a table (subtable) and a STable. For a query on
In TDengine, it is recommended to use a subtable instead of a regular table for a DCP. In the smart meters example, we can create subtables like d1001, d1002, d1003, and d1004 under super table meters.
To better understand the data model using metri, tags, super table and subtable, please refer to the diagram below which demonstrates the data model of the smart meters example. ![Meters Data Model Diagram](./supertable.webp)
To better understand the data model using metrics, tags, super table and subtable, please refer to the diagram below which demonstrates the data model of the smart meters example.
<figure>
![Meters Data Model Diagram](./supertable.webp)
<center><figcaption>Figure 1. Meters Data Model Diagram</figcaption></center>
</figure>
## Database
...
...
@@ -172,4 +178,4 @@ FQDN (Fully Qualified Domain Name) is the full domain name of a specific compute
Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Internet.
TDengine does not recommend using an IP address to access the cluster. FQDN is recommended for cluster management.
TDengine does not recommend using an IP address to access the cluster. FQDN is recommended for cluster management.
超级表是指某一特定类型的数据采集点的集合。同一类型的数据采集点,其表的结构是完全一样的,但每个表(数据采集点)的静态属性(标签)是不一样的。描述一个超级表(某一特定类型的数据采集点的集合),除需要定义采集量的表结构之外,还需要定义其标签的 schema,标签的数据类型可以是整数、浮点数、字符串,标签可以有多个,可以事后增加、删除或修改。如果整个系统有 N 个不同类型的数据采集点,就需要建立 N 个超级表。
超级表是指某一特定类型的数据采集点的集合。同一类型的数据采集点,其表的结构是完全一样的,但每个表(数据采集点)的静态属性(标签)是不一样的。描述一个超级表(某一特定类型的数据采集点的集合),除需要定义采集量的表结构之外,还需要定义其标签的 Schema,标签的数据类型可以是整数、浮点数、字符串、JSON,标签可以有多个,可以事后增加、删除或修改。如果整个系统有 N 个不同类型的数据采集点,就需要建立 N 个超级表。
FQDN (fully qualified domain name, 完全限定域名)是 Internet 上特定计算机或主机的完整域名。FQDN 由两部分组成:主机名和域名。例如,假设邮件服务器的 FQDN 可能是 mail.tdengine.com。主机名是 mail,主机位于域名 tdengine.com 中。DNS(Domain Name System),负责将 FQDN 翻译成 IP,是互联网应用的寻址方式。对于没有 DNS 的系统,可以通过配置 hosts 文件来解决。
FQDN(Fully Qualified Domain Name,完全限定域名)是 Internet 上特定计算机或主机的完整域名。FQDN 由两部分组成:主机名和域名。例如,假设邮件服务器的 FQDN 可能是 mail.tdengine.com。主机名是 mail,主机位于域名 tdengine.com 中。DNS(Domain Name System),负责将 FQDN 翻译成 IP,是互联网应用的寻址方式。对于没有 DNS 的系统,可以通过配置 hosts 文件来解决。
TDengine 集群的每个节点是由 End Point 来唯一标识的,End Point 是由 FQDN 外加 Port 组成,比如 h1.tdengine.com:6030。这样当 IP 发生变化的时候,我们依然可以使用 FQDN 来动态找到节点,不需要更改集群的任何配置。而且采用 FQDN,便于内网和外网对同一个集群的统一访问。
TDengine 集群的每个节点是由 Endpoint 来唯一标识的,Endpoint 是由 FQDN 外加 Port 组成,比如 h1.tdengine.com:6030。这样当 IP 发生变化的时候,我们依然可以使用 FQDN 来动态找到节点,不需要更改集群的任何配置。而且采用 FQDN,便于内网和外网对同一个集群的统一访问。
TDengine 不建议采用直接的 IP 地址访问集群,不利于管理。不了解 FQDN 概念,请看博文[《一篇文章说清楚 TDengine 的 FQDN》](https://www.taosdata.com/blog/2020/09/11/1824.html)。