未验证 提交 ae552ca8 编写于 作者: 陶建辉(Jeff)'s avatar 陶建辉(Jeff) 提交者: GitHub

Update 02-concept.md

上级 1d0ac8ba
......@@ -2,9 +2,7 @@
title: Concepts
---
## A Typical Time-Series Data Scenario
In typical IoT, Connected Vehicles and IT Monitoring scenarios, there are often one or multiple types of data collection points that collect one or multiple metrics. However, for a data collection point type, there are often a number of specific points distributed in places. All the collected metric data are always time-stamped, and the volume of metric data grows with time, but each data collection point has its static attributes. For a specific type of data collection point, its collected data and static attributes are always structured. Taking power smart meter as an example, each smarter meter collects three metrics: current, voltage and phase. The collected data points are similar to the following table:
In order to explain the basic concepts and show sample code easily, the entire TDengine document takes smart meters as a typical time series data scenario. Assuming that each smart meter collects three metrics of current, voltage, and phase, there are multiple smart meters, and each meter has static attributes like location and group ID, the collected data shall be similar to the following table:
<div className="center-table">
<table>
......@@ -102,25 +100,7 @@ In typical IoT, Connected Vehicles and IT Monitoring scenarios, there are often
<a href="#model_table1">Table 1: Smart meter example data</a>
</div>
Each row contains the device ID, timestamp, collected metrics (current, voltage, phase as above), and static tags (Location and groupId in Table 1) associated with the devices. Each device generates a row (data point) in a pre-defined timer or triggered by an external event. It is a sequence of data points like a stream.
## Data Characteristics
The data points generated by IoT, Connected Vehicles, and IT Monitoring have some strong common characteristics:
1. Data points are always time stamped;
2. Metrics are always structured data;
3. There are rarely delete/update operations on collected data;
4. Unlike traditional databases, transaction processing is not required;
5. The ratio of reading over writing is much lower than typical Internet applications;
6. Data volume is stable and can be predicted according to the number of devices and sampling rate;
7. The user pays attention to the trend of data, not a specific data point at a specific time;
8. Data points are removed once they reach their life time (retention policy);
9. The query is always executed in a given time range and space;
10. Real-time computing or query is desired;
11. Data volume is huge, a system may generate over 10 billion data points in a day.
By utilizing the above characteristics, TDengine designs the storage and computing engine in a special and optimized way for time-series data, resulting in massive improvements in system efficiency.
Each row contains the device ID, timestamp, collected metrics (current, voltage, phase as above), and static tags (Location and groupId in Table 1) associated with the devices. Each smart meter generates a row (data point) in a pre-defined timer or triggered by an external event. It is a sequence of data points like a stream.
## Metric
......@@ -134,15 +114,11 @@ Label/Tag refers to the static properties of sensors, devices or other types of
Data Collection Point(DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same time stamp. For some complex equipments, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there is a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car, so a car has three data collection points.
## Relational Database Model
## Table
Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process them with a short learning curve. You need to create a database, create tables with schema definitions, then insert data points and execute queries to explore the data. Structured storage is used, instead of NoSQL’s key-value storage.
Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process them with a short learning curve. You need to create a database, create tables, then insert data points and execute queries to explore the data.
Compared with general database, TDengine adopts "one table for data collection point" strategy to enhance the data ingestion rate and query speed significantly. At the mean time, it introduces "Super Table" to allow each table have a set of labels to make aggregation across tables efficiently.
## One Table for One Data Collection Point
To utilize this time-series and other data features, TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, it means 10 million tables shall be created. For the table above, 4 tables shall be created for devices D1001, D1002, D1003, and D1004 to store the data collected. This design has several benefits:
To make full use of time-series data characteristics, TDengine adopts a strategy "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, it means 10 million tables shall be created. For the table above, 4 tables shall be created for devices D1001, D1002, D1003, and D1004 to store the data collected. This design has several benefits:
1. Since the metric data from different DCP is fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
2. For a DCP, the metric data generated by DCP is ordered by timestamp, so the write operation can be implemented by simple appending, which further greatly improves the data writing speed.
......@@ -153,7 +129,7 @@ If the metric data of multiple DPCs are traditionally written into a single tabl
TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the time stamp as the index, and won’t build the index on any metrics stored. Column wise storage is used.
## STable: A Collection of Data Collection Points in the Same Type
## Super Table (STable)
The design of one table for one data collection point will require a huge number of tables, which is difficult to manage. Furthermore, applications often need to take aggregation operations among DCPs, thus aggregation operations will become complicated. To support aggregation over multiple tables efficiently, the STable(Super Table) concept is introduced by TDengine.
......@@ -167,6 +143,12 @@ In the design of TDengine, **a table is used to represent a specific data collec
Query can be executed on both table and STable. For a query on a STable, TDengine will treat the data in all belonged tables as a whole data set for processing. TDengine will first find out the tables that meet the tag filter conditions, then scan the time-series data of these tables to perform aggregation operation, which can greatly reduce the data sets to be scanned, thus greatly improving the performance of data aggregation across multiple DCPs.
## Database
A database is a collection of tables. TDengine allows a running instance to have multiple databases, and each database can be configured with different storage policies. Different types of DCPs often have different data characteristics, including the frequency of data collection, data retention time, the number of replications, the size of data blocks, whether data is allowed to be updated, and so on. In order for TDengine to work with maximum efficiency in various scenarios, TDengine recommends that STables with different data characteristics be created in different databases.
In a database, there can be one or more STables, but a STable belongs to only one database. All tables owned by a STable are stored in only one database.
## FQDN & End Point
FQDN (Fully Qualified Domain Name) is the full domain name of a specific computer or host on the Internet. FQDN consists of two parts: hostname and domain name. For example, the FQDN of a mail server might be mail.tdengine.com. The hostname is mail, and the host is located in the domain name tdengine.com. DNS (Domain Name System) is responsible for translating FQDN into IP. For systems without DNS, it can be solved by configuring the hosts file.
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册