diff --git a/docs-en/02-concept/02-concept.md b/docs-en/02-concept/02-concept.md index 22784fb6463fd820c433a9fd7fde326762b6f55b..a270cfcdb6a08a0f98d0ab9f468572196aaff1ff 100644 --- a/docs-en/02-concept/02-concept.md +++ b/docs-en/02-concept/02-concept.md @@ -4,102 +4,103 @@ title: Concepts ## A Typical Time-Series Data Scenario -In typical IoT, Connected Vehicles and IT Monitoring scenarios, there are often one or multiple types of data collection points that collect one or multiple metrics. However, for a data collection point type, there are often a number of specific points distributed in places. All the collected metric data are always time-stamped, and the volume of metric data grows with time, but each data collection point has its static attributes. For a speicific type of data collection point, its collected data and static attributes are always structured. Taking power smart meter as an example, each smater meter collects three metrics: current, voltage and phase. The collected data points are similiar to the following table: +In typical IoT, Connected Vehicles and IT Monitoring scenarios, there are often one or multiple types of data collection points that collect one or multiple metrics. However, for a data collection point type, there are often a number of specific points distributed in places. All the collected metric data are always time-stamped, and the volume of metric data grows with time, but each data collection point has its static attributes. For a specific type of data collection point, its collected data and static attributes are always structured. Taking power smart meter as an example, each smarter meter collects three metrics: current, voltage and phase. The collected data points are similar to the following table: -
+
+
- - - - + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + - - - - - - - + + + + + + + -
Device IDTime StampCollected MetricsTagsDevice IDTime StampCollected MetricsTags
Device IDTime StampcurrentvoltagephaselocationgroupIdDevice IDTime StampcurrentvoltagephaselocationgroupId
d1001153854868500010.32190.31Beijing.Chaoyang2d1001153854868500010.32190.31Beijing.Chaoyang2
d1002153854868400010.22200.23Beijing.Chaoyang3d1002153854868400010.22200.23Beijing.Chaoyang3
d1003153854868650011.52210.35Beijing.Haidian3d1003153854868650011.52210.35Beijing.Haidian3
d1004153854868550013.42230.29Beijing.Haidian2d1004153854868550013.42230.29Beijing.Haidian2
d1001153854869500012.62180.33Beijing.Chaoyang2d1001153854869500012.62180.33Beijing.Chaoyang2
d1004153854869660011.82210.28Beijing.Haidian2d1004153854869660011.82210.28Beijing.Haidian2
d1002153854869665010.32180.25Beijing.Chaoyang3d1002153854869665010.32180.25Beijing.Chaoyang3
d1001153854869680012.32210.31Beijing.Chaoyang2d1001153854869680012.32210.31Beijing.Chaoyang2
- -
Table 1: Smart meter example data
+ +Table 1: Smart meter example data + Each row contains the device ID, timestamp, collected metrics (current, voltage, phase as above), and static tags (Location and groupId in Table 1) associated with the devices. Each device generates a row (data point) in a pre-defined timer or triggered by an external event. It is a sequence of data points like a stream. @@ -127,9 +128,9 @@ Metric refers to the physical quantity collected by sensors, equipment or other ## Label/Tag -Label/Tage refers to the static properties of sensors, devices or other types of data collection devices, which do not change with time, such as device model, color, fixed location of the device, etc. The data type can be any type. Although static, TDengine allows users to add, delete or update tag values. Unlike the collected metric data, the amount of tag data stored does not change over time. +Label/Tag refers to the static properties of sensors, devices or other types of data collection devices, which do not change with time, such as device model, color, fixed location of the device, etc. The data type can be any type. Although static, TDengine allows users to add, delete or update tag values. Unlike the collected metric data, the amount of tag data stored does not change over time. -## Data Colletion Point +## Data Collection Point Data Collection Point(DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same time stamp. For some complex equipments, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there is a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car, so a car has three data collection points. @@ -137,7 +138,7 @@ Data Collection Point(DCP) refers to hardware or software that collects metrics Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process them with a short learning curve. You need to create a database, create tables with schema definitions, then insert data points and execute queries to explore the data. Structured storage is used, instead of NoSQL’s key-value storage. -Compared with general database, TDegine adopts "one table for data collection point" strategy to enhance the data ingestion rate and query speed significantly. At the mean time, it introduces "Super Table" to allow each table have a set of labels to make aggregation across tables efficiently. +Compared with general database, TDengine adopts "one table for data collection point" strategy to enhance the data ingestion rate and query speed significantly. At the mean time, it introduces "Super Table" to allow each table have a set of labels to make aggregation across tables efficiently. ## One Table for One Data Collection Point @@ -150,7 +151,7 @@ To utilize this time-series and other data features, TDengine requires the user If the metric data of multiple DPCs are traditionally written into a single table, due to the uncontrollable network delay, the timing of the data from different DCPs arriving at the server cannot be guaranteed, the writing operation must be protected by locks, and the metric data from one DCP cannot be guaranteed to be continuously stored together. **One table for one data collection point can ensure the best performance of insert and query of a single data collection point to the greatest extent.** -TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the time stamp as the index, and won’t build the index on any metrics stored. Column wise storage is used. +TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the time stamp as the index, and won’t build the index on any metrics stored. Column wise storage is used. ## STable: A Collection of Data Collection Points in the Same Type @@ -170,6 +171,6 @@ Query can be executed on both table and STable. For a query on a STable, TDengin FQDN (Fully Qualified Domain Name) is the full domain name of a specific computer or host on the Internet. FQDN consists of two parts: hostname and domain name. For example, the FQDN of a mail server might be mail.tdengine.com. The hostname is mail, and the host is located in the domain name tdengine.com. DNS (Domain Name System) is responsible for translating FQDN into IP. For systems without DNS, it can be solved by configuring the hosts file. -Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Inetnet. +Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Internet. -TDengine does not recommend using IP address to access the cluster, which is not good for cluster management. +TDengine does not recommend using IP address to access the cluster, which is not good for cluster management.