Unverified commit 45d7d109, authored by wade zhang, committed by GitHub

Merge pull request #12952 from taosdata/docs/cdiwadkar16-patch-4-2

docs: cdiwadkar16-patch-4 - Grammar + city names changed to San Jose …
@@ -2,7 +2,7 @@
title: Concepts
---
In order to explain the basic concepts and provide some sample code, the TDengine documentation uses smart meters as a typical time series use case. We assume the following:

1. Each smart meter collects three metrics: current, voltage, and phase.
2. There are multiple smart meters.
3. Each meter has static attributes like location and group ID.

Based on this, the collected data will look similar to the following table:
<div className="center-table">
<table>
@@ -29,7 +29,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>San Jose</td>
<td>2</td>
</tr>
<tr>
@@ -38,7 +38,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>San Jose</td>
<td>3</td>
</tr>
<tr>
@@ -47,7 +47,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>Mountain View</td>
<td>3</td>
</tr>
<tr>
@@ -56,7 +56,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>Mountain View</td>
<td>2</td>
</tr>
<tr>
@@ -65,7 +65,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>San Jose</td>
<td>2</td>
</tr>
<tr>
@@ -74,7 +74,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>Mountain View</td>
<td>2</td>
</tr>
<tr>
@@ -83,7 +83,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>San Jose</td>
<td>3</td>
</tr>
<tr>
@@ -92,7 +92,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>San Jose</td>
<td>2</td>
</tr>
</tbody>
@@ -112,7 +112,7 @@ Label/Tag refers to the static properties of sensors, equipment or other types o
## Data Collection Point
Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same timestamp. Complex equipment often has multiple data collection points, and the sampling rate of each collection point may be different and fully independent. For example, a car could have one data collection point for GPS position metrics, one for engine status metrics, and one for the environment metrics inside the car. In this example, the car would have three data collection points.
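As a rough illustration, the following Python sketch models a data collection point. It assumes a hypothetical `DataCollectionPoint` class, which is not part of any TDengine API. Every sample from a DCP carries one timestamp shared by all of its metrics. The car from the example above is modeled as three independent DCPs, each with its own sampling interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCollectionPoint:
    """Hypothetical model of a DCP: a fixed set of metrics sampled together."""

    dcp_id: str
    metrics: tuple[str, ...]
    interval_ms: int  # sampling period; each DCP may use a different one

    def sample(self, ts_ms: int, values: dict[str, float]) -> dict:
        """Build one row in which every metric shares the same timestamp."""
        missing = set(self.metrics) - set(values)
        if missing:
            raise ValueError(f"{self.dcp_id}: missing metrics {sorted(missing)}")
        return {"ts": ts_ms, **{m: values[m] for m in self.metrics}}


# The car example: three independent DCPs, each with its own sampling rate.
# Names, metrics, and intervals are illustrative assumptions.
car = [
    DataCollectionPoint("gps", ("latitude", "longitude"), interval_ms=1000),
    DataCollectionPoint("engine", ("rpm", "oil_temp"), interval_ms=100),
    DataCollectionPoint("cabin", ("temperature", "humidity"), interval_ms=5000),
]

engine_row = car[1].sample(1_000, {"rpm": 2100.0, "oil_temp": 92.5})
```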
## Table
@@ -122,10 +122,10 @@ To make full use of time-series data characteristics, TDengine adopts a strategy
1. Since the metric data from different DCPs are fully independent, the data source of each DCP is unique, and a table has only one writer. Data points can therefore be written in a lock-free manner, which greatly improves write speed.
2. The metric data generated by a DCP is ordered by timestamp, so a write can be implemented as a simple append, which further improves write speed.
3. The metric data from a DCP is stored contiguously, block by block. Reading data for a period of time therefore requires far fewer random read operations, which can improve read and query performance by orders of magnitude.
4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. The values of a single metric generally vary less over a time range than values across different metrics do, which allows for a higher compression rate.
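Points 3 and 4 can be illustrated with a small Python sketch. Rows from one DCP are pivoted into a column-oriented block, and the integer columns are delta-encoded. Delta encoding is only a simple stand-in to show why ordered, slowly varying columns compress well; it is not necessarily the algorithm TDengine uses. `to_columnar_block`, `delta_encode`, and `delta_decode` are hypothetical helper names, and the sample values are made up.

```python
def to_columnar_block(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented samples from one DCP into one list per column."""
    if not rows:
        return {}
    columns: dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name, values in columns.items():
            values.append(row[name])
    return columns


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store only the difference to the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Invert delta_encode with a running sum."""
    out: list[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Made-up samples from one meter, one second apart (timestamps in ms).
rows = [
    {"ts": 1_000_000, "current": 10.3, "voltage": 219, "phase": 0.31},
    {"ts": 1_001_000, "current": 10.4, "voltage": 220, "phase": 0.31},
    {"ts": 1_002_000, "current": 10.4, "voltage": 221, "phase": 0.30},
    {"ts": 1_003_000, "current": 10.5, "voltage": 220, "phase": 0.30},
]

block = to_columnar_block(rows)
ts_deltas = delta_encode(block["ts"])            # [1000000, 1000, 1000, 1000]
voltage_deltas = delta_encode(block["voltage"])  # [219, 1, 1, -1]
```

The repeated small deltas are easy for a general-purpose compressor to shrink. Floating-point columns such as `current` would need a different scheme, in line with the point that different data types use different compression algorithms.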
If the metric data of multiple DCPs were written into a single table, as is traditionally done, uncontrollable network delays would make the order in which data from different DCPs arrives at the server impossible to guarantee. Write operations would have to be protected by locks, and the metric data from one DCP could not be guaranteed to be stored contiguously. **One table for one data collection point ensures, to the greatest possible extent, the best insert and query performance for a single data collection point.**
TDengine suggests using the DCP ID as the table name (like D1001 in the table above). Each DCP may collect one or multiple metrics (like current, voltage, and phase above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string, and others. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index and does not build an index on any of the stored metrics. Data is stored column-wise.
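The one-table-per-DCP layout can be sketched in Python as follows. `DcpTable` and `route` are hypothetical helpers, and this is not TDengine's storage engine. Each table is named after its DCP ID, and its first column must be the timestamp. Only one DCP writes to each table, so a write is a plain append, and nothing is shared with other DCPs' writers. For simplicity, this sketch rejects out-of-order rows.

```python
from dataclasses import dataclass, field


@dataclass
class DcpTable:
    """Hypothetical per-DCP table with a single writer and append-only rows."""

    name: str                 # the DCP ID, e.g. "d1001"
    columns: tuple[str, ...]  # the first column must be the timestamp
    rows: list[tuple] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.columns or self.columns[0] != "ts":
            raise ValueError("the first column must be the timestamp column 'ts'")

    def append(self, row: tuple) -> None:
        if len(row) != len(self.columns):
            raise ValueError("row does not match the table schema")
        # Rows from a single DCP are expected in timestamp order, so a write
        # is a plain append. This sketch simply rejects out-of-order rows.
        if self.rows and row[0] <= self.rows[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        self.rows.append(row)


def route(tables: dict[str, DcpTable], dcp_id: str, row: tuple) -> None:
    """Send a row to the table named after the DCP that produced it."""
    tables[dcp_id].append(row)


schema = ("ts", "current", "voltage", "phase")
tables = {dcp_id: DcpTable(dcp_id, schema) for dcp_id in ("d1001", "d1002")}

# Made-up readings: each meter writes only to its own table.
route(tables, "d1001", (1000, 10.3, 219, 0.31))
route(tables, "d1002", (1000, 10.2, 220, 0.23))
route(tables, "d1001", (2000, 12.6, 218, 0.33))
```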
@@ -139,7 +139,7 @@ In the design of TDengine, **a table is used to represent a specific data collec
## Subtable
When creating a table for a specific data collection point, the user can use a STable as a template and specify the tag values of this specific DCP. **A table created by using a STable as the template is called a subtable** in TDengine. The differences between a regular table and a subtable are:
1. A subtable is a table, so all SQL commands that can be applied to a regular table can also be applied to a subtable.
2. A subtable is a table with extensions: it has static tags (labels), and these tags can be added, deleted, and updated after it is created. A regular table does not have tags.
3. A subtable belongs to only one STable, but a STable may have many subtables. Regular tables do not belong to a STable.
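The STable/subtable relationship can be pictured with a minimal in-memory Python sketch. `STable` and `SubTable` are hypothetical classes, not TDengine's client API, and the values are illustrative. The STable holds only the column schema and tag names. Each subtable is created from it with concrete tag values, and rows are written to subtables only. The sketch covers updating a tag value after creation; adding or deleting tag definitions, which goes through the STable, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SubTable:
    """Hypothetical subtable: created from a STable, carries its own tag values."""

    name: str
    stable: "STable" = field(repr=False, compare=False)
    tags: dict[str, object] = field(default_factory=dict)
    rows: list[tuple] = field(default_factory=list)

    def insert(self, row: tuple) -> None:
        # The column schema comes from the STable, not from the subtable.
        if len(row) != len(self.stable.columns):
            raise ValueError("row does not match the STable schema")
        self.rows.append(row)

    def set_tag(self, name: str, value: object) -> None:
        """Update a tag value after creation; tag names are defined by the STable."""
        if name not in self.stable.tag_names:
            raise KeyError(f"unknown tag {name!r}")
        self.tags[name] = value


@dataclass
class STable:
    """Hypothetical STable: a template of columns and tag names that holds no rows."""

    name: str
    columns: tuple[str, ...]
    tag_names: tuple[str, ...]
    subtables: dict[str, SubTable] = field(default_factory=dict)

    def create_subtable(self, name: str, **tags: object) -> SubTable:
        if set(tags) != set(self.tag_names):
            raise ValueError(f"expected tags {self.tag_names}, got {tuple(tags)}")
        sub = SubTable(name, self, dict(tags))
        self.subtables[name] = sub
        return sub


meters = STable("meters", ("ts", "current", "voltage", "phase"), ("location", "group_id"))
d1001 = meters.create_subtable("d1001", location="San Jose", group_id=2)
d1001.insert((1000, 10.3, 219, 0.31))  # writes go to the subtable only
d1001.set_tag("group_id", 3)           # tag values can change after creation
```

`STable` deliberately has no `insert` method, mirroring the rule that data cannot be written to a STable directly.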
@@ -151,7 +151,7 @@ The relationship between a STable and the subtables created based on this STable
2. The schema of metrics or labels cannot be adjusted through subtables; it can only be changed via the STable. Changes to the schema of a STable take effect immediately for all associated subtables.
3. A STable defines only a template and does not store any data or label information by itself. Therefore, data cannot be written to a STable, only to its subtables.
Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine treats the data in all its subtables as a single data set. TDengine first finds the subtables that meet the tag filter conditions, then scans the time-series data of only these subtables to perform the aggregation. This reduces the amount of data to be scanned, which in turn greatly improves the performance of data aggregation across multiple DCPs.
In TDengine, it is recommended to use a subtable instead of a regular table for a DCP.
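The two-step STable query described above can be sketched in plain Python. This assumes subtables are represented as hypothetical `(tags, rows)` pairs with illustrative values. The sketch first selects the subtables whose tags satisfy the filter, then scans and aggregates only their rows. It mirrors the order of operations in the text and is not TDengine's query engine; `query_stable` is a hypothetical name.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical in-memory subtables of one STable: name -> (tag values, rows).
# Each row is (ts, current, voltage, phase); the numbers are illustrative only.
SUBTABLES: dict[str, tuple[dict, list[tuple]]] = {
    "d1001": ({"location": "San Jose", "group_id": 2},
              [(1000, 10.3, 219, 0.31), (2000, 12.6, 218, 0.33)]),
    "d1002": ({"location": "San Jose", "group_id": 3},
              [(1000, 10.2, 220, 0.23)]),
    "d1003": ({"location": "Mountain View", "group_id": 3},
              [(1000, 11.5, 221, 0.35)]),
}


def query_stable(
    subtables: dict[str, tuple[dict, list[tuple]]],
    tag_filter: Callable[[dict], bool],
    column: int,
) -> tuple[int, Optional[float]]:
    """Return (number of subtables scanned, average of one column)."""
    # Step 1: evaluate the tag filter only; no time-series rows are read here.
    selected = [rows for tags, rows in subtables.values() if tag_filter(tags)]
    # Step 2: scan the rows of the selected subtables as one combined data set.
    values = [row[column] for rows in selected for row in rows]
    return len(selected), (mean(values) if values else None)


scanned, avg_voltage = query_stable(SUBTABLES, lambda t: t["location"] == "San Jose", column=2)
```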
@@ -167,4 +167,4 @@ FQDN (Fully Qualified Domain Name) is the full domain name of a specific compute
Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a port, such as h1.tdengine.com:6030. This way, when a node's IP address changes, the FQDN can still be used to find the node dynamically without changing any cluster configuration. In addition, using FQDNs allows the same cluster to be accessed in a unified way from both the intranet and the Internet.
TDengine does not recommend using an IP address to access the cluster. FQDN is recommended for cluster management.
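A small Python sketch shows why storing an End Point as an FQDN plus a port, rather than as an IP address, survives IP changes. The `EndPoint` type keeps the host name and resolves it only when a connection is needed, so a changed IP address is picked up without editing any configuration. `EndPoint`, `parse`, and `resolve` are hypothetical names. `socket.getaddrinfo` is the standard-library resolver, and the DNS lookup in `resolve` needs a working resolver.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPoint:
    """Hypothetical End Point: an FQDN plus a port, as in h1.tdengine.com:6030."""

    fqdn: str
    port: int

    @classmethod
    def parse(cls, text: str) -> "EndPoint":
        """Split 'fqdn:port' into its two parts and validate the port."""
        host, sep, port = text.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected 'fqdn:port', got {text!r}")
        port_num = int(port)
        if not 0 < port_num < 65536:
            raise ValueError(f"port out of range: {port_num}")
        return cls(host, port_num)

    def resolve(self) -> list[str]:
        """Look up the node's current IP addresses at connection time (needs DNS)."""
        infos = socket.getaddrinfo(self.fqdn, self.port, type=socket.SOCK_STREAM)
        return sorted({info[4][0] for info in infos})


ep = EndPoint.parse("h1.tdengine.com:6030")
```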