提交 c1a7b9fe 编写于 作者: E Elias Soong

[TD-4669] <docs>: fix some typo in English documents.

上级 e5495968
......@@ -6,7 +6,7 @@ TDengine is a highly efficient platform to store, query, and analyze time-series
* [TDengine Introduction and Features](/evaluation#intro)
* [TDengine Use Scenes](/evaluation#scenes)
* [TDengine Performance Metrics and Verification]((/evaluation#))
* [TDengine Performance Metrics and Verification](/evaluation#)
## [Getting Started](/getting-started)
......@@ -20,9 +20,9 @@ TDengine is a highly efficient platform to store, query, and analyze time-series
## [Overall Architecture](/architecture)
- [Data Model](/architecture#model): relational database model, but one table for one data collection point with static tags
- [Cluster and Primary Logical Unit](/architecture#cluster): Take advantage of NoSQ architecture, highly available and horizontal scalable
- [Cluster and Primary Logical Unit](/architecture#cluster): Take advantage of NoSQL architecture, high availability and horizontal scalability
- [Storage Model and Data Partitioning/Sharding](/architecture#sharding): tag data is separated from time-series data, sharded by vnodes and partitioned by time
- [Data Writing and Replication Process](/architecture#replication): records received are written to WAL, cached, with acknowledgement sent back to client, while supporting data replications.
- [Data Writing and Replication Process](/architecture#replication): records received are written to WAL, cached, with acknowledgement sent back to client, while supporting data replications
- [Caching and Persistence](/architecture#persistence): latest records are cached in memory, but are written in columnar format with an ultra-high compression ratio
- [Data Query](/architecture#query): support various SQL functions, downsampling, interpolation, and multi-table aggregation
......@@ -70,7 +70,7 @@ TDengine is a highly efficient platform to store, query, and analyze time-series
## [Connector](/connector)
- [C/C++ Connector](/connector#c-cpp): primary method to connect to TDengine server through libtaos client library
- [Java Connector(JDBC)](https://www.taosdata.com/en/documentation20/connector/java): driver for connecting to the server from Java applications using the JDBC API
- [Java Connector(JDBC)](/connector/java): driver for connecting to the server from Java applications using the JDBC API
- [Python Connector](/connector#python): driver for connecting to TDengine server from Python applications
- [RESTful Connector](/connector#restful): a simple way to interact with TDengine via HTTP
- [Go Connector](/connector#go): driver for connecting to TDengine server from Go applications
......
......@@ -9,7 +9,7 @@ One of the modules of TDengine is the time-series database. However, in addition
- **Performance improvement over 10 times**: An innovative data storage structure is defined, with each single core can process at least 20,000 requests per second, insert millions of data points, and read more than 10 million data points, which is more than 10 times faster than other existing general database.
- **Reduce the cost of hardware or cloud services to 1/5**: Due to its ultra-performance, TDengine’s computing resources consumption is less than 1/5 of other common Big Data solutions; through columnar storage and advanced compression algorithms, the storage consumption is less than 1/10 of other general databases.
- **Full-stack time-series data processing engine**: Integrate database, message queue, cache, stream computing, and other functions, and the applications do not need to integrate with software such as Kafka/Redis/HBase/Spark/HDFS, thus greatly reducing the complexity cost of application development and maintenance.
- **Highly Available and Horizontal Scalabe **: With the distributed architecture and consistency algorithm, via multi-replication and clustering features, TDengine ensures high availability and horizontal scalability to support the mission-critical applications.
- **Highly Available and Horizontal Scalable **: With the distributed architecture and consistency algorithm, via multi-replication and clustering features, TDengine ensures high availability and horizontal scalability to support the mission-critical applications.
- **Zero operation cost & zero learning cost**: Installing clusters is simple and quick, with real-time backup built-in, and no need to split libraries or tables. Similar to standard SQL, TDengine can support RESTful, Python/Java/C/C++/C#/Go/Node.js, and similar to MySQL with zero learning cost.
- **Core is Open Sourced:** Except some auxiliary features, the core of TDengine is open sourced. Enterprise won't be locked by the database anymore. Ecosystem is more strong, product is more stable, and developer communities are more active.
......
......@@ -10,7 +10,9 @@ Please visit our [TDengine github page](https://github.com/taosdata/TDengine) fo
### Install from Docker Container
Please visit our [TDengine Official Docker Image: Distribution, Downloading, and Usage](https://www.taosdata.com/blog/2020/05/13/1509.html).
For the time being, it is not recommended to use Docker to deploy the client or server side of TDengine in production environments, but it is convenient to use Docker to deploy in development environments or when trying it for the first time. In particular, with Docker, it is easy to try TDengine in Mac OS X and Windows environments.
Please refer to the detailed operation in [Quickly experience TDengine through Docker](https://www.taosdata.com/en/documentation/getting-started/docker).
### <a class="anchor" id="package-install"></a>Install from Package
......
......@@ -243,7 +243,7 @@ The meda data of each table (including schema, tags, etc.) is also stored in vno
### Data Partitioning
In addition to vnode sharding, TDengine partitions the time-series data by time range. Each data file contains only one time range of time-series data, and the length of the time range is determined by DB's configuration parameter `days`”. This method of partitioning by time rang is also convenient to efficiently implement the data retention policy. As long as the data file exceeds the specified number of days (system configuration parameter "`keep`"), it will be automatically deleted. Moreover, different time ranges can be stored in different paths and storage media, so as to facilitate the tiered-storage. Cold/hot data can be stored in different storage meida to reduce the storage cost.
In addition to vnode sharding, TDengine partitions the time-series data by time range. Each data file contains only one time range of time-series data, and the length of the time range is determined by DB's configuration parameter `days`. This method of partitioning by time rang is also convenient to efficiently implement the data retention policy. As long as the data file exceeds the specified number of days (system configuration parameter `keep`), it will be automatically deleted. Moreover, different time ranges can be stored in different paths and storage media, so as to facilitate the tiered-storage. Cold/hot data can be stored in different storage meida to reduce the storage cost.
In general, **TDengine splits big data by vnode and time range in two dimensions** to manage the data efficiently with horizontal scalability.
......@@ -251,7 +251,7 @@ In general, **TDengine splits big data by vnode and time range in two dimensions
Each dnode regularly reports its status (including hard disk space, memory size, CPU, network, number of virtual nodes, etc.) to the mnode (virtual management node), so mnode knows the status of the entire cluster. Based on the overall status, when the mnode finds a dnode is overloaded, it will migrate one or more vnodes to other dnodes. During the process, TDengine services keep running and the data insertion, query and computing operations are not affected.
If the mnode has not received the dnode status for a period of time, the dnode will be treated as offline. When offline lasts a certain period of time (configured by parameter `offlineThreshold`), the dnode will be forcibly removed from the cluster by mnode. If the number of replicas of vnodes on this dnode is greater than one, the system will automatically create new replicas on other dnodes to ensure the replica number. If there are other mnodes on this dnode and the number of mnodes replicas is greater than one, the system will automatically create new mnodes on other dnodes to ensure the replica number.
If the mnode has not received the dnode status for a period of time, the dnode will be treated as offline. When offline lasts a certain period of time (configured by parameter `offlineThreshold`), the dnode will be forcibly removed from the cluster by mnode. If the number of replicas of vnodes on this dnode is greater than one, the system will automatically create new replicas on other dnodes to ensure the replica number. If there are other mnodes on this dnode and the number of mnodes replicas is greater than one, the system will automatically create new mnodes on other dnodes to ensure the replica number.
When new data nodes are added to the cluster, with new computing and storage resources are added, the system will automatically start the load balancing process.
......@@ -268,7 +268,7 @@ Master Vnode uses a writing process as follows:
Figure 3: TDengine Master writing process
1. Master vnode receives the application data insertion request, verifies, and moves to next step;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
3. If there are multiple replicas, vnode will forward data packet to slave vnodes in the same virtual node group, and the forwarded packet has a version number with data;
4. Write into memory and add the record to “skip list”;
5. Master vnode returns a confirmation message to the application, indicating a successful writing.
......@@ -282,7 +282,7 @@ For a slave vnode, the write process as follows:
<center> Picture 3 TDengine Slave Writing Process </center>
1. Slave vnode receives a data insertion request forwarded by Master vnode.
2. If the system configuration parameter “walLevel” is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
3. Write into memory and add the record to “skip list”;
Compared with Master vnode, slave vnode has no forwarding or reply confirmation step, means two steps less. But writing into memory and WAL is exactly the same.
......@@ -336,17 +336,17 @@ Each vnode has its own independent memory, and it is composed of multiple memory
TDengine uses a data-driven method to write the data from buffer into hard disk for persistent storage. When the cached data in vnode reaches a certain volume, TDengine will also pull up the disk-writing thread to write the cached data into persistent storage in order not to block subsequent data writing. TDengine will open a new database log file when the data is written, and delete the old database log file after written successfully to avoid unlimited log growth.
To make full use of the characteristics of time-series data, TDengine splits the data stored in persistent storage by a vnode into multiple files, each file only saves data for a fixed number of days, which is determined by the system configuration parameter `days`. By so, for the given start and end date of a query, you can locate the data files to open immediately without any index, thus greatly speeding up reading operations.
To make full use of the characteristics of time-series data, TDengine splits the data stored in persistent storage by a vnode into multiple files, each file only saves data for a fixed number of days, which is determined by the system configuration parameter `days`. By so, for the given start and end date of a query, you can locate the data files to open immediately without any index, thus greatly speeding up reading operations.
For time-series data, there is generally a retention policy, which is determined by the system configuration parameter `keep`. Data files exceeding this set number of days will be automatically deleted by the system to free up storage space.
For time-series data, there is generally a retention policy, which is determined by the system configuration parameter `keep`. Data files exceeding this set number of days will be automatically deleted by the system to free up storage space.
Given “days” and “keep” parameters, the total number of data files in a vnode is: keep/days. The total number of data files should not be too large or too small. 10 to 100 is appropriate. Based on this principle, reasonable days can be set. In the current version, parameter “keep” can be modified, but parameter “days” cannot be modified once it is set.
In each data file, the data of a table is stored by blocks. A table can have one or more data file blocks. In a file block, data is stored in columns, occupying a continuous storage space, thus greatly improving the reading speed. The size of file block is determined by the system parameter `maxRows` (the maximum number of records per block), and the default value is 4096. This value should not be too large or too small. If it is too large, the data locating in search will cost longer; if too small, the index of data block is too large, and the compression efficiency will be low with slower reading speed.
In each data file, the data of a table is stored by blocks. A table can have one or more data file blocks. In a file block, data is stored in columns, occupying a continuous storage space, thus greatly improving the reading speed. The size of file block is determined by the system parameter `maxRows` (the maximum number of records per block), and the default value is 4096. This value should not be too large or too small. If it is too large, the data locating in search will cost longer; if too small, the index of data block is too large, and the compression efficiency will be low with slower reading speed.
Each data file (with a .data postfix) has a corresponding index file (with a .head postfix). The index file has summary information of a data block for each table, recording the offset of each data block in the data file, start and end time of data and other information, so as to lead system quickly locate the data to be found. Each data file also has a corresponding last file (with a .last postfix), which is designed to prevent data block fragmentation when written in disk. If the number of written records from a table does not reach the system configuration parameter `minRows` (minimum number of records per block), it will be stored in the last file first. When write to disk next time, the newly written records will be merged with the records in last file and then written into data file.
Each data file (with a .data postfix) has a corresponding index file (with a .head postfix). The index file has summary information of a data block for each table, recording the offset of each data block in the data file, start and end time of data and other information, so as to lead system quickly locate the data to be found. Each data file also has a corresponding last file (with a .last postfix), which is designed to prevent data block fragmentation when written in disk. If the number of written records from a table does not reach the system configuration parameter `minRows` (minimum number of records per block), it will be stored in the last file first. When write to disk next time, the newly written records will be merged with the records in last file and then written into data file.
When data is written to disk, it is decided whether to compress the data according to system configuration parameter `comp`. TDengine provides three compression options: no compression, one-stage compression and two-stage compression, corresponding to comp values of 0, 1 and 2 respectively. One-stage compression is carried out according to the type of data. Compression algorithms include delta-delta coding, simple 8B method, zig-zag coding, LZ4 and other algorithms. Two-stage compression is based on one-stage compression and compressed by general compression algorithm, which has higher compression ratio.
When data is written to disk, it is decided whether to compress the data according to system configuration parameter `comp`. TDengine provides three compression options: no compression, one-stage compression and two-stage compression, corresponding to comp values of 0, 1 and 2 respectively. One-stage compression is carried out according to the type of data. Compression algorithms include delta-delta coding, simple 8B method, zig-zag coding, LZ4 and other algorithms. Two-stage compression is based on one-stage compression and compressed by general compression algorithm, which has higher compression ratio.
### Tiered Storage
......@@ -395,7 +395,7 @@ When client obtains query result, the worker thread in query execution queue of
The remarkable feature that time-series data is different from ordinary data is that each record has a timestamp, so aggregating data with timestamps on the time axis is an important and distinct feature from common databases. From this point of view, it is similar to the window query of stream computing engine.
The keyword `interval` is introduced into TDengine to split fixed length time windows on time axis, and the data are aggregated based on time windows, and the data within window range are aggregated as needed. For example:
The keyword `interval` is introduced into TDengine to split fixed length time windows on time axis, and the data are aggregated based on time windows, and the data within window range are aggregated as needed. For example:
```mysql
select count(*) from d1001 interval(1h);
......
......@@ -296,9 +296,7 @@ Asynchronous APIs have relatively high requirements for users, who can selective
The asynchronous APIs of TDengine all use non-blocking calling mode. Applications can use multithreading to open multiple tables at the same time, and can query or insert to each open table at the same time. It should be pointed out that the **application client must ensure that the operation on the same table is completely serialized**, that is, when the insertion or query operation on the same table is not completed (when no result returned), the second insertion or query operation cannot be performed.
<a class="anchor" id="stmt"></a>
### Parameter binding API
In addition to calling `taos_query` directly for queries, TDengine also provides a Prepare API that supports parameter binding. Like MySQL, these APIs currently only support using question mark `?` to represent the parameters to be bound, as follows:
......@@ -823,12 +821,12 @@ https://www.taosdata.com/blog/2020/11/02/1901.html
The TDengine provides the GO driver taosSql. taosSql implements the GO language's built-in interface database/sql/driver. Users can access TDengine in the application by simply importing the package as follows, see https://github.com/taosdata/driver-go/blob/develop/taosSql/driver_test.go for details.
Sample code for using the Go connector can be found in https://github.com/taosdata/TDengine/tree/develop/tests/examples/go and the [video tutorial](https://www.taosdata.com/blog/2020/11/11/1951.html).
Sample code for using the Go connector can be found in https://github.com/taosdata/TDengine/tree/develop/tests/examples/go .
```Go
import (
"database/sql"
_ "github.com/taosdata/driver-go/taosSql"
_ "github.com/taosdata/driver-go/v2/taosSql"
)
```
......@@ -839,6 +837,8 @@ go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.io,direct
```
`taosSql` v2 completed refactoring of the v1 version and separated the built-in database operation interface `database/sql/driver` to the directory `taosSql`, and put other advanced functions such as subscription and stmt into the directory `af`.
### Common APIs
- `sql.Open(DRIVER_NAME string, dataSourceName string) *DB`
......@@ -937,7 +937,7 @@ After installing the TDengine client, the nodejsChecker.js program can verify wh
Steps:
1. Create a new installation verification directory, for example: ~/tdengine-test, copy the nodejsChecker.js source program on github. Download address: (https://github.com/taosdata/TDengine/tree/develop/tests/examples/nodejs/nodejsChecker.js).
1. Create a new installation verification directory, for example: `~/tdengine-test`, copy the nodejsChecker.js source program on github. Download address: (https://github.com/taosdata/TDengine/tree/develop/tests/examples/nodejs/nodejsChecker.js).
2. Execute the following command:
......
......@@ -16,7 +16,7 @@ Please refer to the [video tutorial](https://www.taosdata.com/blog/2020/11/11/19
**Note 1:** Because the information of FQDN will be written into a file, if FQDN has not been configured or changed before, and TDengine has been started, be sure to clean up the previous data (`rm -rf /var/lib/taos/*`)on the premise of ensuring that the data is useless or backed up;
**Note 2:** The client also needs to be configured to ensure that it can correctly parse the FQDN configuration of each node, whether through DNS service or Host file.
**Note 2:** The client also needs to be configured to ensure that it can correctly parse the FQDN configuration of each node, whether through DNS service or modify hosts file.
**Step 2:** It is recommended to close the firewall of all physical nodes, and at least ensure that the TCP and UDP ports of ports 6030-6042 are open. It is **strongly recommended** to close the firewall first and configure the ports after the cluster is built;
......
......@@ -78,7 +78,7 @@ When the nodes in TDengine cluster are deployed on different physical machines a
## <a class="anchor" id="config"></a> Server-side Configuration
The background service of TDengine system is provided by taosd, and the configuration parameters can be modified in the configuration file taos.cfg to meet the requirements of different scenarios. The default location of the configuration file is the /etc/taos directory, which can be specified by executing the parameter `-c` from the taosd command line. Such as `taosd -c /home/user,` to specify that the configuration file is located in the /home/user directory.
The background service of TDengine system is provided by taosd, and the configuration parameters can be modified in the configuration file taos.cfg to meet the requirements of different scenarios. The default location of the configuration file is the /etc/taos directory, which can be specified by executing the parameter `-c` from the taosd command line. Such as `taosd -c /home/user`, to specify that the configuration file is located in the /home/user directory.
You can also use “-C” to show the current server configuration parameters:
......
......@@ -1173,7 +1173,7 @@ TDengine supports aggregations over data, they are listed below:
## <a class="anchor" id="aggregation"></a> Time-dimension Aggregation
TDengine supports aggregating by intervals(time range). Data in a table can partitioned by intervals and aggregated to generate results. For example, a temperature sensor collects data once per second, but the average temperature needs to be queried every 10 minutes. This aggregation is suitable for down sample operation, and the syntax is as follows:
TDengine supports aggregating by intervals (time range). Data in a table can partitioned by intervals and aggregated to generate results. For example, a temperature sensor collects data once per second, but the average temperature needs to be queried every 10 minutes. This aggregation is suitable for down sample operation, and the syntax is as follows:
```mysql
SELECT function_list FROM tb_name
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册