@@ -101,6 +101,12 @@ so you should run this command in the TDengine directory to install them:
...
@@ -101,6 +101,12 @@ so you should run this command in the TDengine directory to install them:
git submodule update --init--recursive
git submodule update --init--recursive
```
```
You can modify the file ~/.gitconfig to use ssh protocol instead of https for better download speed. You need to upload ssh public key to GitHub first. Please refer to GitHub official documentation for detail.
@@ -6,17 +6,16 @@ TDengine is an innovative Big Data processing product launched by TAOS Data in t
...
@@ -6,17 +6,16 @@ TDengine is an innovative Big Data processing product launched by TAOS Data in t
One of the modules of TDengine is the time-series database. However, in addition to this, to reduce the complexity of research and development and the difficulty of system operation, TDengine also provides functions such as caching, message queuing, subscription, stream computing, etc. TDengine provides a full-stack technical solution for the processing of IoT and Industrial Internet BigData. It is an efficient and easy-to-use IoT Big Data platform. Compared with typical Big Data platforms such as Hadoop, TDengine has the following distinct characteristics:
One of the modules of TDengine is the time-series database. However, in addition to this, to reduce the complexity of research and development and the difficulty of system operation, TDengine also provides functions such as caching, message queuing, subscription, stream computing, etc. TDengine provides a full-stack technical solution for the processing of IoT and Industrial Internet BigData. It is an efficient and easy-to-use IoT Big Data platform. Compared with typical Big Data platforms such as Hadoop, TDengine has the following distinct characteristics:
-**Performance improvement over 10 times**: An innovative data storage structure is defined, with each single core can process at least 20,000 requests per second, insert millions of data points, and read more than 10 million data points, which is more than 10 times faster than other existing general database.
-**Performance improvement over 10 times**: An innovative data storage structure is defined, with every single core that can process at least 20,000 requests per second, insert millions of data points, and read more than 10 million data points, which is more than 10 times faster than other existing general database.
-**Reduce the cost of hardware or cloud services to 1/5**: Due to its ultra-performance, TDengine’s computing resources consumption is less than 1/5 of other common Big Data solutions; through columnar storage and advanced compression algorithms, the storage consumption is less than 1/10 of other general databases.
-**Reduce the cost of hardware or cloud services to 1/5**: Due to its ultra-performance, TDengine’s computing resources consumption is less than 1/5 of other common Big Data solutions; through columnar storage and advanced compression algorithms, the storage consumption is less than 1/10 of other general databases.
-**Full-stack time-series data processing engine**: Integrate database, message queue, cache, stream computing, and other functions, and the applications do not need to integrate with software such as Kafka/Redis/HBase/Spark/HDFS, thus greatly reducing the complexity cost of application development and maintenance.
-**Full-stack time-series data processing engine**: Integrate database, message queue, cache, stream computing, and other functions, and the applications do not need to integrate with software such as Kafka/Redis/HBase/Spark/HDFS, thus greatly reducing the complexity cost of application development and maintenance.
-**Highly Available and Horizontal Scalable **: With the distributed architecture and consistency algorithm, via multi-replication and clustering features, TDengine ensures high availability and horizontal scalability to support the mission-critical applications.
-**Highly Available and Horizontal Scalable**: With the distributed architecture and consistency algorithm, via multi-replication and clustering features, TDengine ensures high availability and horizontal scalability to support mission-critical applications.
-**Zero operation cost & zero learning cost**: Installing clusters is simple and quick, with real-time backup built-in, and no need to split libraries or tables. Similar to standard SQL, TDengine can support RESTful, Python/Java/C/C++/C#/Go/Node.js, and similar to MySQL with zero learning cost.
-**Zero operation cost & zero learning cost**: Installing clusters is simple and quick, with real-time backup built-in, and no need to split libraries or tables. Similar to standard SQL, TDengine can support RESTful, Python/Java/C/C++/C#/Go/Node.js, and similar to MySQL with zero learning cost.
-**Core is Open Sourced:** Except some auxiliary features, the core of TDengine is open sourced. Enterprise won't be locked by the database anymore. Ecosystem is more strong, product is more stable, and developer communities are more active.
-**Core is Open Sourced:** Except for some auxiliary features, the core of TDengine is open-sourced. Enterprise won't be locked by the database anymore. The ecosystem is more strong, products are more stable, and developer communities are more active.
With TDengine, the total cost of ownership of typical IoT, Internet of Vehicles, and Industrial Internet Big Data platforms can be greatly reduced. However, since it makes full use of the characteristics of IoT time-series data, TDengine cannot be used to process general data from web crawlers, microblogs, WeChat, e-commerce, ERP, CRM, and other sources.
With TDengine, the total cost of ownership of typical IoT, Internet of Vehicles, and Industrial Internet Big Data platforms can be greatly reduced. However, since it makes full use of the characteristics of IoT time-series data, TDengine cannot be used to process general data from web crawlers, microblogs, WeChat, e-commerce, ERP, CRM, and other sources.
| Require system with high-reliability | | | √ | TDengine has a very robust and reliable system architecture to implement simple and convenient daily operation with streamlined experiences for operators, thus human errors and accidents are eliminated to the greatest extent. |
| Require system with high-reliability | | | √ | TDengine has a very robust and reliable system architecture to implement simple and convenient daily operation with streamlined experiences for operators, thus human errors and accidents are eliminated to the greatest extent. |
| Require abundant talent supply | √ | | | As a new-generation product, it’s still difficult to find talents with TDengine experiences from market. However, the learning cost is low. As the vendor, we also provide extensive operation training and counselling services. |
| Require abundant talent supply | √ | | | As a new-generation product, it’s still difficult to find talents with TDengine experiences from the market. However, the learning cost is low. As the vendor, we also provide extensive operation training and counseling services. |
A complete TDengine system runs on one or more physical nodes. Logically, it includes data node (dnode), TDEngine application driver (TAOSC) and application (app). There are one or more data nodes in the system, which form a cluster. The application interacts with the TDengine cluster through TAOSC's API. The following is a brief introduction to each logical unit.
A complete TDengine system runs on one or more physical nodes. Logically, it includes data node (dnode), TDEngine application driver (TAOSC) and application (app). There are one or more data nodes in the system, which form a cluster. The application interacts with the TDengine cluster through TAOSC's API. The following is a brief introduction to each logical unit.
...
@@ -199,8 +197,8 @@ A complete TDengine system runs on one or more physical nodes. Logically, it inc
...
@@ -199,8 +197,8 @@ A complete TDengine system runs on one or more physical nodes. Logically, it inc
To explain the relationship between vnode, mnode, TAOSC and application and their respective roles, the following is an analysis of a typical data writing process.
To explain the relationship between vnode, mnode, TAOSC and application and their respective roles, the following is an analysis of a typical data writing process.


<center>Picture 2 typical process of TDengine </center>
<center>Figure 2: Typical process of TDengine </center>
1. Application initiates a request to insert data through JDBC, ODBC, or other APIs.
1. Application initiates a request to insert data through JDBC, ODBC, or other APIs.
2. TAOSC checks if meta data existing for the table in the cache. If so, go straight to Step 4. If not, TAOSC sends a get meta-data request to mnode.
2. TAOSC checks if meta data existing for the table in the cache. If so, go straight to Step 4. If not, TAOSC sends a get meta-data request to mnode.
...
@@ -268,7 +266,8 @@ If a database has N replicas, thus a virtual node group has N virtual nodes, but
...
@@ -268,7 +266,8 @@ If a database has N replicas, thus a virtual node group has N virtual nodes, but
<center> Figure 3: TDengine Master writing process </center>
1. Master vnode receives the application data insertion request, verifies, and moves to next step;
1. Master vnode receives the application data insertion request, verifies, and moves to next step;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
...
@@ -281,8 +280,8 @@ Figure 3: TDengine Master writing process
...
@@ -281,8 +280,8 @@ Figure 3: TDengine Master writing process
1. Slave vnode receives a data insertion request forwarded by Master vnode;
1. Slave vnode receives a data insertion request forwarded by Master vnode;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
2. If the system configuration parameter `walLevel` is greater than 0, vnode will write the original request packet into database log file WAL. If walLevel is set to 2 and fsync is set to 0, TDengine will make WAL data written immediately to ensure that even system goes down, all data can be recovered from database log file;
...
@@ -355,8 +354,6 @@ When data is written to disk, it is decided whether to compress the data accordi
...
@@ -355,8 +354,6 @@ When data is written to disk, it is decided whether to compress the data accordi
By default, TDengine saves all data in /var/lib/taos directory, and the data files of each vnode are saved in a different directory under this directory. In order to expand the storage space, minimize the bottleneck of file reading and improve the data throughput rate, TDengine can configure the system parameter “dataDir” to allow multiple mounted hard disks to be used by system at the same time. In addition, TDengine also provides the function of tiered data storage, i.e. storage on different storage media according to the time stamps of data files. For example, the latest data is stored on SSD, the data for more than one week is stored on local hard disk, and the data for more than four weeks is stored on network storage device, thus reducing the storage cost and ensuring efficient data access. The movement of data on different storage media is automatically done by the system and completely transparent to applications. Tiered storage of data is also configured through the system parameter “dataDir”.
By default, TDengine saves all data in /var/lib/taos directory, and the data files of each vnode are saved in a different directory under this directory. In order to expand the storage space, minimize the bottleneck of file reading and improve the data throughput rate, TDengine can configure the system parameter “dataDir” to allow multiple mounted hard disks to be used by system at the same time. In addition, TDengine also provides the function of tiered data storage, i.e. storage on different storage media according to the time stamps of data files. For example, the latest data is stored on SSD, the data for more than one week is stored on local hard disk, and the data for more than four weeks is stored on network storage device, thus reducing the storage cost and ensuring efficient data access. The movement of data on different storage media is automatically done by the system and completely transparent to applications. Tiered storage of data is also configured through the system parameter “dataDir”.
Where data_path is the folder path of mount point and tier_level is the media storage-tier. The higher the media storage-tier, means the older the data file. Multiple hard disks can be mounted at the same storage-tier, and data files on the same storage-tier are distributed on all hard disks within the tier. TDengine supports up to 3 tiers of storage, so tier_level values are 0, 1, and 2. When configuring dataDir, there must be only one mount path without specifying tier_level, which is called special mount disk (path). The mount path defaults to level 0 storage media and contains special file links, which cannot be removed, otherwise it will have a devastating impact on the written data.
Where data_path is the folder path of mount point and tier_level is the media storage-tier. The higher the media storage-tier, means the older the data file. Multiple hard disks can be mounted at the same storage-tier, and data files on the same storage-tier are distributed on all hard disks within the tier. TDengine supports up to 3 tiers of storage, so tier_level values are 0, 1, and 2. When configuring dataDir, there must be only one mount path without specifying tier_level, which is called special mount disk (path). The mount path defaults to level 0 storage media and contains special file links, which cannot be removed, otherwise it will have a devastating impact on the written data.
Suppose a physical node with six mountable hard disks/mnt/disk1,/mnt/disk2, …,/mnt/disk6, where disk1 and disk2 need to be designated as level 0 storage media, disk3 and disk4 are level 1 storage media, and disk5 and disk6 are level 2 storage media. Disk1 is a special mount disk, you can configure it in/etc/taos/taos.cfg as follows:
Suppose a physical node with six mountable hard disks/mnt/disk1,/mnt/disk2, …,/mnt/disk6, where disk1 and disk2 need to be designated as level 0 storage media, disk3 and disk4 are level 1 storage media, and disk5 and disk6 are level 2 storage media. Disk1 is a special mount disk, you can configure it in/etc/taos/taos.cfg as follows:
```
```
...
@@ -379,7 +374,6 @@ dataDir /mnt/disk6/taos 2
...
@@ -379,7 +374,6 @@ dataDir /mnt/disk6/taos 2
Mounted disks can also be a non-local network disk, as long as the system can access it.
Mounted disks can also be a non-local network disk, as long as the system can access it.
Note: Tiered Storage is only supported in Enterprise Edition
Note: Tiered Storage is only supported in Enterprise Edition
## <a class="anchor" id="query"></a>Data Query
## <a class="anchor" id="query"></a>Data Query
...
@@ -419,7 +413,7 @@ For the data collected by device D1001, the number of records per hour is counte
...
@@ -419,7 +413,7 @@ For the data collected by device D1001, the number of records per hour is counte
TDengine creates a separate table for each data collection point, but in practical applications, it is often necessary to aggregate data from different data collection points. In order to perform aggregation operations efficiently, TDengine introduces the concept of STable. STable is used to represent a specific type of data collection point. It is a table set containing multiple tables. The schema of each table in the set is the same, but each table has its own static tag. The tags can be multiple and be added, deleted and modified at any time. Applications can aggregate or statistically operate all or a subset of tables under a STABLE by specifying tag filters, thus greatly simplifying the development of applications. The process is shown in the following figure:
TDengine creates a separate table for each data collection point, but in practical applications, it is often necessary to aggregate data from different data collection points. In order to perform aggregation operations efficiently, TDengine introduces the concept of STable. STable is used to represent a specific type of data collection point. It is a table set containing multiple tables. The schema of each table in the set is the same, but each table has its own static tag. The tags can be multiple and be added, deleted and modified at any time. Applications can aggregate or statistically operate all or a subset of tables under a STABLE by specifying tag filters, thus greatly simplifying the development of applications. The process is shown in the following figure:


<center>Picture 4 Diagram of multi-table aggregation query </center>
<center>Figure 5: Diagram of multi-table aggregation query</center>
1. Application sends a query condition to system;
1. Application sends a query condition to system;
2. TAOSC sends the STable name to Meta Node(management node);
2. TAOSC sends the STable name to Meta Node(management node);
## <a class="anchor" id="queries"></a> Main Query Features
## <a class="anchor" id="queries"></a> Main Query Features
TDengine uses SQL as the query language. Applications can send SQL statements through C/C++, Java, Go, Python connectors, and users can manually execute SQL Ad-Hoc Query through the Command Line Interface (CLI) tool TAOS Shell provided by TDengine. TDengine supports the following query functions:
TDengine uses SQL as the query language. Applications can send SQL statements through C/C++, Java, Go, C#, Python, Node.js connectors, and users can manually execute SQL Ad-Hoc Query through the Command Line Interface (CLI) tool TAOS Shell provided by TDengine. TDengine supports the following query functions:
- Single-column and multi-column data query
- Single-column and multi-column data query
- Multiple filters for tags and numeric values: >, <,=,<>, like, etc
- Multiple filters for tags and numeric values: >, <,=,<>, like, etc
...
@@ -96,4 +96,4 @@ Query OK, 5 row(s) in set (0.001521s)
...
@@ -96,4 +96,4 @@ Query OK, 5 row(s) in set (0.001521s)
In IoT scenario, it is difficult to synchronize the time stamp of collected data at each point, but many analysis algorithms (such as FFT) need to align the collected data strictly at equal intervals of time. In many systems, it’s required to write their own programs to process, but the down sampling operation of TDengine can be used to solve the problem easily. If there is no collected data in an interval, TDengine also provides interpolation calculation function.
In IoT scenario, it is difficult to synchronize the time stamp of collected data at each point, but many analysis algorithms (such as FFT) need to align the collected data strictly at equal intervals of time. In many systems, it’s required to write their own programs to process, but the down sampling operation of TDengine can be used to solve the problem easily. If there is no collected data in an interval, TDengine also provides interpolation calculation function.
For details of syntax rules, please refer to the [Time-dimension Aggregation section of TAOS SQL](https://www.taosdata.com/en/documentation/taos-sql#aggregation).
For details of syntax rules, please refer to the [Time-dimension Aggregation section of TAOS SQL](https://www.taosdata.com/en/documentation/taos-sql#aggregation).
{"avro",'v',0,0,"Dump apache avro format data file. By default, dump sql command sequence.",2},
{"avro",'v',0,0,"Dump apache avro format data file. By default, dump sql command sequence.",2},
{"start-time",'S',"START_TIME",0,"Start time to dump. Either epoch or ISO8601/RFC3339 format is acceptable. ISO8601 format example: 2017-10-01T00:00:00.000+0800 or 2017-10-0100:00:00:000+0800 or '2017-10-01 00:00:00.000+0800'",4},
{"start-time",'S',"START_TIME",0,"Start time to dump. Either epoch or ISO8601/RFC3339 format is acceptable. ISO8601 format example: 2017-10-01T00:00:00.000+0800 or 2017-10-0100:00:00:000+0800 or '2017-10-01 00:00:00.000+0800'",4},
{"end-time",'E',"END_TIME",0,"End time to dump. Either epoch or ISO8601/RFC3339 format is acceptable. ISO8601 format example: 2017-10-01T00:00:00.000+0800 or 2017-10-0100:00:00.000+0800 or '2017-10-01 00:00:00.000+0800'",5},
{"end-time",'E',"END_TIME",0,"End time to dump. Either epoch or ISO8601/RFC3339 format is acceptable. ISO8601 format example: 2017-10-01T00:00:00.000+0800 or 2017-10-0100:00:00.000+0800 or '2017-10-01 00:00:00.000+0800'",5},
#if TSDB_SUPPORT_NANOSECOND == 1
{"precision",'C',"PRECISION",0,"Specify precision for converting human-readable time to epoch. Valid value is one of ms, us, and ns. Default is ms.",6},
#else
{"precision",'C',"PRECISION",0,"Use specified precision to convert human-readable time. Valid value is one of ms and us. Default is ms.",6},
#endif
{"data-batch",'B',"DATA_BATCH",0,"Number of data point per insert statement. Max value is 32766. Default is 1.",3},
{"data-batch",'B',"DATA_BATCH",0,"Number of data point per insert statement. Max value is 32766. Default is 1.",3},
{"max-sql-len",'L',"SQL_LEN",0,"Max length of one sql. Default is 65480.",3},
{"max-sql-len",'L',"SQL_LEN",0,"Max length of one sql. Default is 65480.",3},
{"table-batch",'t',"TABLE_BATCH",0,"Number of table dumpout into one output file. Default is 1.",3},
{"table-batch",'t',"TABLE_BATCH",0,"Number of table dumpout into one output file. Default is 1.",3},
...
@@ -281,8 +277,11 @@ typedef struct arguments {
...
@@ -281,8 +277,11 @@ typedef struct arguments {
boolwith_property;
boolwith_property;
boolavro;
boolavro;
int64_tstart_time;
int64_tstart_time;
charhumanStartTime[28];
int64_tend_time;
int64_tend_time;
charhumanEndTime[28];
charprecision[8];
charprecision[8];
int32_tdata_batch;
int32_tdata_batch;
int32_tmax_sql_len;
int32_tmax_sql_len;
int32_ttable_batch;// num of table which will be dump into one output file.
int32_ttable_batch;// num of table which will be dump into one output file.
...
@@ -296,6 +295,8 @@ typedef struct arguments {
...
@@ -296,6 +295,8 @@ typedef struct arguments {
booldebug_print;
booldebug_print;
boolverbose_print;
boolverbose_print;
boolperformance_print;
boolperformance_print;
intdbCount;
}SArguments;
}SArguments;
/* Our argp parser. */
/* Our argp parser. */
...
@@ -318,13 +319,17 @@ static void taosDumpCreateTableClause(STableDef *tableDes, int numOfCols,
...
@@ -318,13 +319,17 @@ static void taosDumpCreateTableClause(STableDef *tableDes, int numOfCols,