Commit e95d63c7 authored by gccgdb1234

docs: remove 2.6 new functions from 2.4 document in EN

Parent 820451b7
---
title: TDengine Documentation
sidebar_label: Documentation Home
slug: /
---
TDengine is a [high-performance](https://tdengine.com/fast), [scalable](https://tdengine.com/scalable) time series database with [SQL support](https://tdengine.com/sql-support). This document is the TDengine user manual. It introduces the basic concepts, installation, features, SQL, APIs, operation, maintenance, kernel design, etc. It’s written mainly for architects, developers and system administrators.
To get an overall view of TDengine, such as its feature list, benchmarks, and competitive advantages, please browse through the [Introduction](./intro) section.
TDengine makes full use of the characteristics of time series data, proposes the concepts of "one table for one data collection point" and "super table", and designs an innovative storage engine, which greatly improves the efficiency of data ingestion, querying and storage. To understand the new concepts and use TDengine in the right way, please read [“Concepts”](./concept) thoroughly.
If you are a developer, please read the [“Developer Guide”](./develop) carefully. This section introduces the database connection, data modeling, data ingestion, query, continuous query, cache, data subscription, user-defined function, etc. in detail. Sample code is provided for a variety of programming languages. In most cases, you can just copy and paste the sample code, make a few changes to accommodate your application, and it will work.
We live in the era of big data, and scale-up is unable to meet the growing business needs. Any modern data system must have the ability to scale out, and clustering has become an indispensable feature of big data systems. The TDengine team has not only developed the cluster feature, they also decided to open source this important feature. To learn how to deploy, manage and maintain a TDengine cluster please refer to ["Cluster"](./cluster).
TDengine uses SQL as its query language, which greatly reduces learning costs and migration costs. In addition to the standard SQL, TDengine has extensions to support time series data scenarios better, such as roll up, interpolation, time weighted average, etc. The ["SQL Reference"](./taos-sql) chapter describes the SQL syntax in detail, and lists the various supported commands and functions.
If you are a system administrator who cares about installation, upgrade, fault tolerance, disaster recovery, data import, data export, system configuration, how to monitor whether TDengine is running healthily, and how to improve system performance, please read ["Administration"](./operation) thoroughly.
If you want to know more about TDengine tools, REST API, and connectors for various programming languages, please see the ["Reference"](./reference) chapter.
If you are very interested in the internal design of TDengine, please read the chapter ["Inside TDengine"](./tdinternal), which introduces the cluster design, data partitioning, sharding, writing, and reading processes in detail. If you want to study TDengine code or even contribute code, please read this chapter carefully.
TDengine is an open-source database, and you are welcome to be a part of the project. If you find any errors in the documentation, or a description that is not clear, please click "Edit this page" at the bottom of each page to edit it directly.
Together, we make a difference.
---
title: Introduction
toc_max_heading_level: 2
---
TDengine is a high-performance, scalable time-series database with SQL support. Its code, including its cluster feature, is open source under GNU AGPL v3.0. Besides the database engine, it provides [caching](/develop/cache), [stream processing](/develop/continuous-query), [data subscription](/develop/subscribe) and other functionalities to reduce the complexity and cost of development and operation.
This section introduces the major features, competitive advantages, suited scenarios and benchmarks to help you get a high-level picture of TDengine.
## Major Features
The major features are listed below:
1. Besides [using SQL to insert](/develop/insert-data/sql-writing), it supports [schemaless writing](/reference/schemaless/), and it supports [InfluxDB Line](/develop/insert-data/influxdb-line), [OpenTSDB Telnet](/develop/insert-data/opentsdb-telnet), [OpenTSDB JSON](/develop/insert-data/opentsdb-json) and other protocols.
2. Support for seamless integration with third-party data collection agents like [Telegraf](/third-party/telegraf), [Prometheus](/third-party/prometheus), [StatsD](/third-party/statsd), [collectd](/third-party/collectd), [icinga2](/third-party/icinga2), [TCollector](/third-party/tcollector), [EMQX](/third-party/emq-broker), and [HiveMQ](/third-party/hive-mq-broker). Without a line of code, those agents can write data points into TDengine just by configuration.
3. Support for [all kinds of queries](/develop/query-data), including aggregation, nested query, downsampling, interpolation, etc.
4. Support for [user defined functions](/develop/udf).
5. Support for [caching](/develop/cache). TDengine always saves the last data point in cache, so Redis is not needed in some scenarios.
6. Support for [continuous query](/develop/continuous-query).
7. Support for [data subscription](/develop/subscribe) with the capability to specify filter conditions.
8. Support for [cluster](/cluster/), with the capability of increasing processing power by adding more nodes. High availability is supported by replication.
9. Provides interactive [command-line interface](/reference/taos-shell) for management, maintenance and ad-hoc query.
10. Provides many ways to [import](/operation/import) and [export](/operation/export) data.
11. Provides [monitoring](/operation/monitor) on TDengine running instances.
12. Provides [connectors](/reference/connector/) for [C/C++](/reference/connector/cpp), [Java](/reference/connector/java), [Python](/reference/connector/python), [Go](/reference/connector/go), [Rust](/reference/connector/rust), [Node.js](/reference/connector/node) and other programming languages.
13. Provides a [REST API](/reference/rest-api/).
14. Supports the seamless integration with [Grafana](/third-party/grafana) for visualization.
15. Supports seamless integration with Google Data Studio.
For more detail on features, please read through the whole documentation.
## Competitive Advantages
TDengine makes full use of [the characteristics of time series data](https://tdengine.com/2019/07/09/86.html), such as being structured, having no transactions, and rarely requiring deletes or updates, and builds its own innovative storage engine and computing engine to differentiate itself from other time series databases, with the following advantages.
- **[High Performance](https://tdengine.com/fast)**: TDengine outperforms other time series databases in data ingestion and querying while significantly reducing storage cost and compute costs, with an innovatively designed and purpose-built storage engine.
- **[Scalable](https://tdengine.com/scalable)**: TDengine provides out-of-the-box scalability and high availability through its native distributed design. Nodes can be added through simple configuration to achieve greater data processing power. In addition, this feature is open source.
- **[SQL Support](https://tdengine.com/sql-support)**: TDengine uses SQL as the query language, thereby reducing learning and migration costs, while adding SQL extensions to handle time-series data better, and supporting convenient and flexible schemaless data ingestion.
- **All in One**: TDengine has built-in caching, stream processing and data subscription functions. It is no longer necessary to integrate Kafka/Redis/HBase/Spark or other software in some scenarios. It makes the system architecture much simpler, cost-effective and easier to maintain.
- **Seamless Integration**: Without a single line of code, TDengine provides seamless, configurable integration with third-party tools such as Telegraf, Grafana, EMQX, Prometheus, StatsD, collectd, etc. More third-party tools are being integrated.
- **Zero Management**: Installation and cluster setup can be done in seconds. Data partitioning and sharding are executed automatically. TDengine’s running status can be monitored via Grafana or other DevOps tools.
- **Zero Learning Costs**: With SQL as the query language and support for ubiquitous tools like Python, Java, C/C++, Go, Rust, and Node.js connectors, there are zero learning costs.
- **Interactive Console**: TDengine provides convenient console access to the database to run ad hoc queries, maintain the database, or manage the cluster without any programming.
With TDengine, the total cost of ownership of a time-series data platform can be greatly reduced: 1) with its superior performance, computing and storage resources are reduced significantly; 2) with SQL support, it can be seamlessly integrated with many third-party tools, and learning and migration costs are reduced significantly; 3) with its simple architecture and zero management, operation and maintenance costs are reduced.
## Technical Ecosystem
In a time-series data processing platform, TDengine occupies the position shown in the diagram below:
![TDengine Technical Ecosystem](eco_system.png)
<center>Figure 1. TDengine Technical Ecosystem</center>
On the left side, there are data collection agents like OPC-UA, MQTT, Telegraf and Kafka. On the right side, visualization/BI tools, HMI, Python/R, and IoT Apps can be connected. TDengine itself provides interactive command-line interface and web interface for management and maintenance.
## Suited Scenarios
As a high-performance, scalable, SQL-supported time-series database, TDengine's typical application scenarios include but are not limited to IoT, Industrial Internet, Connected Vehicles, IT operation and maintenance, energy, financial markets and other fields. TDengine is a purpose-built database optimized for the characteristics of time-series data; it is not suitable for processing data from web crawlers, social media, e-commerce, ERP, CRM, and so on. This section makes a more detailed analysis of the applicable scenarios.
### Characteristics and Requirements of Data Sources
| **Data Source Characteristics and Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| -------------------------------------------------------- | ------------------ | ----------------------- | ------------------- | :----------------------------------------------------------- |
| A massive amount of total data | | | √ | TDengine provides excellent scale-out functions in terms of capacity, and has a storage structure with matching high compression ratio to achieve the best storage efficiency in the industry.|
| Data input velocity is extremely high | | | √ | TDengine's performance is much higher than that of other similar products. It can continuously process larger amounts of input data in the same hardware environment, and provides a performance evaluation tool that can easily run in the user environment. |
| A huge number of data sources | | | √ | TDengine is optimized specifically for a huge number of data sources. It is especially suitable for efficiently ingesting, writing and querying data from billions of data sources. |
### System Architecture Requirements
| **System Architecture Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------ |
| A simple and reliable system architecture | | | √ | TDengine's system architecture is very simple and reliable, with its own message queue, cache, stream computing, monitoring and other functions. There is no need to integrate any additional third-party products. |
| Fault-tolerance and high-reliability | | | √ | TDengine has cluster functions to automatically provide high-reliability and high-availability functions such as fault tolerance and disaster recovery. |
| Standardization support | | | √ | TDengine supports standard SQL and provides SQL extensions for time-series data analysis. |
### System Function Requirements
| **System Function Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------ |
| Complete data processing algorithms built-in | | √ | | While TDengine implements various general data processing algorithms, industry specific algorithms and special types of processing will need to be implemented at the application level.|
| A large number of crosstab queries | | √ | | This type of processing is better handled by general purpose relational database systems but TDengine can work in concert with relational database systems to provide more complete solutions. |
### System Performance Requirements
| **System Performance Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------ |
| Very large total processing capacity | | | √ | TDengine’s cluster functions can easily improve processing capacity via multi-server coordination. |
| Extremely high-speed data processing | | | √ | TDengine’s storage and data processing are optimized for IoT, and can process data many times faster than similar products.|
| Extremely fast processing of high resolution data | | | √ | TDengine has achieved the same or better performance than other relational and NoSQL data processing systems. |
### System Maintenance Requirements
| **System Maintenance Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------ |
| Native high-reliability | | | √ | TDengine has a very robust, reliable and easily configurable system architecture to simplify routine operation. Human errors and accidents are eliminated to the greatest extent, with a streamlined experience for operators. |
| Minimize learning and maintenance costs | | | √ | In addition to being easily configurable, standard SQL support and the Taos shell for ad hoc queries makes maintenance simpler, allows reuse and reduces learning costs.|
| Abundant talent supply | √ | | | Given the above, and given the extensive training and professional services provided by TDengine, it is easy to migrate from existing solutions or create a new and lasting solution based on TDengine.|
## Comparison with other databases
- [Writing Performance Comparison of TDengine and InfluxDB](https://tdengine.com/2022/02/23/4975.html)
- [Query Performance Comparison of TDengine and InfluxDB](https://tdengine.com/2022/02/24/5120.html)
- [TDengine vs InfluxDB, OpenTSDB, Cassandra, MySQL, ClickHouse](https://www.tdengine.com/downloads/TDengine_Testing_Report_en.pdf)
- [TDengine vs OpenTSDB](https://tdengine.com/2019/09/12/710.html)
- [TDengine vs Cassandra](https://tdengine.com/2019/09/12/708.html)
- [TDengine vs InfluxDB](https://tdengine.com/2019/09/12/706.html)
label: Concepts
---
title: Concepts
---
In order to explain the basic concepts and provide some sample code, the TDengine documentation uses smart meters as a typical time-series data scenario. Assume that each smart meter collects three metrics (current, voltage, and phase), that there are multiple smart meters, and that each meter has static attributes like location and group ID. The collected data will be similar to the following table:
<div className="center-table">
<table>
<thead><tr>
<th>Device ID</th>
<th>Time Stamp</th>
<th colSpan="3">Collected Metrics</th>
<th colSpan="2">Tags</th>
</tr>
<tr>
<th>Device ID</th>
<th>Time Stamp</th>
<th>current</th>
<th>voltage</th>
<th>phase</th>
<th>location</th>
<th>groupId</th>
</tr>
</thead>
<tbody>
<tr>
<td>d1001</td>
<td>1538548685000</td>
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548684000</td>
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>Beijing.Chaoyang</td>
<td>3</td>
</tr>
<tr>
<td>d1003</td>
<td>1538548686500</td>
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>Beijing.Haidian</td>
<td>3</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548685500</td>
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>Beijing.Haidian</td>
<td>2</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548695000</td>
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548696600</td>
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>Beijing.Haidian</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548696650</td>
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>Beijing.Chaoyang</td>
<td>3</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548696800</td>
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>Beijing.Chaoyang</td>
<td>2</td>
</tr>
</tbody>
</table>
<a href="#model_table1">Table 1: Smart meter example data</a>
</div>
Each row contains the device ID, timestamp, collected metrics (current, voltage, and phase as above), and static tags (location and groupId in Table 1) associated with the device. Each smart meter generates a row (measurement) at a pre-defined time interval or when triggered by an external event. The device produces a sequence of measurements with associated timestamps.
## Metric
Metric refers to a physical quantity collected by sensors, equipment or other types of data collection devices, such as current, voltage, temperature, pressure, GPS position, etc., which changes over time and whose data type can be integer, float, Boolean, or string. As time goes by, the amount of collected metric data stored increases.
## Label/Tag
Label/Tag refers to the static properties of sensors, equipment or other types of data collection devices, which do not change with time, such as device model, color, fixed location of the device, etc. The data type can be any type. Although static, TDengine allows users to add, delete or update tag values at any time. Unlike the collected metric data, the amount of tag data stored does not change over time.
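Although tags are static properties, their values can be modified at any time. A minimal sketch using the smart-meter example from Table 1 (the table name d1001 and the new value are illustrative):

```sql
-- Update a static tag value on an existing table.
ALTER TABLE d1001 SET TAG location='Beijing.Haidian';
```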
## Data Collection Point
Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and share the same timestamp. Complex equipment often has multiple data collection points, and the sampling rate of each collection point may be different and fully independent. For example, a car could have one data collection point to collect GPS position metrics, one to collect engine status metrics, and one to collect the environment metrics inside the car, so in this example the car would have three data collection points.
## Table
Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process it, with a short learning curve. You need to create a database, create tables, then insert data points and execute queries to explore the data.
To make full use of time-series data characteristics, TDengine adopts a strategy of "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, 10 million tables should be created. For the table above, 4 tables should be created for devices d1001, d1002, d1003, and d1004 to store the data collected. This design has several benefits:
1. Since the metric data from different DCPs are fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
2. For a DCP, the metric data it generates is ordered by timestamp, so the write operation can be implemented by simple appending, which further greatly improves the data writing speed.
3. The metric data from a DCP is stored continuously, block by block. Reading data for a period of time therefore requires far fewer random read operations, improving read and query performance by orders of magnitude.
4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. The values of one metric usually vary only slightly over a time range, compared with the differences between distinct metrics, which allows for a higher compression rate.
If the metric data of multiple DCPs were written into a single table in the traditional way, then due to uncontrollable network delays the order in which data from different DCPs arrive at the server could not be guaranteed, write operations would have to be protected by locks, and the metric data from one DCP could not be guaranteed to be stored continuously together. **One table for one data collection point ensures the best possible insert and query performance for a single data collection point.**
TDengine suggests using the DCP ID as the table name (like d1001 in the above table). Each DCP may collect one or multiple metrics (like current, voltage, and phase above). Each metric has a corresponding column in the table, and the data type of a column can be int, float, string, and so on. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index and won't build an index on any metric stored. Column-wise storage is used.
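As a minimal sketch of this design, the table for meter d1001 from Table 1 could be created as below, with the mandatory timestamp as the first column and one column per collected metric (the statement is illustrative; the schema mirrors Table 1):

```sql
-- One table per data collection point; the first column must be a timestamp.
CREATE TABLE d1001 (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT);
```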
## Super Table (STable)
The design of one table for one data collection point requires a huge number of tables, which is difficult to manage. Furthermore, applications often need to perform aggregation operations among DCPs, and those aggregations become complicated. To support aggregation over multiple tables efficiently, TDengine introduces the STable (Super Table) concept.
STable is a template for a type of data collection point. A STable contains a set of data collection points (tables) that have the same schema or data structure, but with different static attributes (tags). To describe a STable, in addition to defining the table structure of the metrics, it is also necessary to define the schema of its tags. The data type of tags can be int, float, string, and there can be multiple tags, which can be added, deleted, or modified afterward. If the whole system has N different types of data collection points, N STables need to be established.
In the design of TDengine, **a table is used to represent a specific data collection point, and STable is used to represent a set of data collection points of the same type**.
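For the smart meters in Table 1, a STable could be defined as below; the metric schema and the tag schema are specified separately (the same statement appears later in the Data Model chapter):

```sql
-- Metric columns describe the time series; TAGS describe the static attributes.
CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT) TAGS (location BINARY(64), groupId INT);
```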
## Subtable
When creating a table for a specific data collection point, the user can use a STable as a template and specify the tag values of this specific DCP. **The table created using a STable as the template is called a subtable** in TDengine. The differences between a regular table and a subtable are:
1. A subtable is a table; all SQL commands that can be applied to a regular table can also be applied to a subtable.
2. A subtable is a table with extensions: it has static tags (labels), and these tags can be added, deleted, and updated after it is created, while a regular table does not have tags.
3. A subtable belongs to exactly one STable, but a STable may have many subtables. Regular tables do not belong to any STable.
4. A regular table cannot be converted into a subtable, and vice versa.
The relationship between a STable and the subtables created based on this STable is as follows:
1. A STable contains multiple subtables with the same metric schema but with different tag values.
2. The schema of metrics or labels cannot be adjusted through subtables; it can only be changed via the STable. Changes to the schema of a STable take effect immediately for all associated subtables.
3. A STable defines only a template and does not store any data or label information by itself. Therefore, data cannot be written to a STable, only to subtables.
Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine treats the data in all its subtables as a whole data set. It first finds the subtables that meet the tag filter conditions, then scans the time-series data of only those subtables to perform the aggregation. This greatly reduces the data set to be scanned and thus greatly improves the performance of data aggregation across multiple DCPs.
In TDengine, it is recommended to use a subtable instead of a regular table for a DCP.
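The sketch below, based on the meters STable above, creates a subtable for one DCP and then aggregates across all subtables matching a tag filter (the tag values and the query are illustrative):

```sql
-- Create a subtable using the STable as the template, specifying its tag values.
CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);
-- Query the STable: TDengine first filters subtables by tag, then aggregates.
SELECT AVG(current), MAX(voltage) FROM meters WHERE groupId = 2 INTERVAL(1h);
```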
## Database
A database is a collection of tables. TDengine allows a running instance to have multiple databases, and each database can be configured with different storage policies. Different types of DCPs often have different data characteristics, including the frequency of data collection, data retention time, the number of replications, the size of data blocks, whether data is allowed to be updated, and so on. In order for TDengine to work with maximum efficiency in various scenarios, TDengine recommends that STables with different data characteristics be created in different databases.
In a database, there can be one or more STables, but a STable belongs to only one database. All tables owned by a STable are stored in only one database.
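A hedged example of creating a database with its own storage policy (all parameter values are illustrative):

```sql
-- Keep data for 365 days, store 10 days of data per file, keep 3 replicas, allow updates.
CREATE DATABASE power KEEP 365 DAYS 10 REPLICA 3 UPDATE 1;
```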
## FQDN & End Point
FQDN (Fully Qualified Domain Name) is the full domain name of a specific computer or host on the Internet. An FQDN consists of two parts: hostname and domain name. For example, the FQDN of a mail server might be mail.tdengine.com: the hostname is mail, and the host is located in the domain tdengine.com. DNS (Domain Name System) is responsible for translating FQDNs into IP addresses. For systems without DNS, this can be solved by configuring the hosts file.
Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Internet.
TDengine does not recommend using an IP address to access the cluster; FQDN is recommended for cluster management.
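For example, on a host without DNS service, an entry like the following in the hosts file (the IP address is illustrative) allows the end point h1.tdengine.com:6030 to be resolved:

```
192.168.0.1   h1.tdengine.com
```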
`apt-get` can be used to install TDengine from the official package repository.
**Package Repository**
```
wget -qO - http://repos.taosdata.com/tdengine.key | sudo apt-key add -
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-stable stable main" | sudo tee /etc/apt/sources.list.d/tdengine-stable.list
```
The repository required for installing beta versions can be configured as below:
```
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-beta beta main" | sudo tee /etc/apt/sources.list.d/tdengine-beta.list
```
**Install With apt-get**
```
sudo apt-get update
apt-cache policy tdengine
sudo apt-get install tdengine
```
:::tip
`apt-get` can only be used on Debian or Ubuntu Linux.
:::
import PkgList from "/components/PkgList";
It's very easy to install TDengine: it takes only a few minutes from downloading to finishing installation.
For the convenience of users, from version 2.4.0.10 the standard server-side installation package includes `taos`, `taosd`, `taosAdapter`, `taosBenchmark` and sample code. If only the `taosd` server and C/C++ connector are required, you can also choose to download the lite package.
Three kinds of packages are provided: tar.gz, rpm and deb. The tar.gz package in particular is provided for the convenience of enterprise customers on different operating systems; it includes `taosdump` and the TDinsight installation script, which are normally only provided in the taos-tools rpm and deb packages.
Between two major release versions, some beta versions may be delivered for users to try new features.
<PkgList type={0}/>
For the details please refer to [Install and Uninstall](/operation/pkg-install).
To see the details of versions, please refer to [Download List](https://www.taosdata.com/all-downloads) and [Release Notes](https://github.com/taosdata/TDengine/releases).
---
title: Get Started
description: 'Install TDengine from Docker image, apt-get or package, and run TAOS CLI and taosBenchmark to experience the features'
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import PkgInstall from "./\_pkg_install.mdx";
import AptGetInstall from "./\_apt_get_install.mdx";
## Quick Install
The full package of TDengine includes the server (taosd), taosAdapter for connecting with third-party systems and providing a RESTful interface, the client driver (taosc), the command-line program (CLI, taos) and some tools. For the current version, the server taosd and taosAdapter can only be installed and run on Linux systems; in the future they will also be supported on Windows, macOS and other systems. The client driver taosc and the TDengine CLI can be installed and run on Windows or Linux. In addition to connectors for multiple languages, a [RESTful interface](/reference/rest-api) is provided by [taosAdapter](/reference/taosadapter). Prior to version 2.4.0.0, however, there was no taosAdapter, and the RESTful interface was provided by the built-in HTTP service of taosd.
TDengine supports X64/ARM64/MIPS64/Alpha64 hardware platforms, and will support ARM32, RISC-V and other CPU architectures in the future.
<Tabs defaultValue="apt-get">
<TabItem value="docker" label="Docker">
If docker is already installed on your computer, execute the following command:
```shell
docker run -d -p 6030-6049:6030-6049 -p 6030-6049:6030-6049/udp tdengine/tdengine
```
Make sure the container is running
```shell
docker ps
```
Enter the container and execute bash
```shell
docker exec -it <container name> bash
```
Then you can execute the Linux commands and access TDengine.
For detailed steps, please visit [Experience TDengine via Docker](/train-faq/docker)
:::info
Starting from 2.4.0.10, besides taosd, the TDengine docker image includes: taos, taosAdapter, taosdump, taosBenchmark, TDinsight, scripts and sample code. Once the TDengine container is started, it will start both taosAdapter and taosd automatically to support the RESTful interface.
:::
</TabItem>
<TabItem value="apt-get" label="apt-get">
<AptGetInstall />
</TabItem>
<TabItem value="pkg" label="Package">
<PkgInstall />
</TabItem>
<TabItem value="src" label="Source Code">
If you would like to check the source code, build the package by yourself or contribute to the project, please check the [TDengine GitHub Repository](https://github.com/taosdata/TDengine)
</TabItem>
</Tabs>
## Quick Launch
After installation, you can launch the TDengine service with the `systemctl` command to start `taosd`.
```bash
systemctl start taosd
```
Check if taosd is running:
```bash
systemctl status taosd
```
If everything is fine, you can run the TDengine command-line interface `taos` to access TDengine and test it out yourself.
:::info
- systemctl requires _root_ privileges. If you are not _root_, please add sudo before the command.
- To get feedback and keep improving the product, TDengine collects some basic usage information, but you can turn it off by setting telemetryReporting to 0 in the configuration file taos.cfg.
- TDengine uses the FQDN (usually the hostname) as the ID for a node. To make the system work, you need to configure the FQDN for the server running taosd, and configure the DNS service or hosts file on the machine where the application or TDengine CLI runs, to ensure that the FQDN can be resolved.
- `systemctl stop taosd` won't stop the server right away; it will wait until all the data in memory is flushed to disk. This may take time depending on the cache size.
TDengine supports installation on systems that run [`systemd`](https://en.wikipedia.org/wiki/Systemd) for process management. Use `which systemctl` to check whether the system has `systemd` installed:
```bash
which systemctl
```
If the system does not have `systemd`, you can start TDengine manually by executing `/usr/local/taos/bin/taosd`.
:::
## Command Line Interface
To manage the running TDengine instance or execute ad-hoc queries, TDengine provides a Command Line Interface (hereinafter referred to as TDengine CLI), taos. To enter the interactive CLI, execute `taos` on a Linux terminal where TDengine is installed.
```bash
taos
```
If it connects to the TDengine server successfully, it will print out the version and a welcome message. If it fails, it will print out an error message; please check the [FAQ](/train-faq/faq) for troubleshooting connection issues. The TDengine CLI's prompt is:
```cmd
taos>
```
Inside the TDengine CLI, you can execute SQL commands to create/drop databases and tables, and run queries. A SQL command must end with a semicolon. For example:
```sql
create database demo;
use demo;
create table t (ts timestamp, speed int);
insert into t values ('2019-07-15 00:00:00', 10);
insert into t values ('2019-07-15 01:00:00', 20);
select * from t;
ts | speed |
========================================
2019-07-15 00:00:00.000 | 10 |
2019-07-15 01:00:00.000 | 20 |
Query OK, 2 row(s) in set (0.003128s)
```
Besides executing SQL commands, system administrators can check running status, add/drop user accounts and manage the running instances. The TDengine CLI with the client driver can be installed and run on either Linux or Windows machines. For more details on the CLI, please [check here](../reference/taos-shell/).
## Experience the blazing fast speed
After the TDengine server is running, execute `taosBenchmark` (previously named taosdemo) from a Linux terminal:
```bash
taosBenchmark
```
This command will create a super table "meters" under database "test". Under "meters", 10000 tables are created with names from "d0" to "d9999". Each table has 10000 rows and each row has four columns (ts, current, voltage, phase). Timestamps range from "2017-07-14 10:40:00 000" to "2017-07-14 10:40:09 999". Each table has the tags "location" and "groupId": groupId is set from 1 to 10 randomly, and location is set to "beijing" or "shanghai".
This command inserts 100 million rows into the database quickly. The time to insert depends on the hardware configuration; it takes only a dozen seconds on a regular PC server.
taosBenchmark provides command-line options and a configuration file to customize scenarios, like the number of tables, number of rows per table, number of columns and more. Please execute `taosBenchmark --help` to list them. For details on running taosBenchmark, please check the [reference for taosBenchmark](/reference/taosbenchmark)
## Experience query speed
After using taosBenchmark to insert data, you can execute queries from the TDengine CLI to experience the lightning-fast query speed.
Query the total number of rows in super table "meters":
```sql
taos> select count(*) from test.meters;
```
Query the average, maximum and minimum of the 100 million rows:
```sql
taos> select avg(current), max(voltage), min(phase) from test.meters;
```
Query the total number of rows with location="beijing":
```sql
taos> select count(*) from test.meters where location="beijing";
```
Query the average, maximum and minimum of all rows with groupId=10:
```sql
taos> select avg(current), max(voltage), min(phase) from test.meters where groupId=10;
```
Query the average, maximum and minimum for table d10 in 10-second time intervals:
```sql
taos> select avg(current), max(voltage), min(phase) from test.d10 interval(10s);
```
```c title="Native Connection"
{{#include docs-examples/c/connect_example.c}}
```
```csharp title="Native Connection"
{{#include docs-examples/csharp/ConnectExample.cs}}
```
:::info
The C# connector supports only the native connection for now.
:::
#### Unified Database Access Interface
```go title="Native Connection"
{{#include docs-examples/go/connect/cgoexample/main.go}}
```
```go title="REST Connection"
{{#include docs-examples/go/connect/restexample/main.go}}
```
#### Advanced Features
The af package of driver-go can also be used to establish a connection. With it, some advanced features of TDengine, like parameter binding and subscription, can be used.
```go title="Establish native connection using af package"
{{#include docs-examples/go/connect/afconn/main.go}}
```
```java title="Native Connection"
{{#include docs-examples/java/src/main/java/com/taos/example/JNIConnectExample.java}}
```
```java title="REST Connection"
{{#include docs-examples/java/src/main/java/com/taos/example/RESTConnectExample.java:main}}
```
When using a REST connection, bulk pulling can be enabled if the resulting data set is huge.
```java title="Enable Bulk Pulling" {4}
{{#include docs-examples/java/src/main/java/com/taos/example/WSConnectExample.java:main}}
```
For more information about connection configuration, please refer to [Java Connector](/reference/connector/java)
```js title="Native Connection"
{{#include docs-examples/node/nativeexample/connect.js}}
```
```js title="REST Connection"
{{#include docs-examples/node/restexample/connect.js}}
```
```python title="Native Connection"
{{#include docs-examples/python/connect_example.py}}
```
```r title="Native Connection"
{{#include docs-examples/R/connect_native.r:demo}}
```
```rust title="Native Connection/REST Connection"
{{#include docs-examples/rust/nativeexample/examples/connect.rs}}
```
:::note
For the Rust connector, the connection depends on the feature being used. If the "rest" feature is enabled, only the implementation for "rest" is compiled and packaged.
:::
---
sidebar_label: Connection
title: Connect to TDengine
description: "This document explains how to establish connection to TDengine, and briefly introduce how to install and use TDengine connectors."
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import ConnJava from "./\_connect_java.mdx";
import ConnGo from "./\_connect_go.mdx";
import ConnRust from "./\_connect_rust.mdx";
import ConnNode from "./\_connect_node.mdx";
import ConnPythonNative from "./\_connect_python.mdx";
import ConnCSNative from "./\_connect_cs.mdx";
import ConnC from "./\_connect_c.mdx";
import ConnR from "./\_connect_r.mdx";
import InstallOnLinux from "../../14-reference/03-connector/\_linux_install.mdx";
import InstallOnWindows from "../../14-reference/03-connector/\_windows_install.mdx";
import VerifyLinux from "../../14-reference/03-connector/\_verify_linux.mdx";
import VerifyWindows from "../../14-reference/03-connector/\_verify_windows.mdx";
Application programs running on any platform can access TDengine through the REST API it provides. For details, please refer to [REST API](/reference/rest-api/). Besides, application programs can use the connectors for multiple programming languages to access TDengine, including C/C++, Java, Python, Go, Node.js, C#, and Rust. This chapter describes how to establish a connection to TDengine and briefly introduces how to install and use connectors. For details about the connectors, please refer to [Connectors](/reference/connector/)
## Establish Connection
There are two ways for a connector to establish connections to TDengine:
1. Connection through the REST API provided by the taosAdapter component, called "REST connection" hereinafter.
2. Connection through the TDengine client driver (taosc), called "native connection" hereinafter.
Either way, the connectors provide the same or similar APIs for accessing the database and executing SQL statements; no obvious difference can be observed.
Key differences:
1. With a REST connection, it's not necessary to install the TDengine client driver (taosc), so it is friendlier for cross-platform deployment, at the cost of about a 30% performance downgrade. When taosc is upgraded, the application does not need to change.
2. With a native connection, the full capability of TDengine can be utilized, like [Parameter Binding](/reference/connector/cpp#Parameter Binding-api), [Subscription](reference/connector/cpp#Subscription), etc. But taosc has to be installed, and some platforms may not be supported.
## Install Client Driver taosc
If you choose the native connection and the application is not on the same host as the TDengine server, the TDengine client driver taosc needs to be installed on the application host. If you choose the REST connection, or the application is on the same host as the server, this step can be skipped. It's better to use the same version of taosc as the server.
### Install
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<InstallOnLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<InstallOnWindows />
</TabItem>
</Tabs>
### Verify
After the above installation and configuration are done, and after making sure the TDengine service has been started and is in service, the TDengine command-line interface `taos` can be launched to access TDengine.
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<VerifyLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<VerifyWindows />
</TabItem>
</Tabs>
## Install Connectors
<Tabs groupId="lang">
<TabItem label="Java" value="java">
If `maven` is used to manage the project, you only need to add the dependency below to `pom.xml`.
```xml
<dependency>
<groupId>com.taosdata.jdbc</groupId>
<artifactId>taos-jdbcdriver</artifactId>
<version>2.0.38</version>
</dependency>
```
</TabItem>
<TabItem label="Python" value="python">
Install from PyPI using `pip`:
```
pip install taospy
```
Install from Git URL:
```
pip install git+https://github.com/taosdata/taos-connector-python.git
```
</TabItem>
<TabItem label="Go" value="go">
You only need to add the `driver-go` dependency to `go.mod`.
```go-mod title=go.mod
module goexample
go 1.17
require github.com/taosdata/driver-go/v2 develop
```
:::note
`driver-go` uses `cgo` to wrap the APIs provided by taosc, and `cgo` needs `gcc` to compile C source code, so please make sure `gcc` is properly installed on your system.
:::
</TabItem>
<TabItem label="Rust" value="rust">
You only need to add the `libtaos` dependency to `Cargo.toml`.
```toml title=Cargo.toml
[dependencies]
libtaos = { version = "0.4.2"}
```
:::info
The Rust connector uses different features to distinguish the ways of establishing a connection. To establish a REST connection, please enable the `rest` feature.
```toml
libtaos = { version = "*", features = ["rest"] }
```
:::
</TabItem>
<TabItem label="Node.js" value="node">
The Node.js connector provides different packages for different ways of establishing connections.
1. Install Node.js Native Connector
```
npm i td2.0-connector
```
:::note
It's recommended to use a Node.js version between `node-v12.8.0` and `node-v13.0.0`.
:::
2. Install Node.js REST Connector
```
npm i td2.0-rest-connector
```
</TabItem>
<TabItem label="C#" value="csharp">
You only need to add a reference to [TDengine.Connector](https://www.nuget.org/packages/TDengine.Connector/) in the project configuration file.
```xml title=csharp.csproj {12}
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net6.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<StartupObject>TDengineExample.AsyncQueryExample</StartupObject>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="TDengine.Connector" Version="1.0.6" />
</ItemGroup>
</Project>
```
Or add it via the `dotnet` command.
```
dotnet add package TDengine.Connector
```
:::note
The sample code below is based on dotnet 6.0; it may need to be adjusted if your dotnet version is not exactly the same.
:::
</TabItem>
<TabItem label="R" value="r">
1. Download [taos-jdbcdriver-version-dist.jar](https://repo1.maven.org/maven2/com/taosdata/jdbc/taos-jdbcdriver/2.0.38/).
2. Install the dependency package `RJDBC`
```R
install.packages("RJDBC")
```
</TabItem>
<TabItem label="C" value="c">
If the client driver (taosc) is already installed, then the C connector is already available.
<br/>
</TabItem>
</Tabs>
## Establish Connection
Prior to establishing a connection, please make sure TDengine is running and accessible. The following sample code assumes TDengine is running on the same host as the client program, with FQDN configured to "localhost" and serverPort configured to "6030".
<Tabs groupId="lang" defaultValue="java">
<TabItem label="Java" value="java">
<ConnJava />
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
<ConnGo />
</TabItem>
<TabItem label="Rust" value="rust">
<ConnRust />
</TabItem>
<TabItem label="Node.js" value="node">
<ConnNode />
</TabItem>
<TabItem label="C#" value="csharp">
<ConnCSNative />
</TabItem>
<TabItem label="R" value="r">
<ConnR/>
</TabItem>
<TabItem label="C" value="c">
<ConnC />
</TabItem>
</Tabs>
:::tip
If the connection fails, in most cases it's caused by improper configuration for FQDN or firewall. Please refer to the section "Unable to establish connection" in [FAQ](https://docs.taosdata.com/train-faq/faq).
:::
---
title: Data Model
---
The data model employed by TDengine is similar to that of a relational database: you need to create databases and tables. For a specific application, the design of databases, STables (short for super tables), and tables needs to be considered. This chapter explains the big picture without going into syntax details.
## Create Database
The characteristics of data from different data collection points may differ, such as collection frequency, days to keep, number of replicas, data block size, whether updates are allowed, etc. For TDengine to operate with the best performance, it's strongly suggested to put data with different characteristics into different databases, because a different storage policy can be set for each database. When creating a database, many parameters can be configured, such as the days to keep data, the number of replicas, the number of memory blocks, time precision, the minimum and maximum number of rows in each data block, whether to compress, the time range of the data in a single data file, etc. Below is an example of the SQL statement for creating a database.
```sql
CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
```
In the above SQL statement, a database named "power" is created. Its data will be kept for 365 days (data older than 365 days will be deleted automatically), a new data file will be created every 10 days, 6 memory blocks are used, and data updates are allowed. For more details please refer to [Database](/taos-sql/database).
After creating a database, the current database in use can be switched with the SQL command `USE`; for example, the SQL statement below switches the current database to `power`. Without a current database specified, table names must be preceded by the corresponding database name.
```sql
USE power;
```
:::note
- Any table or STable must belong to a database. Before creating a table or STable, the database it belongs to must exist.
- JOIN operations can't be performed on tables from two different databases.
- A timestamp must be specified when inserting rows or querying historical rows.
:::
## Create STable
In a time-series application, there may be multiple kinds of data collection points. For example, in an electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation over multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](/tdinternal/arch#model_table1), the SQL statement below can be used to create the super table.
```sql
CREATE STable meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
```
:::note
If you are using versions prior to 2.0.15, the `STable` keyword needs to be replaced with `TABLE`.
:::
Similar to creating a regular table, when creating a STable, a name and schema need to be provided. In the STable schema, the first column must be a timestamp (like ts in the example), and the other columns (like current, voltage, and phase in the example) are the data collected. A column's type can be integer, float, double, string, etc. Besides, the schema for tags needs to be provided, like location and groupId in the example. A tag's type can be integer, float, string, etc. The static properties of a data collection point can be defined as tags, like the location, device type, device group ID, manager ID, etc. Tags in the schema can be added, removed or updated. Please refer to [STable](/taos-sql/stable) for more details.
For each kind of data collection point, a corresponding STable must be created, so there may be many STables in an application. For an electrical power system, we need to create STables for meters, transformers, busbars, and switches respectively. There may also be multiple kinds of data collection points on a single device; for example, there may be one data collection point for electrical data like current and voltage, and another for environmental data like temperature, humidity and wind direction. Multiple STables are required for such devices.
At most 4096 columns (or 1024 prior to version 2.1.7.0) are allowed in a STable. If more than 4096 metrics are to be collected for a data collection point, multiple STables are required. There can be multiple databases in a system, and one or more STables can exist in a database.
## Create Table
A specific table needs to be created for each data collection point. Similar to RDBMS, a table name and schema are required to create a table. Besides, the values of one or more tags need to be provided: to create a table, a STable is used as the template and the values are specified for its tags. For example, for the meters in [Table 1](/tdinternal/arch#model_table1), the table can be created using the SQL statement below.
```sql
CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);
```
In the above SQL statement, "d1001" is the table name, "meters" is the STable name, followed by the value of tag "Location" and the value of tag "groupId", which are "Beijing.Chaoyang" and "2" respectively in the example. The tag values can be updated after the table is created. Please refer to [Tables](/taos-sql/table) for details.
In TDengine system, it's recommended to create a table for a data collection point via STable. Table created via STable is called subtable in some parts of TDengine document. All SQL commands applied on regular table can be applied on subtable.
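As noted above, tag values can be modified after the subtable is created; a minimal sketch (the new value is illustrative):

```sql
-- Tag values can be updated at any time after the subtable is created.
ALTER TABLE d1001 SET TAG groupId=3;
```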
:::warning
It's not recommended to create a table in a database while using a STable from another database as template.
:::
:::tip
It's suggested to use the globally unique ID of a data collection point as the table name, for example the device serial number. If there isn't such a unique ID, multiple non-unique IDs can be combined to form a globally unique ID. It's not recommended to use a globally unique ID as a tag value.
:::
## Create Table Automatically
In some circumstances, it's not known whether a table already exists when inserting rows. In that case, the table can be created automatically using the SQL statement below, and nothing will happen if the table already exists.
```sql
INSERT INTO d1001 USING meters TAGS ("Beijing.Chaoyang", 2) VALUES (now, 10.2, 219, 0.32);
```
In the above SQL statement, a row with values `(now, 10.2, 219, 0.32)` will be inserted into table "d1001". If table "d1001" doesn't exist, it will be created automatically using STable "meters" as the template, with the tag values `"Beijing.Chaoyang", 2`.
For more details please refer to [Create Table Automatically](/taos-sql/insert#automatically-create-table-when-inserting).
## Single Column vs Multiple Column
The multiple-column data model is supported in TDengine. As long as multiple metrics are collected by the same data collection point at the same time, i.e. the timestamps are identical, these metrics can be put in a single STable as columns. There is, however, another kind of design, the single-column data model, in which a table is created for each metric; this means a STable is required for each kind of metric. For example, 3 STables would be required for current, voltage and phase.
It's recommended to use the multiple-column data model as much as possible because of its better insert and query performance. In some cases, however, the metrics to be collected vary frequently, so the STable schema would need to change frequently too; in such cases, it's more convenient to use the single-column data model.
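A hedged sketch contrasting the two models for the smart-meter example (the single-column schemas are illustrative):

```sql
-- Multiple-column model: one STable holds all metrics collected together.
CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT) TAGS (location BINARY(64), groupId INT);
-- Single-column model: one STable per metric.
CREATE STABLE current (ts TIMESTAMP, value FLOAT) TAGS (location BINARY(64), groupId INT);
CREATE STABLE voltage (ts TIMESTAMP, value INT) TAGS (location BINARY(64), groupId INT);
CREATE STABLE phase (ts TIMESTAMP, value FLOAT) TAGS (location BINARY(64), groupId INT);
```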
---
sidebar_label: SQL
title: Insert Using SQL
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaSQL from "./_java_sql.mdx";
import JavaStmt from "./_java_stmt.mdx";
import PySQL from "./_py_sql.mdx";
import PyStmt from "./_py_stmt.mdx";
import GoSQL from "./_go_sql.mdx";
import GoStmt from "./_go_stmt.mdx";
import RustSQL from "./_rust_sql.mdx";
import RustStmt from "./_rust_stmt.mdx";
import NodeSQL from "./_js_sql.mdx";
import NodeStmt from "./_js_stmt.mdx";
import CsSQL from "./_cs_sql.mdx";
import CsStmt from "./_cs_stmt.mdx";
import CSQL from "./_c_sql.mdx";
import CStmt from "./_c_stmt.mdx";
## Introduction
Application programs can execute `INSERT` statements through connectors to insert rows. The TDengine CLI can also be launched manually to insert data.
### Insert Single Row
The SQL statement below inserts one row into table "d1001".
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
```
### Insert Multiple Rows
Multiple rows can be inserted in a single SQL statement. The example below inserts 2 rows into table "d1001".
```sql
INSERT INTO d1001 VALUES (1538548684000, 10.2, 220, 0.23) (1538548696650, 10.3, 218, 0.25);
```
### Insert into Multiple Tables
Data can be inserted into multiple tables in the same SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6, 218, 0.33) d1002 VALUES (1538548696800, 12.3, 221, 0.31);
```
For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).
:::info
- Inserting in batches can deliver better performance. Normally, the higher the batch size, the better the performance. Please note that a single row can't exceed 16K bytes and a single SQL statement can't exceed 1M bytes.
- Inserting with multiple threads can also improve performance. However, depending on the system resources on the application side and the server side, performance may drop instead of improving once the number of inserting threads grows beyond a certain point. The proper number of threads needs to be determined by testing in the specific environment.
:::
:::warning
- If the timestamp of the row to be inserted already exists in the table, the behavior depends on the value of the parameter `UPDATE`. If it's set to 0 (the default), the row will be discarded. If it's set to 1, the new values will override the old values for the same row.
- The timestamp to be inserted must be newer than the current time minus the parameter `KEEP`. If `KEEP` is set to 3650 days, data older than 3650 days can't be inserted. The timestamp also can't be newer than the current time plus the parameter `DAYS`. If `DAYS` is set to 2, data more than 2 days in the future can't be inserted.
:::
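A hedged sketch of the duplicate-timestamp behavior, assuming the database was created with `UPDATE 1` (values are illustrative):

```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
-- With UPDATE 1, re-inserting the same timestamp overwrites the row above;
-- with UPDATE 0 (the default), the second insert would be discarded.
INSERT INTO d1001 VALUES (1538548685000, 10.5, 220, 0.30);
```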
## Examples
### Insert Using SQL
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaSQL />
</TabItem>
<TabItem label="Python" value="python">
<PySQL />
</TabItem>
<TabItem label="Go" value="go">
<GoSQL />
</TabItem>
<TabItem label="Rust" value="rust">
<RustSQL />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeSQL />
</TabItem>
<TabItem label="C#" value="csharp">
<CsSQL />
</TabItem>
<TabItem label="C" value="c">
<CSQL />
</TabItem>
</Tabs>
:::note
1. With either native connection or REST connection, the above samples can work well.
2. Please note that `use db` can't be used with a REST connection because REST connections are stateless; that is why the samples use `dbName.tbName` to specify the table name.
:::
### Insert with Parameter Binding
TDengine also provides prepare APIs that support parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. Starting with versions 2.1.1.0 and 2.1.2.0, parameter binding support for inserting data has been improved significantly, improving insert performance by avoiding the cost of parsing SQL statements.
Parameter binding is available only with native connection.
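With these APIs, the SQL passed to the prepare step is an ordinary `INSERT` statement whose values are replaced with `?` placeholders; from version 2.1.1.0 the table name and tag values can be bound as well. A sketch of such a template, assuming the `meters` STable used in this guide:
```sql
INSERT INTO ? USING meters TAGS(?, ?) VALUES(?, ?, ?, ?)
```
The connector-specific samples below show how the placeholders are bound and the statement executed.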
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaStmt />
</TabItem>
<TabItem label="Python" value="python">
<PyStmt />
</TabItem>
<TabItem label="Go" value="go">
<GoStmt />
</TabItem>
<TabItem label="Rust" value="rust">
<RustStmt />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeStmt />
</TabItem>
<TabItem label="C#" value="csharp">
<CsStmt />
</TabItem>
<TabItem label="C" value="c">
<CStmt />
</TabItem>
</Tabs>
---
sidebar_label: InfluxDB Line Protocol
title: InfluxDB Line Protocol
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaLine from "./_java_line.mdx";
import PyLine from "./_py_line.mdx";
import GoLine from "./_go_line.mdx";
import RustLine from "./_rust_line.mdx";
import NodeLine from "./_js_line.mdx";
import CsLine from "./_cs_line.mdx";
import CLine from "./_c_line.mdx";
## Introduction
In the InfluxDB Line protocol format, a single line of text represents one row of data. Each line contains 4 parts, as shown below.
```
measurement,tag_set field_set timestamp
```
- `measurement` will be used as the STable name
- `tag_set` will be used as tags, with format like `<tag_key>=<tag_value>,<tag_key>=<tag_value>`
- `field_set` will be used as data columns, with format like `<field_key>=<field_value>,<field_key>=<field_value>`
- `timestamp` is the primary key timestamp corresponding to this row of data
For example:
```
meters,location=Beijing.Haidian,groupid=2 current=13.4,voltage=223,phase=0.29 1648432611249500
```
:::note
- All the data in `tag_set` will be converted to nchar type automatically.
- Each value in `field_set` must describe its own data type. For example, 1.2f32 means the value 1.2 of float type; without the "f" type suffix it would be treated as double.
- Multiple kinds of precision can be used for the `timestamp` field, from nanosecond (ns) to hour (h).
:::
For more details please refer to [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/) and [TDengine Schemaless](/reference/schemaless/#Schemaless-Line-Protocol).
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaLine />
</TabItem>
<TabItem label="Python" value="Python">
<PyLine />
</TabItem>
<TabItem label="Go" value="go">
<GoLine />
</TabItem>
<TabItem label="Rust" value="rust">
<RustLine />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeLine />
</TabItem>
<TabItem label="C#" value="csharp">
<CsLine />
</TabItem>
<TabItem label="C" value="c">
<CLine />
</TabItem>
</Tabs>
---
sidebar_label: OpenTSDB Line Protocol
title: OpenTSDB Line Protocol
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaTelnet from "./_java_opts_telnet.mdx";
import PyTelnet from "./_py_opts_telnet.mdx";
import GoTelnet from "./_go_opts_telnet.mdx";
import RustTelnet from "./_rust_opts_telnet.mdx";
import NodeTelnet from "./_js_opts_telnet.mdx";
import CsTelnet from "./_cs_opts_telnet.mdx";
import CTelnet from "./_c_opts_telnet.mdx";
## Introduction
In the OpenTSDB line protocol, a single line of text represents one row of data. OpenTSDB employs a single-column data model, so a line can contain only one metric value, but there can be multiple tags. Each line contains 4 parts, as below:
```
<metric> <timestamp> <value> <tagk_1>=<tagv_1>[ <tagk_n>=<tagv_n>]
```
- `metric` will be used as STable name.
- `timestamp` is the timestamp of the current row of data. The time precision is determined automatically based on the length of the timestamp; second and millisecond precision are supported.
- `value` is the collected metric and must be a numeric value; the corresponding column name is "value".
- The last part is the tag set, separated by spaces; all tags will be converted to nchar type automatically.
For example:
```txt
meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3
```
Please refer to [OpenTSDB Telnet API](http://opentsdb.net/docs/build/html/api_telnet/put.html) for more details.
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaTelnet />
</TabItem>
<TabItem label="Python" value="Python">
<PyTelnet />
</TabItem>
<TabItem label="Go" value="go">
<GoTelnet />
</TabItem>
<TabItem label="Rust" value="rust">
<RustTelnet />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeTelnet />
</TabItem>
<TabItem label="C#" value="csharp">
<CsTelnet />
</TabItem>
<TabItem label="C" value="c">
<CTelnet />
</TabItem>
</Tabs>
In the above sample code, 2 STables will be created automatically, each holding 4 rows of data.
```cmd
taos> use test;
Database changed.
taos> show STables;
name | created_time | columns | tags | tables |
============================================================================================
meters.current | 2022-03-30 17:04:10.877 | 2 | 2 | 2 |
meters.voltage | 2022-03-30 17:04:10.882 | 2 | 2 | 2 |
Query OK, 2 row(s) in set (0.002544s)
taos> select tbname, * from `meters.current`;
tbname | ts | value | groupid | location |
==================================================================================================================================
t_0e7bcfa21a02331c06764f275... | 2022-03-28 09:56:51.249 | 10.800000000 | 3 | Beijing.Haidian |
t_0e7bcfa21a02331c06764f275... | 2022-03-28 09:56:51.250 | 11.300000000 | 3 | Beijing.Haidian |
t_7e7b26dd860280242c6492a16... | 2022-03-28 09:56:51.249 | 10.300000000 | 2 | Beijing.Chaoyang |
t_7e7b26dd860280242c6492a16... | 2022-03-28 09:56:51.250 | 12.600000000 | 2 | Beijing.Chaoyang |
Query OK, 4 row(s) in set (0.005399s)
```
---
sidebar_label: OpenTSDB JSON Protocol
title: OpenTSDB JSON Protocol
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaJson from "./_java_opts_json.mdx";
import PyJson from "./_py_opts_json.mdx";
import GoJson from "./_go_opts_json.mdx";
import RustJson from "./_rust_opts_json.mdx";
import NodeJson from "./_js_opts_json.mdx";
import CsJson from "./_cs_opts_json.mdx";
import CJson from "./_c_opts_json.mdx";
## Introduction
In the OpenTSDB JSON protocol, a JSON string is used to represent one or more rows of data. For example:
```json
[
{
"metric": "sys.cpu.nice",
"timestamp": 1346846400,
"value": 18,
"tags": {
"host": "web01",
"dc": "lga"
}
},
{
"metric": "sys.cpu.nice",
"timestamp": 1346846400,
"value": 9,
"tags": {
"host": "web02",
"dc": "lga"
}
}
]
```
Similar to the OpenTSDB line protocol, `metric` will be used as the STable name, `timestamp` is the timestamp to be used, `value` represents the collected metric, and `tags` is the tag set.
Please refer to [OpenTSDB HTTP API](http://opentsdb.net/docs/build/html/api_http/put.html) for more details.
:::note
- In the JSON protocol, strings will be converted to nchar type and numeric values will be converted to double type.
- Only data in array format is accepted; an array must be used even if there is only one row.
:::
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaJson />
</TabItem>
<TabItem label="Python" value="Python">
<PyJson />
</TabItem>
<TabItem label="Go" value="go">
<GoJson />
</TabItem>
<TabItem label="Rust" value="rust">
<RustJson />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeJson />
</TabItem>
<TabItem label="C#" value="csharp">
<CsJson />
</TabItem>
<TabItem label="C" value="c">
<CJson />
</TabItem>
</Tabs>
The above sample code will create 2 STables automatically, each with 2 rows of data.
```cmd
taos> use test;
Database changed.
taos> show STables;
name | created_time | columns | tags | tables |
============================================================================================
meters.current | 2022-03-29 16:05:25.193 | 2 | 2 | 1 |
meters.voltage | 2022-03-29 16:05:25.200 | 2 | 2 | 1 |
Query OK, 2 row(s) in set (0.001954s)
taos> select * from `meters.current`;
ts | value | groupid | location |
===================================================================================================================
2022-03-28 09:56:51.249 | 10.300000000 | 2.000000000 | Beijing.Chaoyang |
2022-03-28 09:56:51.250 | 12.600000000 | 2.000000000 | Beijing.Chaoyang |
Query OK, 2 row(s) in set (0.004076s)
```
```c
{{#include docs-examples/c/line_example.c:main}}
```
```c
{{#include docs-examples/c/json_protocol_example.c:main}}
```
```c
{{#include docs-examples/c/telnet_line_example.c:main}}
```
```c
{{#include docs-examples/c/insert_example.c}}
```
```c title=Single Row Binding
{{#include docs-examples/c/stmt_example.c}}
```
```c title=Multiple Row Binding
{{#include docs-examples/c/multi_bind_example.c}}
```
```csharp
{{#include docs-examples/csharp/InfluxDBLineExample.cs}}
```
```csharp
{{#include docs-examples/csharp/OptsJsonExample.cs}}
```
```csharp
{{#include docs-examples/csharp/OptsTelnetExample.cs}}
```
```csharp
{{#include docs-examples/csharp/SQLInsertExample.cs}}
```
```csharp
{{#include docs-examples/csharp/StmtInsertExample.cs}}
```
```go
{{#include docs-examples/go/insert/line/main.go}}
```
```go
{{#include docs-examples/go/insert/json/main.go}}
```
```go
{{#include docs-examples/go/insert/telnet/main.go}}
```
```go
{{#include docs-examples/go/insert/sql/main.go}}
```
```go
{{#include docs-examples/go/insert/stmt/main.go}}
```
:::tip
The `github.com/taosdata/driver-go/v2/wrapper` module in driver-go is a wrapper for the C API; it can be used to insert data with parameter binding.
:::
```java
{{#include docs-examples/java/src/main/java/com/taos/example/LineProtocolExample.java}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/JSONProtocolExample.java}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/TelnetLineProtocolExample.java}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/RestInsertExample.java:insert}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/StmtInsertExample.java}}
```
```js
{{#include docs-examples/node/nativeexample/influxdb_line_example.js}}
```
```js
{{#include docs-examples/node/nativeexample/opentsdb_json_example.js}}
```
```js
{{#include docs-examples/node/nativeexample/opentsdb_telnet_example.js}}
```
```js
{{#include docs-examples/node/nativeexample/insert_example.js}}
```
```js title=Single Row Binding
{{#include docs-examples/node/nativeexample/param_bind_example.js}}
```
```js title=Multiple Row Binding
{{#include docs-examples/node/nativeexample/multi_bind_example.js:insertData}}
```
:::info
Multiple row binding performs better than single row binding, but it can only be used with `INSERT` statements, while single row binding can also be used for SQL statements other than `INSERT`.
:::
```py
{{#include docs-examples/python/line_protocol_example.py}}
```
```py
{{#include docs-examples/python/json_protocol_example.py}}
```
```py
{{#include docs-examples/python/telnet_line_protocol_example.py}}
```
```py
{{#include docs-examples/python/native_insert_example.py}}
```
```py title=Single Row Binding
{{#include docs-examples/python/bind_param_example.py}}
```
```py title=Multiple Row Binding
{{#include docs-examples/python/multi_bind_example.py:bind_batch}}
```
:::info
Multiple row binding performs better than single row binding, but it can only be used with `INSERT` statements, while single row binding can also be used for SQL statements other than `INSERT`.
:::
```rust
{{#include docs-examples/rust/schemalessexample/examples/influxdb_line_example.rs}}
```
```rust
{{#include docs-examples/rust/schemalessexample/examples/opentsdb_json_example.rs}}
```
```rust
{{#include docs-examples/rust/schemalessexample/examples/opentsdb_telnet_example.rs}}
```
```rust
{{#include docs-examples/rust/restexample/examples/insert_example.rs}}
```
```rust
{{#include docs-examples/rust/nativeexample/examples/stmt_example.rs}}
```
---
title: Insert
---
TDengine supports multiple protocols for inserting data, including SQL, InfluxDB Line protocol, OpenTSDB Telnet protocol, and OpenTSDB JSON protocol. Data can be inserted row by row or in batches, from one or more data collection points simultaneously, and with multiple threads; out-of-order and historical data can be inserted too. InfluxDB Line protocol, OpenTSDB Telnet protocol, and OpenTSDB JSON protocol are the 3 schemaless insert protocols supported by TDengine. With schemaless protocols it's not necessary to create STables and tables in advance, and schemas are adjusted automatically according to the data being inserted.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
```c
{{#include docs-examples/c/query_example.c}}
```
```c
{{#include docs-examples/c/async_query_example.c:demo}}
```
```csharp
{{#include docs-examples/csharp/QueryExample.cs}}
```
```csharp
{{#include docs-examples/csharp/AsyncQueryExample.cs}}
```
```go
{{#include docs-examples/go/query/sync/main.go}}
```
```go
{{#include docs-examples/go/query/async/main.go}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/RestQueryExample.java}}
```
```js
{{#include docs-examples/node/nativeexample/query_example.js}}
```
```js
{{#include docs-examples/node/nativeexample/async_query_example.js}}
```
The result set is iterated row by row.
```py
{{#include docs-examples/python/query_example.py:iter}}
```
The result set is retrieved as a whole; each row is converted to a dict and returned.
```py
{{#include docs-examples/python/query_example.py:fetch_all}}
```
```py
{{#include docs-examples/python/async_query_example.py}}
```
:::note
This sample code can't be run on Windows systems for now.
:::
```rust
{{#include docs-examples/rust/restexample/examples/query_example.rs}}
```
---
sidebar_label: Select
title: Select
description: "This chapter introduces major query functionalities and how to perform sync and async query using connectors."
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaQuery from "./_java.mdx";
import PyQuery from "./_py.mdx";
import GoQuery from "./_go.mdx";
import RustQuery from "./_rust.mdx";
import NodeQuery from "./_js.mdx";
import CsQuery from "./_cs.mdx";
import CQuery from "./_c.mdx";
import PyAsync from "./_py_async.mdx";
import NodeAsync from "./_js_async.mdx";
import CsAsync from "./_cs_async.mdx";
import CAsync from "./_c_async.mdx";
## Introduction
TDengine uses SQL as its query language. Application programs can send SQL statements to TDengine through the REST API or connectors. The TDengine CLI `taos` can also be used to execute ad-hoc SQL queries. Here is the list of major query functionalities supported by TDengine:
- Query on single column or multiple columns
- Filter on tags or data columns: >, <, =, <\>, like
- Grouping of results: `Group By`
- Sorting of results: `Order By`
- Limit the number of results: `Limit/Offset`
- Arithmetic on columns of numeric types or aggregate results
- Join query with timestamp alignment
- Aggregate functions: count, max, min, avg, sum, twa, stddev, leastsquares, top, bottom, first, last, percentile, apercentile, last_row, spread, diff
For example, the SQL statement below can be executed in the TDengine CLI `taos` to select the rows whose voltage is bigger than 215, sorted by timestamp in descending order and limited to 2 rows.
```sql
select * from d1001 where voltage > 215 order by ts desc limit 2;
```
```title=Output
taos> select * from d1001 where voltage > 215 order by ts desc limit 2;
ts | current | voltage | phase |
======================================================================================
2018-10-03 14:38:16.800 | 12.30000 | 221 | 0.31000 |
2018-10-03 14:38:15.000 | 12.60000 | 218 | 0.33000 |
Query OK, 2 row(s) in set (0.001100s)
```
To meet the requirements of many use cases, some special functions have been added in TDengine, for example `twa` (time weighted average), `spread` (the difference between the maximum and the minimum), and `last_row` (the last row); more functions will be added over time to serve more use cases. Furthermore, continuous query is also supported in TDengine.
For detailed query syntax please refer to [Select](/taos-sql/select).
## Aggregation among Tables
In many use cases, there are multiple kinds of data collection points. A concept called STable (short for super table) is used in TDengine to represent one kind of data collection point, and a table is used to represent a specific data collection point. Tags are used by TDengine to represent the static properties of data collection points; each data collection point has its own values for these static properties. By specifying filter conditions on tags, aggregation can be performed efficiently among all the subtables created from the same STable, i.e. the same kind of data collection points. Aggregate functions applicable to tables can be used directly on STables; the syntax is exactly the same.
In summary, the subtables of a STable can be aggregated with a simple query on the STable; it's a kind of join operation. However, tables belonging to different STables can't be aggregated together.
### Example 1
In the TDengine CLI `taos`, use the SQL below to get the average voltage of all the meters in Beijing, grouped by location.
```
taos> SELECT AVG(voltage) FROM meters GROUP BY location;
avg(voltage) | location |
=============================================================
222.000000000 | Beijing.Haidian |
219.200000000 | Beijing.Chaoyang |
Query OK, 2 row(s) in set (0.002136s)
```
### Example 2
In the TDengine CLI `taos`, use the SQL below to get the number of rows and the maximum current over the past 24 hours from the meters whose groupId is 2.
```
taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now - 24h;
count(*) | max(current) |
==================================
5 | 13.4 |
Query OK, 1 row(s) in set (0.002136s)
```
Join queries are only allowed between the tables of the same STable; a sketch follows below. In [Select](/taos-sql/select), each query operation is marked to indicate whether it supports STables or not.
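For example, two subtables of the STable `meters` can be joined on their timestamps; this is a sketch using the meters schema from this guide:
```sql
SELECT d1001.ts, d1001.current, d1002.current
  FROM d1001, d1002
  WHERE d1001.ts = d1002.ts;
```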
## Down Sampling and Interpolation
In IoT use cases, down sampling is widely used to aggregate data by time window. The `INTERVAL` keyword in TDengine simplifies such queries. For example, the SQL statement below gets the sum of current every 10 seconds from table d1001.
```
taos> SELECT sum(current) FROM d1001 INTERVAL(10s);
ts | sum(current) |
======================================================
2018-10-03 14:38:00.000 | 10.300000191 |
2018-10-03 14:38:10.000 | 24.900000572 |
Query OK, 2 row(s) in set (0.000883s)
```
Down sampling can also be used on a STable. For example, the SQL statement below gets the sum of current, per second, from all meters in Beijing.
```
taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s);
ts | sum(current) |
======================================================
2018-10-03 14:38:04.000 | 10.199999809 |
2018-10-03 14:38:05.000 | 32.900000572 |
2018-10-03 14:38:06.000 | 11.500000000 |
2018-10-03 14:38:15.000 | 12.600000381 |
2018-10-03 14:38:16.000 | 36.000000000 |
Query OK, 5 row(s) in set (0.001538s)
```
Down sampling also supports a time offset. For example, the SQL statement below gets the sum of current from all meters, with each time window starting on a 500-millisecond boundary.
```
taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a);
ts | sum(current) |
======================================================
2018-10-03 14:38:04.500 | 11.189999809 |
2018-10-03 14:38:05.500 | 31.900000572 |
2018-10-03 14:38:06.500 | 11.600000000 |
2018-10-03 14:38:15.500 | 12.300000381 |
2018-10-03 14:38:16.500 | 35.000000000 |
Query OK, 5 row(s) in set (0.001521s)
```
In many use cases, it's hard to align the timestamps of the data collected by different collection points. However, many algorithms, such as FFT, require data to be aligned on the same time interval, and in many systems application programs have to handle this themselves. In TDengine, the alignment is easy to achieve using down sampling.
Interpolation can also be performed in TDengine when a time range contains no data, as sketched below.
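For example, a `FILL` clause can be appended to a time window query to fill empty windows; this is a sketch, with `FILL(PREV)` repeating the previous window's value (see the linked reference below for all fill modes):
```sql
SELECT AVG(current) FROM d1001
  WHERE ts >= '2018-10-03 14:38:00.000' AND ts <= '2018-10-03 14:39:00.000'
  INTERVAL(10s) FILL(PREV);
```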
For more details please refer to [Aggregate by Window](/taos-sql/interval).
## Examples
### Query
In the section describing [Insert](/develop/insert-data/sql-writing), a database named `power` is created and some data is inserted into the STable `meters`. The sample code below demonstrates how to query the data in this STable.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaQuery />
</TabItem>
<TabItem label="Python" value="python">
<PyQuery />
</TabItem>
<TabItem label="Go" value="go">
<GoQuery />
</TabItem>
<TabItem label="Rust" value="rust">
<RustQuery />
</TabItem>
<TabItem label="Node.js" value="nodejs">
<NodeQuery />
</TabItem>
<TabItem label="C#" value="csharp">
<CsQuery />
</TabItem>
<TabItem label="C" value="c">
<CQuery />
</TabItem>
</Tabs>
:::note
1. The above sample code works well with either a REST connection or a native connection.
2. Please note that `use db` can't be used with a REST connection because it's stateless.
:::
### Asynchronous Query
Besides synchronous queries, TDengine also provides asynchronous query APIs to insert and query data more efficiently. In a similar hardware and software environment, the async APIs are 2 to 4 times faster than the sync APIs. The async APIs work in non-blocking mode: a call can return before the operation finishes, so the calling thread can switch to other work, improving the performance of the whole application. Async APIs perform especially well over a poor network.
Please note that async queries can only be used with a native connection.
<Tabs defaultValue="python" groupId="lang">
<TabItem label="Python" value="python">
<PyAsync />
</TabItem>
<TabItem label="C#" value="csharp">
<CsAsync />
</TabItem>
<TabItem label="C" value="c">
<CAsync />
</TabItem>
</Tabs>
---
sidebar_label: Continuous Query
description: "Continuous query is a query that's executed automatically according to predefined frequency to provide aggregate query capability by time window, it's actually a simplified time driven stream computing."
title: "Continuous Query"
---
A continuous query is a query that's executed automatically at a predefined frequency to provide aggregation by time window; it's effectively a simplified, time-driven form of stream computing. A continuous query can be performed on a table or a STable in TDengine. The results of a continuous query can be pushed to clients or written back to TDengine. Each query is executed on a time window, which moves forward with time. The size of the time window and the forward increment need to be specified with the parameters `INTERVAL` and `SLIDING` respectively.
Continuous queries in TDengine are time driven and can be defined using TAOS SQL directly, without any extra operations. With continuous queries, results can be generated per time window to down sample the original data. Once a continuous query is defined using TAOS SQL, it's automatically executed at the end of each time window and the result is pushed back to clients or written to TDengine.
There are some differences between continuous query in TDengine and time window computation in stream computing:
- In stream computing, the computation is performed and the result returned in real time, whereas the computation in a continuous query only starts when a time window closes. For example, if the time window is 1 day, the result is only generated at 23:59:59.
- If a historical data row is written into a time window for which the computation has already finished, the computation is not performed again, and the result is not pushed to the client again either. If the result has already been written into TDengine, it is not updated.
- In a continuous query, if the result is pushed to a client, the client status is not cached on the server side, and exactly-once delivery is not guaranteed by the server. If the client program crashes, a new time window is started from the point where the continuous query is restarted. If the result is written into TDengine, the data written into TDengine is guaranteed to be valid and continuous.
## Syntax
```sql
[CREATE TABLE AS] SELECT select_expr [, select_expr ...]
FROM {tb_name_list}
[WHERE where_condition]
[INTERVAL(interval_val [, interval_offset]) [SLIDING sliding_val]]
```
- INTERVAL: the time window over which the continuous query is performed
- SLIDING: the step by which the time window moves forward each time
## How to Use
In this section, the meters use case is used to introduce how to use continuous query. Assume the STable and subtables have been created using the SQL statements below.
```sql
create table meters (ts timestamp, current float, voltage int, phase float) tags (location binary(64), groupId int);
create table D1001 using meters tags ("Beijing.Chaoyang", 2);
create table D1002 using meters tags ("Beijing.Haidian", 2);
```
The average voltage over a one-minute time window, moving forward in 30-second steps, can be retrieved using the SQL statement below.
```sql
select avg(voltage) from meters interval(1m) sliding(30s);
```
Whenever the above SQL statement is executed, all the existing data is computed again. If the computation needs to run automatically every 30 seconds on the data of the past minute, the statement needs to be revised as below, in which `{startTime}` stands for the beginning of the latest time window.
```sql
select avg(voltage) from meters where ts > {startTime} interval(1m) sliding(30s);
```
An easier way to achieve the same purpose is to prepend `create table {tableName} as` to the `select` statement.
```sql
create table avg_vol as select avg(voltage) from meters interval(1m) sliding(30s);
```
A table named `avg_vol` will be created automatically; then, every 30 seconds, the `select` statement is executed on the data of the past minute, i.e. the latest time window, and the result is written into table `avg_vol`. The client program just needs to query table `avg_vol`. For example:
```sql
taos> select * from avg_vol;
ts | avg_voltage_ |
===================================================
2020-07-29 13:37:30.000 | 222.0000000 |
2020-07-29 13:38:00.000 | 221.3500000 |
2020-07-29 13:38:30.000 | 220.1700000 |
2020-07-29 13:39:00.000 | 223.0800000 |
```
Please note that the minimum allowed time window is 10 milliseconds; there is no upper limit.
It's also possible to specify the start and end time of a continuous query. If the start time is not specified, the timestamp of the first original row is used as the start time; if the end time is not specified, the continuous query runs forever; otherwise, it terminates once the end time is reached. For example, the continuous query in the SQL statement below starts now and terminates one hour later.
```sql
create table avg_vol as select avg(voltage) from meters where ts > now and ts <= now + 1h interval(1m) sliding(30s);
```
`now` in the above SQL statement stands for the time when the continuous query is created, not the time when the computation is actually performed. To minimize problems caused by delayed arrival of original data, the actual computation of a continuous query also starts with a small delay. That means, once a time window closes, the computation does not start immediately; the result normally becomes available a short time later, usually within one minute of the window closing.
## How to Manage
The `show streams` command can be used in the TDengine CLI `taos` to show all the continuous queries in the system, and `kill stream` can be used to terminate one, as sketched below.
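A minimal sketch of the two commands; the stream id passed to `kill stream` below is illustrative and should be taken from the output of `show streams`:
```sql
show streams;      -- list all continuous queries in the system
kill stream 3:1;   -- terminate the continuous query with the given id
```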
---
sidebar_label: Subscription
description: "Lightweight service for data subscription and pushing, the time series data inserted into TDengine continuously can be pushed automatically to the subscribing clients."
title: Data Subscription
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import Java from "./_sub_java.mdx";
import Python from "./_sub_python.mdx";
import Go from "./_sub_go.mdx";
import Rust from "./_sub_rust.mdx";
import Node from "./_sub_node.mdx";
import CSharp from "./_sub_cs.mdx";
import CDemo from "./_sub_c.mdx";
## Introduction
Due to the time-series nature of the data, inserting data into TDengine is similar to publishing messages to a message queue: both can be considered as inserting a new, timestamped record into the system. Data is stored in ascending order of timestamp inside TDengine, so essentially each table in TDengine can be considered as a message queue.
A lightweight service for data subscription and pushing is built into TDengine. With the API provided by TDengine, client programs can use a `select` statement to subscribe to data from one or more tables. Subscription and state maintenance are performed on the client side: the client program polls the server to check whether there is new data, and if so, the new data is pushed to the client. If the client program is restarted, where to start retrieving new data is up to the client side.
There are 3 major subscription-related APIs provided in the TDengine client driver.
```c
taos_subscribe
taos_consume
taos_unsubscribe
```
For more details about these APIs, please refer to [C/C++ Connector](/reference/connector/cpp). Their usage is introduced below using the meters use case; for the schema of the STable and subtables, please refer to the previous section "Continuous Query". The full sample code can be found [here](https://github.com/taosdata/TDengine/blob/master/examples/c/subscribe.c).
If we want to be notified and take some action when the current of some meters exceeds a threshold, like 10A, there are two ways:
The first way is to query each subtable and record the last timestamp matching the criteria; then, after some time, query the data later than the recorded timestamp and repeat this process. The SQL statements for this approach are below.
```sql
select * from D1001 where ts > {last_timestamp1} and current > 10;
select * from D1002 where ts > {last_timestamp2} and current > 10;
...
```
This approach works, but the number of `select` statements grows with the number of meters. Eventually, the performance on both the client side and the server side becomes unacceptable once the number of meters is big enough.
A better way is to query the STable: a single `select` is enough regardless of the number of meters, like below:
```sql
select * from meters where ts > {last_timestamp} and current > 10;
```
However, with this approach, how to choose `last_timestamp` becomes a new problem. Firstly, the timestamp when the data is generated is different from the timestamp when it is inserted into the database, and sometimes the difference between them can be very big. Secondly, data from different meters may arrive at the database at different times. If the timestamp of the "slowest" meter is used as `last_timestamp`, data from other meters may be selected repeatedly; if the timestamp of the "fastest" meter is used, some data from other meters may be missed.
All the problems mentioned above can be resolved thoroughly using the subscription functionality provided by TDengine.
The first step is to create a subscription using `taos_subscribe`.
```c
TAOS_SUB* tsub = NULL;
if (async) {
  // create an asynchronous subscription, the callback function will be called every 1s
  tsub = taos_subscribe(taos, restart, topic, sql, subscribe_callback, &blockFetch, 1000);
} else {
  // create a synchronous subscription; 'taos_consume' needs to be invoked manually
  tsub = taos_subscribe(taos, restart, topic, sql, NULL, NULL, 0);
}
```
The subscription in TDengine can be either synchronous or asynchronous. In the above sample code, the value of the variable `async` is determined from the CLI input and is then used to create either an async or a sync subscription. Sync subscription means the client program needs to invoke `taos_consume` to retrieve data; async subscription means a thread created internally by `taos_subscribe` invokes `taos_consume` to retrieve data and passes it to `subscribe_callback` for processing. `subscribe_callback` is a callback function provided by the client program; it's suggested not to perform time-consuming operations in it.
The parameter `taos` is an established connection. Nothing special is required in sync subscription mode; in async mode, however, it should be used exclusively by the current thread, otherwise unpredictable errors may occur.
The parameter `sql` is a `select` statement in which the `where` clause can be used to specify filter conditions. In our example, the data whose current exceeds 10A is subscribed to with the SQL statement below:
```sql
select * from meters where current > 10;
```
Please note that all the existing data will be processed because no start time is specified. If only the data of the past day needs to be processed, a time-related condition can be added:
```sql
select * from meters where ts > now - 1d and current > 10;
```
The parameter `topic` is the name of the subscription. It needs to be unique within the client program, but it doesn't need to be globally unique because subscriptions are implemented in the client-side APIs.
If the subscription named `topic` doesn't exist, the parameter `restart` is ignored. If the subscription was created before by a client program that then exited, `restart` determines, when the program is restarted with the same `topic`, whether to retrieve data from the beginning or from the point where the subscription was interrupted. If `restart` is **true** (i.e. a non-zero value), the data is retrieved from the beginning; if it is **false** (i.e. zero), the data already consumed before is not processed again.
The last parameter of `taos_subscribe` is the polling interval in milliseconds. In sync mode, if the interval between two consecutive invocations of `taos_consume` is smaller than the interval specified in `taos_subscribe`, `taos_consume` blocks until the interval elapses. In async mode, this interval is the minimum interval between two invocations of the callback function.
The second-to-last parameter of `taos_subscribe` is used to pass arguments to the callback function; `taos_subscribe` doesn't process it and simply hands it over to the callback. This parameter is ignored in sync mode.
After a subscription is created, its data can be consumed and processed. Below is sample code showing how to consume data in sync mode, taken from the `else` branch of `if (async)`.
```c
if (async) {
  getchar();
} else while(1) {
  TAOS_RES* res = taos_consume(tsub);
  if (res == NULL) {
    printf("failed to consume data.");
    break;
  } else {
    print_result(res, blockFetch);
    getchar();
  }
}
```
In the above sample code there is an infinite loop: each time a carriage return is entered, `taos_consume` is invoked. The return value of `taos_consume` is the selected result set, exactly the same as the input of `taos_use_result`; in the sample above, `print_result` is used instead to keep the example simple. Below is the implementation of `print_result`.
```c
void print_result(TAOS_RES* res, int blockFetch) {
  TAOS_ROW row = NULL;
  int num_fields = taos_num_fields(res);
  TAOS_FIELD* fields = taos_fetch_fields(res);
  int nRows = 0;
  if (blockFetch) {
    nRows = taos_fetch_block(res, &row);
    for (int i = 0; i < nRows; i++) {
      char temp[256];
      taos_print_row(temp, row + i, fields, num_fields);
      puts(temp);
    }
  } else {
    while ((row = taos_fetch_row(res))) {
      char temp[256];
      taos_print_row(temp, row, fields, num_fields);
      puts(temp);
      nRows++;
    }
  }
  printf("%d rows consumed.\n", nRows);
}
```
In the above code, `taos_print_row` is used to process the consumed data; all the matching rows are printed.
In async mode, consuming data is simpler, as shown below.
```c
void subscribe_callback(TAOS_SUB* tsub, TAOS_RES *res, void* param, int code) {
  print_result(res, *(int*)param);
}
```
`taos_unsubscribe` can be invoked to terminate a subscription.
```c
taos_unsubscribe(tsub, keep);
```
The second parameter, `keep`, specifies whether to keep the subscription progress on the client side. If it is **false**, i.e. **0**, the subscription restarts from the beginning regardless of the value of the `restart` parameter when `taos_subscribe` is invoked again. The subscription progress is stored in _{DataDir}/subscribe/_, under which there is a file named after the `topic` of each subscription; the subscription restarts from the beginning if the corresponding progress file is removed.
Now let's see the effect of the above sample code, assuming the prerequisites below have been met.
- The sample code has been downloaded to the local system
- TDengine has been installed and launched properly on the same system
- The database, STable, and subtables required by the sample code are ready
Run the commands below in the directory where the sample code resides to compile and start the program.
```bash
make
./subscribe -sql='select * from meters where current > 10;'
```
After the program starts, open another terminal, launch the TDengine CLI `taos`, and use the SQL commands below to insert a row whose current is 12A into table **D1001**.
```sql
use test;
insert into D1001 values(now, 12, 220, 1);
```
Then this row of data will be shown by the example program on the first terminal, because its current exceeds 10A. You can insert more data to observe the output of the example program.
## Examples
The example programs below demonstrate how to use connectors to subscribe to the rows whose current exceeds 10A.
### Prepare Data
```bash
# create database "power"
taos> create database power;
# use "power" as the database in following operations
taos> use power;
# create super table "meters"
taos> create table meters(ts timestamp, current float, voltage int, phase int) tags(location binary(64), groupId int);
# create tables using the schema defined by the super table "meters"
taos> create table d1001 using meters tags ("Beijing.Chaoyang", 2);
taos> create table d1002 using meters tags ("Beijing.Haidian", 2);
# insert some rows
taos> insert into d1001 values("2020-08-15 12:00:00.000", 12, 220, 1),("2020-08-15 12:10:00.000", 12.3, 220, 2),("2020-08-15 12:20:00.000", 12.2, 220, 1);
taos> insert into d1002 values("2020-08-15 12:00:00.000", 9.9, 220, 1),("2020-08-15 12:10:00.000", 10.3, 220, 1),("2020-08-15 12:20:00.000", 11.2, 220, 1);
# filter out the rows in which current is bigger than 10A
taos> select * from meters where current > 10;
ts | current | voltage | phase | location | groupid |
===========================================================================================================
2020-08-15 12:10:00.000 | 10.30000 | 220 | 1 | Beijing.Haidian | 2 |
2020-08-15 12:20:00.000 | 11.20000 | 220 | 1 | Beijing.Haidian | 2 |
2020-08-15 12:00:00.000 | 12.00000 | 220 | 1 | Beijing.Chaoyang | 2 |
2020-08-15 12:10:00.000 | 12.30000 | 220 | 2 | Beijing.Chaoyang | 2 |
2020-08-15 12:20:00.000 | 12.20000 | 220 | 1 | Beijing.Chaoyang | 2 |
Query OK, 5 row(s) in set (0.004896s)
```
### Example Programs
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<Java />
</TabItem>
<TabItem label="Python" value="Python">
<Python />
</TabItem>
{/* <TabItem label="Go" value="go">
<Go/>
</TabItem> */}
<TabItem label="Rust" value="rust">
<Rust />
</TabItem>
{/* <TabItem label="Node.js" value="nodejs">
<Node/>
</TabItem>
<TabItem label="C#" value="csharp">
<CSharp/>
</TabItem> */}
<TabItem label="C" value="c">
<CDemo />
</TabItem>
</Tabs>
### Run the Examples
The example programs first consume all the historical data matching the criteria.
```bash
ts: 1597464000000 current: 12.0 voltage: 220 phase: 1 location: Beijing.Chaoyang groupid : 2
ts: 1597464600000 current: 12.3 voltage: 220 phase: 2 location: Beijing.Chaoyang groupid : 2
ts: 1597465200000 current: 12.2 voltage: 220 phase: 1 location: Beijing.Chaoyang groupid : 2
ts: 1597464600000 current: 10.3 voltage: 220 phase: 1 location: Beijing.Haidian groupid : 2
ts: 1597465200000 current: 11.2 voltage: 220 phase: 1 location: Beijing.Haidian groupid : 2
```
Next, use TDengine CLI to insert a new row.
```
# taos
taos> use power;
taos> insert into d1001 values(now, 12.4, 220, 1);
```
Because the current in the inserted row exceeds 10A, it will be consumed by the example program.
```
ts: 1651146662805 current: 12.4 voltage: 220 phase: 1 location: Beijing.Chaoyang groupid: 2
```
---
sidebar_label: Cache
title: Cache
description: "The latest row of each table is kept in cache to provide high performance query of latest state."
---
The cache management policy in TDengine is First-In-First-Out (FIFO), also known as an insert-driven cache management policy, in contrast to read-driven cache management, i.e. Least-Recently-Used (LRU). It simply stores the latest data in cache and flushes the oldest cached data to disk when cache usage reaches a threshold. In IoT use cases, the data that matters most is the latest data, i.e. the current state; the cache policy in TDengine is based on this nature of IoT data.
Caching the latest data makes it possible to retrieve data in milliseconds. With this capability, TDengine can be configured to serve as the caching system, without deploying a separate one, which simplifies the system architecture and reduces operational costs. Note that the cache is emptied after TDengine restarts; unlike a real key-value caching system, TDengine doesn't reload data from disk into cache.
The memory space used by the TDengine cache is fixed in size and configured according to application requirements and system resources. An independent memory pool is allocated for, and managed by, each vnode (virtual node) in TDengine; memory pools are not shared between vnodes. All the tables belonging to a vnode share that vnode's cache memory.
The memory pool is divided into blocks; data is stored in memory in row format, and each block follows the FIFO policy. The size of each block is determined by the configuration parameter `cache`, and the number of blocks per vnode by `blocks`, so the total cache size of a vnode is `cache * blocks`. It's better to size each block so it can hold at least tens of rows, as sketched below.
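For example, the cache block size and the number of blocks per vnode can be set when creating a database; this is a sketch, and the values are illustrative only:
```sql
-- each cache block is 16 MB and each vnode gets 6 blocks,
-- i.e. 96 MB of cache per vnode for this database
CREATE DATABASE power CACHE 16 BLOCKS 6;
```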
The `last_row` function can be used to retrieve the last row of a table or a STable, to quickly show the current state of devices on a monitoring screen. For example, the SQL statement below retrieves the latest voltage of all meters in Chaoyang district of Beijing.
```sql
select last_row(voltage) from meters where location='Beijing.Chaoyang';
```
---
sidebar_label: UDF
title: User Defined Functions
description: "Scalar functions and aggregate functions developed by users can be utilized by the query framework to expand the query capability"
---
In some use cases, the query capability required by application programs can't be achieved directly with built-in functions. With UDF, user-developed functions can be utilized by the query framework to meet special requirements. A UDF normally takes one column of data as input, but it can also take the result of a subquery as input.
From version 2.2.0.0, TDengine supports UDFs programmed in C/C++.
Two kinds of functions can be implemented as UDF: scalar functions and aggregate functions.
## Define UDF
### Scalar Function
The function template below can be used to define your own scalar function.
`void udfNormalFunc(char* data, short itype, short ibytes, int numOfRows, long long* ts, char* dataOutput, char* interBuf, char* tsOutput, int* numOfOutput, short otype, short obytes, SUdfInit* buf)`
`udfNormalFunc` is a placeholder for the function name. A function implemented based on the above template performs scalar computation on data rows; its parameters are fixed to control the data exchange between the UDF and TDengine.
- Definitions of the parameters:
- data: input data
- itype: the type of the input data; for details please refer to [type definition in column_meta](/reference/rest-api/), for example 4 represents INT
- ibytes: the number of bytes consumed by each value in the input data
- otype: the type of the output data, similar to itype
- obytes: the number of bytes consumed by each value in the output data
- numOfRows: the number of rows in the input data
- ts: the column of timestamps corresponding to the input data
- dataOutput: the buffer for output data, whose total size is `obytes * numOfRows`
- interBuf: the buffer for intermediate results, whose size is specified by the `BUFSIZE` parameter when creating the UDF. It's normally used when the intermediate result differs from the final result; it's allocated and freed by TDengine.
- tsOutput: the column of timestamps corresponding to the output data; it can be used to output timestamps together with the output data if it's not NULL
- numOfOutput: the number of rows in the output data
- buf: for state exchange between the UDF and TDengine
[add_one.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c) is one of the simplest UDF implementations, i.e. an instance of the above `udfNormalFunc` template. It adds one to each value of the input column (which can be filtered using a `where` clause) and outputs the result.
### Aggregate Function
The function template below can be used to define your own aggregate function.
`void abs_max_merge(char* data, int32_t numOfRows, char* dataOutput, int32_t* numOfOutput, SUdfInit* buf)`
`udfMergeFunc` is the placeholder for the function name (`abs_max_merge` above is a concrete instance). The function implemented with the above template aggregates intermediate results and can only be used in aggregate queries on STables.
Definitions of the parameters:
- data: the array of output data; if interBuf is used, it's an array of interBuf
- numOfRows: the number of rows in `data`
- dataOutput: the buffer for output data, the same size as the final result; if the result is not final, it can be put in interBuf, i.e. `data`
- numOfOutput: the number of rows in the output data
- buf: for state exchange between the UDF and TDengine
[abs_max.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c) is a user-defined aggregate function that gets the maximum of the absolute values of a column.
Internally, the data selected by the statement is divided into multiple row blocks, and `udfNormalFunc` (`abs_max` in this case) is executed on each row block to generate the intermediate result of each subtable; then `udfMergeFunc` (`abs_max_merge` in this case) is executed on the subtables' intermediate results to produce the final or intermediate result of the STable. The STable's intermediate result is finally processed by `udfFinalizeFunc` to generate the final result, which contains either 0 or 1 row.
Other typical scenarios, like covariance, can also be achieved by aggregate UDF.
### Finalize
The function template below can be used to finalize the result of your UDF; it's normally needed when interBuf is used.
`void abs_max_finalize(char* dataOutput, char* interBuf, int* numOfOutput, SUdfInit* buf)`
`udfFinalizeFunc` is the placeholder for the function name (`abs_max_finalize` above is a concrete instance); the parameters are defined as below:
- dataOutput: the buffer for output data
- interBuf: the buffer for intermediate results, which can be used as input for the next processing step
- numOfOutput: the number of output values, which can only be 0 or 1 for an aggregate function
- buf: for state exchange between the UDF and TDengine
## UDF Conventions
The names of the 3 kinds of UDF, i.e. udfNormalFunc, udfMergeFunc, and udfFinalizeFunc, must share the same prefix, which is the actual name of udfNormalFunc; udfNormalFunc itself therefore needs no suffix. udfMergeFunc is the name of udfNormalFunc followed by `_merge`, and udfFinalizeFunc is the name of udfNormalFunc followed by `_finalize`. This naming convention is part of the UDF framework; TDengine uses it to invoke the corresponding actual functions.
Depending on the kind of UDF to implement, the required functions differ.
- Scalar function: udfNormalFunc is required
- Aggregate function: udfNormalFunc, udfMergeFunc (for queries on STables), and udfFinalizeFunc are required
To be more concrete, assume we want to implement a UDF named "foo". If it is a scalar function, we only need to implement `foo`; if it is an aggregate function, we need to implement `foo`, `foo_merge`, and `foo_finalize`. For an aggregate UDF, even if one of the three functions is not needed, it must still be given an empty implementation.
## Compile UDF
The C source code of a UDF can't be utilized by TDengine directly; a UDF can only be loaded into TDengine after being compiled into a dynamically linked library.
For example, the UDF `add_one.c` mentioned in the previous sections needs to be compiled into a DLL using the command below in a Linux shell.
```bash
gcc -g -O0 -fPIC -shared add_one.c -o add_one.so
```
The generated DLL file `add_one.so` can be used later when creating the UDF. It's recommended to use a GCC version no older than 7.5.
## Create and Use UDF
### Create UDF
A SQL command can be executed on the same host where the generated UDF DLL resides to load the UDF into TDengine; this operation can't be done through the REST interface or web console. Once created, all the clients of the current TDengine instance can use these UDFs in their SQL commands. UDFs are stored in the management node of TDengine and remain available after TDengine is restarted.
When creating a UDF, it must be declared as either a scalar function or an aggregate function. If the declared type is wrong, SQL statements using the function will fail with an error. The input type and output type don't need to be the same in a UDF, but they must be consistent with the UDF definition.
- Create Scalar Function
```sql
CREATE FUNCTION ids(X) AS ids(Y) OUTPUTTYPE typename(Z) [ BUFSIZE B ];
```
- ids(X): the function name to be used in SQL statements, which must be consistent with the function name defined by `udfNormalFunc`
- ids(Y): the absolute path of the DLL file containing the implementation of the UDF; the path needs to be quoted with single or double quotes
- typename(Z): the output data type, given as the literal name of the type
- B: the size of the intermediate buffer, in bytes; it's an optional parameter with range [0, 512]
For example, the SQL statement below creates a UDF from `add_one.so`.
```sql
CREATE FUNCTION add_one AS "/home/taos/udf_example/add_one.so" OUTPUTTYPE INT;
```
- Create Aggregate Function
```sql
CREATE AGGREGATE FUNCTION ids(X) AS ids(Y) OUTPUTTYPE typename(Z) [ BUFSIZE B ];
```
- ids(X): the function name to be used in SQL statements, which must be consistent with the function name defined by `udfNormalFunc`
- ids(Y): the absolute path of the DLL file containing the implementation of the UDF; the path needs to be quoted with single or double quotes
- typename(Z): the output data type, given as the literal name of the type
- B: the size of the intermediate buffer, in bytes; it's an optional parameter with range [0, 512]
For details about how to use intermediate results, please refer to the example program [demo.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c).
For example, the SQL statement below creates a UDF from `demo.so`.
```sql
CREATE AGGREGATE FUNCTION demo AS "/home/taos/udf_example/demo.so" OUTPUTTYPE DOUBLE bufsize 14;
```
### Manage UDF
- Delete UDF
```
DROP FUNCTION ids(X);
```
- ids(X): same as in the `CREATE FUNCTION` statement
```sql
DROP FUNCTION add_one;
```
- Show Available UDF
```sql
SHOW FUNCTIONS;
```
### Use UDF
The function name specified when creating a UDF can be used directly in SQL statements, just like built-in functions.
```sql
SELECT X(c) FROM table/STable;
```
The above SQL statement invokes function X on column c; a concrete sketch follows.
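For instance, the scalar UDF `add_one` created above could be applied to a column of the meters example like this sketch:
```sql
SELECT add_one(voltage) FROM meters;
```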
## Restrictions for UDF
In the current version there are some restrictions on UDF:
1. Only Linux is supported for creating and invoking UDF, on both the client side and the server side
2. UDF can't be mixed with built-in functions
3. Only one UDF can be used in a single SQL statement
4. Only a single column is supported as input for a UDF
5. Once created successfully, a UDF is persisted in the MNode of TDengine
6. UDF can't be created through the REST interface
7. The function name used when creating a UDF in SQL must be consistent with the function name defined in the DLL, i.e. the name defined by `udfNormalFunc`
8. The name of a UDF must not conflict with any built-in function
## Examples
### Scalar function example [add_one](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c)
<details>
<summary>add_one.c</summary>
```c
{{#include tests/script/sh/add_one.c}}
```
</details>
### Aggregate function example [abs_max](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c)
<details>
<summary>abs_max.c</summary>
```c
{{#include tests/script/sh/abs_max.c}}
```
</details>
### Example for using intermediate result [demo](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c)
<details>
<summary>demo.c</summary>
```c
{{#include tests/script/sh/demo.c}}
```
</details>
label: Developer Guide
```c
{{#include docs-examples/c/subscribe_demo.c}}
```
```csharp
{{#include docs-examples/csharp/SubscribeDemo.cs}}
```
```go
{{#include docs-examples/go/sub/main.go}}
```
```java
{{#include docs-examples/java/src/main/java/com/taos/example/SubscribeDemo.java}}
```
:::note
For now Java connector doesn't provide asynchronous subscription, but `TimerTask` can be used to achieve similar purpose.
:::
```js
{{#include docs-examples/node/nativeexample/subscribe_demo.js}}
```
```py
{{#include docs-examples/python/subscribe_demo.py}}
```
```rs
{{#include docs-examples/rust/nativeexample/examples/subscribe_demo.rs}}
```
---
title: Developer Guide
---
To develop an application using TDengine to process time-series data, we recommend taking the following steps:
1. Choose how to connect to TDengine. No matter what programming language you use, you can always use the REST interface to access TDengine, but you can also use the connector unique to each programming language.
2. Design the data model based on your own application scenarios. Learn the [concepts](/concept/) of TDengine including "one table for one data collection point" and the "super table" concept; learn about static labels, collected metrics, and subtables. According to the data characteristics, you may decide to create one or more databases, and you should design the STable schema to fit your data.
3. Decide how to insert data. TDengine supports writing using standard SQL, but also supports schemaless writing, so that data can be written directly without creating tables manually.
4. Based on business requirements, find out what SQL query statements need to be written.
5. If you want to run real-time analysis based on time series data, including various dashboards, it is recommended that you use the TDengine continuous query feature instead of deploying complex streaming processing systems such as Spark or Flink.
6. If your application has modules that need to consume inserted data, and they need to be notified when new data is inserted, it is recommended that you use the data subscription function provided by TDengine without the need to deploy Kafka.
7. In many scenarios (such as fleet management), the application needs to obtain the latest status of each data collection point. It is recommended that you use the cache function of TDengine instead of deploying Redis separately.
8. If you find that the SQL functions of TDengine cannot meet your requirements, then you can use user-defined functions to solve the problem.
This section is organized in the order described above. For ease of understanding, TDengine provides sample code for each supported programming language for each function. If you want to learn more about the use of SQL, please read the [SQL manual](/taos-sql/). For a more in-depth understanding of the use of each connector, please read the [Connector Reference Guide](/reference/connector/). If you also want to integrate TDengine with third-party systems, such as Grafana, please refer to the [third-party tools](/third-party/).
If you encounter any problems during the development process, please click ["Submit an issue"](https://github.com/taosdata/TDengine/issues/new/choose) at the bottom of each page and submit it on GitHub right away.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
---
title: Deployment
---
## Prerequisites
### Step 1
The FQDN of every host needs to be set up properly, and all the FQDNs need to be configured in the /etc/hosts file of each host. It must be guaranteed that each FQDN can be accessed (by ping, for example) from any other host.
On each host, the command `hostname -f` can be executed to get the hostname, and the `ping` command can be executed to check whether any other host is accessible from it. If any host is not accessible, the network configuration, like /etc/hosts or DNS configuration, needs to be checked and revised to make any two hosts accessible to each other.
:::note
- The host where the client program runs also needs its FQDN configured properly, to make sure all hosts, whether for client or server, can be accessed from any other. In other words, the hosts where clients run are also considered part of the cluster.
- It's suggested to disable the firewall for all hosts in the cluster. At least TCP/UDP ports 6030~6042 need to be open if the firewall is enabled.
:::
### Step 2
If any previous version of TDengine has been installed and configured on any host, the installation needs to be removed and the data cleaned up. For details about uninstalling please refer to [Install and Uninstall](/operation/pkg-install). To clean up the data, please use `rm -rf /var/lib/taos/*`, assuming `dataDir` is configured as `/var/lib/taos`.
### Step 3
Now it's time to install TDengine on all hosts, without starting `taosd`; the version on all hosts must be the same. If prompted to join an existing TDengine cluster, simply press Enter to ignore the prompt. `install.sh -e no` can also be used to disable this prompt. For details please refer to [Install and Uninstall](/operation/pkg-install).
### Step 4
Now each physical node (referred to as a `dnode` hereinafter, an abbreviation of "data node") of TDengine needs to be configured properly. Please note that one dnode doesn't stand for one host: multiple TDengine dnodes can be started on a single host as long as they are configured properly without conflicts. More specifically, each instance of the configuration file `taos.cfg` stands for a dnode. Assuming the first dnode of the TDengine cluster is "h1.taosdata.com:6030", its `taos.cfg` is configured as follows.
```c
// firstEp is the end point to connect to when any dnode starts
firstEp h1.taosdata.com:6030
// must be configured to the FQDN of the host where the dnode is launched
fqdn h1.taosdata.com
// the port used by the dnode, default is 6030
serverPort 6030
// only necessary when replica is configured to an even number
#arbitrator ha.taosdata.com:6042
```
`firstEp` and `fqdn` must be configured properly. In the `taos.cfg` of all dnodes in a TDengine cluster, `firstEp` must point to the same address, i.e. the first dnode of the cluster. `fqdn` and `serverPort` compose the address of each dnode itself. If you want to start multiple TDengine dnodes on a single host, as sketched below, please also make sure other configurations like `dataDir`, `logDir`, and other resource-related parameters don't conflict.
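For example, a minimal sketch of `taos.cfg` for a second dnode on the same host; the port and directory values below are illustrative choices, not defaults:
```c
// all dnodes point to the same firstEp
firstEp                h1.taosdata.com:6030
// same host, so the same FQDN
fqdn                   h1.taosdata.com
// a port that doesn't conflict with the first dnode
serverPort             7030
// hypothetical separate data and log directories
dataDir                /var/lib/taos2
logDir                 /var/log/taos2
```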
For all the dnodes in a TDengine cluster, the parameters below must be configured exactly the same; any dnode whose configuration differs from the dnodes already in the cluster can't join the cluster.
| **#** | **Parameter** | **Definition** |
| ----- | ------------------ | --------------------------------------------------------------------------------- |
| 1 | numOfMnodes | The number of management nodes in the cluster |
| 2     | mnodeEqualVnodeNum | The ratio of resources consumed by an mnode compared to a vnode                   |
| 3     | offlineThreshold   | The threshold of dnode offline time; once it is exceeded the dnode is considered down |
| 4     | statusInterval     | The interval at which each dnode reports its status to the mnode                  |
| 5 | arbitrator | End point of the arbitrator component in the cluster |
| 6 | timezone | Timezone |
| 7     | balance            | Whether to enable automatic load balancing                                        |
| 8 | maxTablesPerVnode | Maximum number of tables that can be created in each vnode |
| 9     | maxVgroupsPerDb    | Maximum number of vgroups that can be used by each DB                             |
:::note
Prior to version 2.0.19.0, besides the above parameters, `locale` and `charset` must also be configured identically on each dnode.
:::
## Start Cluster
### Start The First DNODE
The first dnode can be started following the instructions in [Get Started](/get-started/), for example on h1.taosdata.com. Then the TDengine CLI `taos` can be launched to execute the command `show dnodes`; the output looks like the following example:
```
Welcome to the TDengine shell from Linux, Client Version:2.0.0.0
Copyright (c) 2017 by TAOS Data, Inc. All rights reserved.
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time |
=====================================================================================
1 | h1.taos.com:6030 | 0 | 2 | ready | any | 2020-07-31 03:49:29.202 |
Query OK, 1 row(s) in set (0.006385s)
taos>
```
From the above output, it is shown that the end point of the started dnode is "h1.taos.com:6030", which is the `firstEp` of the cluster.
### Start Other DNODEs
There are a few steps necessary to add other dnodes in the cluster.
Firstly, start `taosd` as instructed in [Get Started](/get-started/), assuming it's for the second dnode. Before starting `taosd`, please make sure the configuration is correct, especially `firstEp`, `fqdn` and `serverPort`; `firstEp` must be the same as that of the first dnode shown in the section "Start The First DNODE", i.e. "h1.taosdata.com" in this example.
Then, on the first dnode, use the TDengine CLI `taos` to execute the command below to add the end point of the new dnode to the cluster. In the command, "fqdn:port" must be quoted using double quotes.
```sql
CREATE DNODE "h2.taos.com:6030";
```
Then, on the first dnode, execute `show dnodes` in `taos` to check whether the second dnode has been added to the cluster successfully.
```sql
SHOW DNODES;
```
If the status of the newly added dnode is offline, please check:
- Whether the `taosd` process is running properly
- The log file `taosdlog.0`, to see whether the FQDN and port are correct
The above process can be repeated to add more dnodes to the cluster.
---
sidebar_label: Operation
title: Manage DNODEs
---
The previous chapter introduced how to deploy and start a cluster from scratch. Once a cluster is ready, the dnode status in the cluster can be shown at any time, new dnodes can be added to scale out the cluster, an existing dnode can be removed, and load balancing can even be performed manually.
:::note
All the commands to be introduced in this chapter need to be run through TDengine CLI, sometimes it's necessary to use root privilege.
:::
## Show DNODEs
The command below can be executed in the TDengine CLI `taos` to list all dnodes in the cluster, including ID, end point (fqdn:port), status (ready, offline), number of vnodes, number of free vnodes, etc. It's suggested to execute this command after adding or removing a dnode.
```sql
SHOW DNODES;
```
Below is the example output of this command.
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 9 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
Query OK, 1 row(s) in set (0.008298s)
```
## Show VGROUPs
To utilize system resources efficiently and provide scalability, data sharding is required. The data of each database is divided into multiple shards and stored in multiple vnodes. These vnodes may be located on different dnodes, so scaling out can be achieved by adding more vnodes from more dnodes. Each vnode can only be used for a single DB, but one DB can have multiple vnodes. The allocation of vnodes is scheduled automatically by the mnode according to the system resources of the dnodes.
Launch the TDengine CLI `taos` and execute the command below:
```sql
USE SOME_DATABASE;
SHOW VGROUPS;
```
The example output is as below:
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 9 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
Query OK, 1 row(s) in set (0.008298s)
taos> use db;
Database changed.
taos> show vgroups;
vgId | tables | status | onlines | v1_dnode | v1_status | compacting |
==========================================================================================
14 | 38000 | ready | 1 | 1 | master | 0 |
15 | 38000 | ready | 1 | 1 | master | 0 |
16 | 38000 | ready | 1 | 1 | master | 0 |
17 | 38000 | ready | 1 | 1 | master | 0 |
18 | 37001 | ready | 1 | 1 | master | 0 |
19 | 37000 | ready | 1 | 1 | master | 0 |
20 | 37000 | ready | 1 | 1 | master | 0 |
21 | 37000 | ready | 1 | 1 | master | 0 |
Query OK, 8 row(s) in set (0.001154s)
```
## Add DNODE
Launch the TDengine CLI `taos` and execute the command below to add the end point of a new dnode into the EP (end point) list of the cluster. "fqdn:port" must be quoted using double quotes.
```sql
CREATE DNODE "fqdn:port";
```
The example output is as below:
```
taos> create dnode "localhost:7030";
Query OK, 0 of 0 row(s) in database (0.008203s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 9 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
2 | localhost:7030 | 0 | 0 | offline | any | 2022-04-19 08:11:42.158 | status not received |
Query OK, 2 row(s) in set (0.001017s)
```
It can be seen that the status of the new dnode is "offline". Once the dnode is started and connects to the firstEp of the cluster, execute the command again to get the example output below, from which it can be seen that both dnodes are in "ready" status.
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 3 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
2 | localhost:7030 | 6 | 8 | ready | any | 2022-04-19 08:14:59.165 | |
Query OK, 2 row(s) in set (0.001316s)
```
## Drop DNODE
Launch the TDengine CLI `taos` and execute the command below to drop or remove a dnode from the cluster. In the command, `dnodeId` can be obtained from `show dnodes`.
```sql
DROP DNODE "fqdn:port";
```
or
```sql
DROP DNODE dnodeId;
```
The example output is as below:
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 9 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
2 | localhost:7030 | 0 | 0 | offline | any | 2022-04-19 08:11:42.158 | status not received |
Query OK, 2 row(s) in set (0.001017s)
taos> drop dnode 2;
Query OK, 0 of 0 row(s) in database (0.000518s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | localhost:6030 | 9 | 8 | ready | any | 2022-04-15 08:27:09.359 | |
Query OK, 1 row(s) in set (0.001137s)
```
In the above example, when `show dnodes` is executed the first time, two dnodes are shown. Then `drop dnode 2` is executed, after that from the output of executing `show dnodes` again it can be seen that only the dnode with ID 1 is still in the cluster.
:::note
- Once a dnode is dropped, it can't rejoin the cluster. To rejoin, the dnode needs to be deployed again after cleaning up its data directory. Normally, before dropping a dnode, the data belonging to it needs to be migrated elsewhere.
- Please note that `drop dnode` is different from stopping the `taosd` process. `drop dnode` just removes the dnode from the TDengine cluster. Only after a dnode is dropped can the corresponding `taosd` process be stopped.
- Once a dnode is dropped, other dnodes in the cluster will be notified of the drop and will not accept requests from the dropped dnode.
- dnodeID is allocated automatically and can't be specified manually. dnodeIDs are generated in ascending order without duplication.
:::
## Move VNODE
A vnode can be manually moved from one dnode to another.
Launch the TDengine CLI `taos` and execute the command below:
```sql
ALTER DNODE <source-dnodeId> BALANCE "VNODE:<vgId>-DNODE:<dest-dnodeId>";
```
In the above command, `source-dnodeId` is the dnodeId where the vnode currently resides, and `dest-dnodeId` specifies the target dnode. The vgId (vgroup ID) can be shown by `SHOW VGROUPS`.
Firstly, `show vgroups` is executed to show the vgroup distribution.
```
taos> show vgroups;
vgId | tables | status | onlines | v1_dnode | v1_status | compacting |
==========================================================================================
14 | 38000 | ready | 1 | 3 | master | 0 |
15 | 38000 | ready | 1 | 3 | master | 0 |
16 | 38000 | ready | 1 | 3 | master | 0 |
17 | 38000 | ready | 1 | 3 | master | 0 |
18 | 37001 | ready | 1 | 3 | master | 0 |
19 | 37000 | ready | 1 | 1 | master | 0 |
20 | 37000 | ready | 1 | 1 | master | 0 |
21 | 37000 | ready | 1 | 1 | master | 0 |
Query OK, 8 row(s) in set (0.001314s)
```
It can be seen that there are 5 vgroups in dnode 3 and 3 vgroups in dnode 1. Now we want to move vgId 18 from dnode 3 to dnode 1. Execute the command below in `taos`:
```
taos> alter dnode 3 balance "vnode:18-dnode:1";
DB error: Balance already enabled (0.00755
```
However, the operation fails with the error message shown above, which means automatic load balancing is enabled in the current cluster, so manual load balancing can't be performed.
Shut down the cluster, set the `balance` parameter to 0 in all dnodes, then restart the cluster, and execute `alter dnode` and `show vgroups` as below.
```
taos> alter dnode 3 balance "vnode:18-dnode:1";
Query OK, 0 row(s) in set (0.000575s)
taos> show vgroups;
vgId | tables | status | onlines | v1_dnode | v1_status | v2_dnode | v2_status | compacting |
=================================================================================================================
14 | 38000 | ready | 1 | 3 | master | 0 | NULL | 0 |
15 | 38000 | ready | 1 | 3 | master | 0 | NULL | 0 |
16 | 38000 | ready | 1 | 3 | master | 0 | NULL | 0 |
17 | 38000 | ready | 1 | 3 | master | 0 | NULL | 0 |
18 | 37001 | ready | 2 | 1 | slave | 3 | master | 0 |
19 | 37000 | ready | 1 | 1 | master | 0 | NULL | 0 |
20 | 37000 | ready | 1 | 1 | master | 0 | NULL | 0 |
21 | 37000 | ready | 1 | 1 | master | 0 | NULL | 0 |
Query OK, 8 row(s) in set (0.001242s)
```
It can be seen from above output that vgId 18 has been moved from dnode 3 to dnode 1.
:::note
- Manual load balancing can only be performed when the automatic load balancing is disabled, i.e. `balance` is set to 0.
- Only a vnode in normal state, i.e. master or slave, can be moved; a vnode can't be moved when it is in offline, unsynced or syncing state.
- Before moving a vnode, it's necessary to make sure the target dnode has enough resources: CPU, memory and disk.
:::
---
sidebar_label: HA & LB
title: High Availability and Load Balancing
---
## High Availability of Vnode
High availability of vnode and mnode can be achieved through replicas in TDengine.
The number of vnodes is associated with each DB, and there can be multiple DBs in a TDengine cluster. A different number of replicas can be configured for each DB as required. When creating a database, the parameter `replica` is used to specify the number of replicas; the default value is 1. With a single replica, the high availability of the system can't be guaranteed: whenever one node is down, the data service becomes unavailable. The number of dnodes in the cluster must NOT be lower than the number of replicas set for any DB, otherwise the `create table` operation will fail with the error "more dnodes are needed". The SQL statement below creates a database named "demo" with 3 replicas.
```sql
CREATE DATABASE demo replica 3;
```
The data in a DB is divided into multiple shards and stored in multiple vgroups. The number of vnodes in each vgroup is determined by the number of replicas set for the DB. The vnodes in each vgroup store exactly the same data. For the purpose of high availability, the vnodes in a vgroup must be located on different dnodes on different hosts. As long as over half of the vnodes in a vgroup are online, the vgroup is able to serve data access; otherwise it can't handle any data access for reading or inserting.
There may be data of multiple DBs in a dnode, so once a dnode is down, multiple DBs may be affected. However, it's hard to guarantee that the cluster works properly as long as over half of the dnodes are online, because vnodes are introduced and the mapping between vnodes and dnodes may be complex.
## High Availability of Mnode
Each TDengine cluster is managed by the `mnode`, which is a module of `taosd`. For the high availability of the mnode, multiple mnodes can be configured using the system parameter `numOfMnodes`; the valid range is [1,3]. To ensure data consistency between mnodes, data replication between mnodes is performed synchronously.
There may be multiple dnodes in a cluster, but only one mnode can be started on each dnode. Which dnodes host mnodes is determined automatically by TDengine according to the cluster configuration and system resources. The command `show mnodes` can be executed in the TDengine CLI `taos` to show the mnodes in the cluster.
```sql
SHOW MNODES;
```
The end point and role/status (master, slave, unsynced, or offline) of all mnodes can be shown by the above command. When the first dnode of a cluster is started, it must host an mnode, because a cluster can't work without at least one mnode. If `numOfMnodes` is configured to 2, another mnode will be started when the second dnode is launched.
For the high availability of the mnode, `numOfMnodes` needs to be configured to 2 or higher. Because data consistency between mnodes must be guaranteed, the replica confirmation parameter `quorum` is set to 2 automatically if `numOfMnodes` is set to 2 or higher.
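For example, a minimal sketch of the relevant line in `taos.cfg`, which must be set identically on all dnodes:
```c
// run two mnodes for high availability; quorum is then set to 2 automatically
numOfMnodes            2
```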
:::note
If high availability is important for your system, both vnodes and mnodes must be configured to have multiple replicas. The configuration methods for the two are different and have been described above.
:::
## Load Balance
Load balancing will be triggered in three cases without manual intervention.
- When a new dnode is joined in the cluster, automatic load balancing may be triggered, some data from some dnodes may be transferred to the new dnode automatically.
- When a dnode is removed from the cluster, the data from this dnode will be transferred to other dnodes automatically.
- When a dnode is too hot, i.e. too much data has been stored in it, automatic load balancing may be triggered to migrate some vnodes from this dnode to other dnodes.
:::tip
Automatic load balancing is controlled by parameter `balance`, 0 means disabled and 1 means enabled.
:::
## Dnode Offline
When a dnode goes offline, it can be detected by the TDengine cluster. There are two cases:
- If the dnode becomes online again before the threshold configured in `offlineThreshold` is reached, it stays in the cluster and data replication starts automatically. The dnode can work properly again once the data sync-up is finished.
- If the dnode stays offline longer than the threshold configured in `offlineThreshold` in `taos.cfg`, it will be removed from the cluster automatically. A system alert will be generated, and automatic load balancing will be triggered if `balance` is set to 1. When the removed dnode is restarted and becomes online again, it will not rejoin the cluster automatically; it can only be added back manually by the system operator.
:::note
If all the vnodes in a vgroup (or mnodes in the mnode group) are in offline or unsynced status, the master node can only be elected after all the vnodes or mnodes in the group become online and can exchange status; only then is the vgroup (or mnode group) able to provide service.
:::
## Arbitrator
If the number of replicas is set to an even number like 2, a master node can't be elected when half of the vnodes in a vgroup don't work. A similar case applies to the mnode if the number of mnodes is set to an even number like 2.
To resolve this problem, a new arbitrator component named `tarbitrator`, short for TDengine Arbitrator, was introduced. The arbitrator simulates a vnode or mnode but is only responsible for network communication; it doesn't handle any actual data access. With the arbitrator, any vgroup or mnode group can be considered to have an odd number of member nodes, so a master node can be elected.
Normally, it's suggested to configure the replica number of each DB or the system parameter `numOfMnodes` to an odd number. However, if a user is very sensitive to storage space, a replica number of 2 plus the arbitrator component can be used to achieve both lower storage cost and high availability.
The arbitrator component is installed with the server package. For details about installation, please refer to [Install](/operation/pkg-install). The `-p` parameter of `tarbitrator` can be used to specify the port on which it provides its service.
In the configuration file `taos.cfg` of each dnode, the parameter `arbitrator` needs to be set to the end point of the `tarbitrator` process, for example as below. The arbitrator component will be used automatically if the replica number is even, and ignored if it is odd.
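A minimal sketch, assuming the `tarbitrator` process runs on ha.taosdata.com at its default port 6042 (the same end point shown commented out in the deployment example):
```c
// end point of the tarbitrator process; only used when replica is an even number
arbitrator             ha.taosdata.com:6042
```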
The arbitrator is listed by executing the command below in the TDengine CLI `taos`, with its role shown as "arb".
```sql
SHOW DNODES;
```
---
title: Cluster
keywords: ["cluster", "high availability", "load balance", "scale out"]
---
TDengine has a native distributed design and provides the ability to scale out. A few nodes can form a TDengine cluster; if you need higher processing power, you just add more nodes to the cluster. TDengine uses virtual node technology to virtualize a node into multiple virtual nodes to achieve load balancing. At the same time, TDengine can group virtual nodes on different nodes into virtual node groups and use the replication mechanism to ensure the high availability of the system. The cluster feature of TDengine is completely open source.
This chapter mainly introduces cluster deployment, maintenance, and how to achieve high availability and load balancing.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
---
title: Data Types
description: "The data types supported by TDengine include timestamp, float, JSON, etc"
---
When using TDengine to store and query data, the most important part of the data is the timestamp. A timestamp must be specified when creating and inserting data rows or when querying data. Timestamps must follow the rules below:
- The format must be `YYYY-MM-DD HH:mm:ss.MS`; the default time precision is millisecond (ms), for example `2017-08-12 18:25:58.128`
- The internal function `now` can be used to get the current timestamp on the client side
- The current timestamp of the client side is applied when `now` is used to insert data
- Epoch Time: a timestamp can also be a long integer, meaning the number of seconds, milliseconds or nanoseconds since 1970-01-01 00:00:00.000 (UTC/GMT), depending on the time precision
- Add/subtract operations can be applied to a timestamp, for example `now-2h` means 2 hours before the time at which the query is executed. The unit can be b (nanosecond), u (microsecond), a (millisecond), s (second), m (minute), h (hour), d (day), or w (week). So `select * from t1 where ts > now-2w and ts <= now-1w` means the data between two weeks ago and one week ago. The time unit can also be n (calendar month) or y (calendar year) when specifying the time window for a down sampling operation.
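For example, assuming a hypothetical table `t1` with a timestamp column `ts`, timestamp arithmetic can be used in queries like below:
```sql
-- rows from the last 2 hours, relative to the time the query is executed
SELECT * FROM t1 WHERE ts > now - 2h;
-- rows between two weeks ago and one week ago
SELECT * FROM t1 WHERE ts > now - 2w AND ts <= now - 1w;
```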
Time precision in TDengine can be set by the `PRECISION` parameter when executing `CREATE DATABASE`, like below, the default time precision is millisecond.
```sql
CREATE DATABASE db_name PRECISION 'ns';
```
In TDengine, below data types can be used when specifying a column or tag.
| # | **type** | **Bytes** | **Description** |
| --- | :-------: | --------- | ------------------------- |
| 1 | TIMESTAMP | 8 | Default precision is millisecond, microsecond and nanosecond are also supported |
| 2 | INT | 4 | Integer, the value range is [-2^31+1, 2^31-1], while -2^31 is treated as NULL |
| 3 | BIGINT | 8 | Long integer, the value range is [-2^63+1, 2^63-1], while -2^63 is treated as NULL |
| 4 | FLOAT | 4 | Floating point number, the effective number of digits is 6-7, the value range is [-3.4E38, 3.4E38] |
| 5 | DOUBLE | 8 | double precision floating point number, the effective number of digits is 15-16, the value range is [-1.7E308, 1.7E308] |
| 6 | BINARY | User Defined | Single-byte string for ASCII visible characters. Length must be specified when defining a column or tag of binary type. The string length can be up to 16374 bytes. The string value must be quoted with single quotes. The literal single quote inside the string must be preceded with back slash like `\'` |
| 7 | SMALLINT | 2 | Short integer, the value range is [-32767, 32767], while -32768 is treated as NULL |
| 8 | TINYINT | 1 | Single-byte integer, the value range is [-127, 127], while -128 is treated as NULL |
| 9 | BOOL | 1 | Bool, the value range is {true, false} |
| 10 | NCHAR | User Defined| Multi-byte string that can include multi-byte characters like Chinese characters. Each character of NCHAR type consumes 4 bytes of storage. The string value should be quoted with single quotes. A literal single quote inside the string must be preceded with a backslash, like `\'`. The length must be specified when defining a column or tag of NCHAR type; for example, nchar(10) means it can store at most 10 characters of NCHAR type and will consume fixed storage of 40 bytes. An error will be reported if the string value exceeds the defined length. |
| 11 | JSON | | JSON type can only be used on tags; a tag of JSON type can't coexist with tags of any other type |
:::tip
TDengine is case insensitive and treats any characters in a SQL command as lower case by default; case sensitive strings must be quoted with single quotes.
:::
:::note
Only ASCII visible characters are suggested to be used in a column or tag of BINARY type. Multiple-byte characters must be stored in NCHAR type.
:::
:::note
Numeric values in SQL statements are determined as integer or float type according to whether there is a decimal point or whether scientific notation is used, so attention must be paid to avoid overflow. For example, 9999999999999999999 is considered as overflow because it exceeds the upper limit of long integer, but 9999999999999999999.0 is considered a legal float number.
:::
---
sidebar_label: Database
title: Database
description: "create and drop database, show or change database parameters"
---
## Create Database
```
CREATE DATABASE [IF NOT EXISTS] db_name [KEEP keep] [DAYS days] [UPDATE 1];
```
:::info
1. KEEP specifies the number of days for which the data in the database to be created will be kept, the default value is 3650 days, i.e. 10 years. The data will be deleted automatically once its age exceeds this threshold.
2. UPDATE specifies whether the data can be updated and how the data can be updated.
   1. UPDATE set to 0 means update operation is not allowed; a row with an existing timestamp will be dropped silently.
   2. UPDATE set to 1 means the whole row will be updated; the columns for which no value is specified will be set to NULL.
   3. UPDATE set to 2 means updating a part of the columns in a row is allowed; the columns for which no value is specified will be kept unchanged.
3. The maximum length of database name is 33 bytes.
4. The maximum length of a SQL statement is 65,480 bytes.
5. Below are the parameters that can be used when creating a database
- cache: [Description](/reference/config/#cache)
- blocks: [Description](/reference/config/#blocks)
- days: [Description](/reference/config/#days)
- keep: [Description](/reference/config/#keep)
- minRows: [Description](/reference/config/#minrows)
- maxRows: [Description](/reference/config/#maxrows)
- wal: [Description](/reference/config/#wallevel)
- fsync: [Description](/reference/config/#fsync)
- update: [Description](/reference/config/#update)
- cacheLast: [Description](/reference/config/#cachelast)
- replica: [Description](/reference/config/#replica)
- quorum: [Description](/reference/config/#quorum)
- maxVgroupsPerDb: [Description](/reference/config/#maxvgroupsperdb)
- comp: [Description](/reference/config/#comp)
   - precision: [Description](/reference/config/#precision)
6. Please note that all of the parameters mentioned in this section can be configured in the server-side configuration file `taos.cfg` and take effect by default; they can be overridden if specified in the `create database` statement. A combined example is sketched below.
:::
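For example, a sketch combining several of the parameters above; the database name and values are illustrative only:
```sql
-- keep data for 365 days, use 10-day data files, allow whole-row updates
CREATE DATABASE IF NOT EXISTS power KEEP 365 DAYS 10 UPDATE 1;
```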
## Show Current Configuration
```
SHOW VARIABLES;
```
## Specify The Database In Use
```
USE db_name;
```
:::note
This command is not applicable when using a REST connection.
:::
## Drop Database
```
DROP DATABASE [IF EXISTS] db_name;
```
:::note
All data in the database will be deleted too. This command must be used with caution.
:::
## Change Database Configuration
Some examples are shown below to demonstrate how to change the configuration of a database. Please note that some configuration parameters can be changed after the database is created, but some others can't. For details about the configuration parameters of a database, please refer to [Configuration Parameters](/reference/config/).
```
ALTER DATABASE db_name COMP 2;
```
COMP parameter specifies whether the data is compressed and how the data is compressed.
```
ALTER DATABASE db_name REPLICA 2;
```
REPLICA parameter specifies the number of replications of the database.
```
ALTER DATABASE db_name KEEP 365;
```
KEEP parameter specifies the number of days for which the data will be kept.
```
ALTER DATABASE db_name QUORUM 2;
```
QUORUM parameter specifies the necessary number of confirmations to determine whether the data is written successfully.
```
ALTER DATABASE db_name BLOCKS 100;
```
BLOCKS parameter specifies the number of memory blocks used by each VNODE.
```
ALTER DATABASE db_name CACHELAST 0;
```
CACHELAST parameter specifies whether and how the latest data of a sub table is cached.
:::tip
The above parameters can be changed using `ALTER DATABASE` command without restarting. For more details of all configuration parameters please refer to [Configuration Parameters](/reference/config/).
:::
## Show All Databases
```
SHOW DATABASES;
```
## Show The Create Statement of A Database
```
SHOW CREATE DATABASE db_name;
```
This command is useful when migrating data from one TDengine cluster to another. First this command can be used to get the CREATE statement, which in turn can be used in the other TDengine cluster to create an identical database.
---
sidebar_label: Table
title: Table
description: create super table, normal table and sub table, drop tables and change tables
---
## Create Table
```
CREATE TABLE [IF NOT EXISTS] tb_name (timestamp_field_name TIMESTAMP, field1_name data_type1 [, field2_name data_type2 ...]);
```
:::info
1. The first column of a table must be of TIMESTAMP type, and it is set as the primary key automatically.
2. The maximum length of a table name is 192 bytes.
3. The maximum length of each row is 16 KB; please note that the extra 2 bytes used by each BINARY/NCHAR column are also counted in.
4. The name of a sub-table can only consist of English characters, digits and underscores, and can't start with a digit. Table names are case insensitive.
5. The maximum length in bytes must be specified when using BINARY or NCHAR type.
6. The escape character "\`" can be used to avoid conflicts between table names and reserved keywords. The above rules are bypassed when escaping a table name, but the upper limit for the name length is still valid. Table names specified using the escape character are case sensitive. Only ASCII visible characters can be used with the escape character.
For example, \`aBc\` and \`abc\` are different table names, but `abc` and `aBc` are the same table name because they are both converted to `abc` internally.
:::
### Create Subtable Using STable As Template
```
CREATE TABLE [IF NOT EXISTS] tb_name USING stb_name TAGS (tag_value1, ...);
```
The above command creates a subtable using the specified super table as a template with the specified tag values.
### Create Subtable Using STable As Template With A Part of Tags
```
CREATE TABLE [IF NOT EXISTS] tb_name USING stb_name (tag_name1, ...) TAGS (tag_value1, ...);
```
The tags for which no value is specified will be set to NULL.
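For example, assuming the `meters` STable used in the Insert chapter, with tags `location` and `groupId`, a subtable can be created providing only `groupId` (the value is illustrative):
```sql
-- location is not specified, so it is set to NULL
CREATE TABLE IF NOT EXISTS d1002 USING meters (groupId) TAGS (2);
```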
### Create Tables in Batch
```
CREATE TABLE [IF NOT EXISTS] tb_name1 USING stb_name TAGS (tag_value1, ...) [IF NOT EXISTS] tb_name2 USING stb_name TAGS (tag_value2, ...) ...;
```
This can be used to create many tables in a single SQL statement, which greatly accelerates table creation; a sketch follows the note below.
:::info
- Creating tables in batch must use super table as template.
- The length of single statement is suggested to be between 1,000 and 3,000 bytes for best performance.
:::
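A minimal sketch, again assuming the hypothetical `meters` STable:
```sql
-- create two subtables from the meters STable in one statement
CREATE TABLE IF NOT EXISTS d1003 USING meters TAGS ('Beijing.Chaoyang', 2)
             IF NOT EXISTS d1004 USING meters TAGS ('Beijing.Chaoyang', 3);
```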
## Drop Tables
```
DROP TABLE [IF EXISTS] tb_name;
```
## Show All Tables In Current Database
```
SHOW TABLES [LIKE tb_name_wildcard];
```
## Show Create Statement of A Table
```
SHOW CREATE TABLE tb_name;
```
This command is useful when migrating data from one TDengine cluster to another because it can be used to create exactly the same tables in the target database.
## Show Table Definition
```
DESCRIBE tb_name;
```
## Change Table Definition
### Add A Column
```
ALTER TABLE tb_name ADD COLUMN field_name data_type;
```
:::info
1. The maximum number of columns is 4096, the minimum number of columns is 2.
2. The maximum length of column name is 64 bytes.
:::
### Remove A Column
```
ALTER TABLE tb_name DROP COLUMN field_name;
```
:::note
If a table is created using a super table as template, the table definition can only be changed on the corresponding super table, but the change will be automatically applied to all the sub tables created using this super table as template. For tables created in normal way, the table definition can be changed directly on the table.
:::
### Change Column Length
```
ALTER TABLE tb_name MODIFY COLUMN field_name data_type(length);
```
If the type of a column is variable-length, like BINARY or NCHAR, this command can be used to change (or more specifically, increase) the length of the column.
:::note
If a table is created using a super table as template, the table definition can only be changed on the corresponding super table, but the change will be automatically applied to all the sub tables created using this super table as template. For tables created in normal way, the table definition can be changed directly on the table.
:::
### Change Tag Value Of Sub Table
```
ALTER TABLE tb_name SET TAG tag_name=new_tag_value;
```
This command can be used to change a tag value if the table was created using a super table as template.
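For example, assuming subtable `d1001` was created from a STable that has a `location` tag (the value shown is illustrative):
```sql
ALTER TABLE d1001 SET TAG location='Beijing.Haidian';
```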
---
sidebar_label: STable
title: Super Table
---
:::note
Keyword `STable`, short for super table, is supported since version 2.0.15.
:::
## Create STable
```
CREATE STable [IF NOT EXISTS] stb_name (timestamp_field_name TIMESTAMP, field1_name data_type1 [, field2_name data_type2 ...]) TAGS (tag1_name tag_type1, tag2_name tag_type2 [, tag3_name tag_type3]);
```
The SQL statement for creating a STable is similar to that for creating a table, but a special `TAGS` clause must be specified with the names and types of the tags, as sketched after the note below.
:::info
1. The tag types specified in TAGS should NOT be timestamp. Since 2.1.3.0, the timestamp type can be used in a TAGS column, but its value must be fixed and arithmetic operations can't be applied to it.
2. The tag names specified in TAGS should NOT be the same as those of other columns.
3. The tag names specified in TAGS should NOT be the same as any reserved keyword. (Please refer to [keywords](/taos-sql/keywords/).)
4. The maximum number of tags specified in TAGS is 128; there must be at least one tag, and the total length of all tag columns should NOT exceed 16 KB.
:::
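A sketch following these rules, using the same `meters` schema that appears in the Insert chapter:
```sql
CREATE STable IF NOT EXISTS meters
  (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT)
  TAGS (location BINARY(30), groupId INT);
```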
## Drop STable
```
DROP STable [IF EXISTS] stb_name;
```
All the sub-tables created using the deleted STable will be deleted automatically.
## Show All STables
```
SHOW STABLES [LIKE tb_name_wildcard];
```
This command can be used to display the information of all STables in the current database, including name, creation time, number of columns, number of tags, number of tables created using this STable.
## Show The Create Statement of A STable
```
SHOW CREATE STable stb_name;
```
This command is useful when migrating data from one TDengine cluster to another because it can be used to create an identical STable in the target database.
## Get STable Definition
```
DESCRIBE stb_name;
```
## Change Columns Of STable
### Add A Column
```
ALTER STable stb_name ADD COLUMN field_name data_type;
```
### Remove A Column
```
ALTER STable stb_name DROP COLUMN field_name;
```
### Change Column Length
```
ALTER STable stb_name MODIFY COLUMN field_name data_type(length);
```
This command can be used to change (or increase, more specifically) the length of a column of variable length types, like BINARY or NCHAR.
## Change Tags of A STable
### Add A Tag
```
ALTER STable stb_name ADD TAG new_tag_name tag_type;
```
This command is used to add a new tag for a STable and specify the tag type.
### Remove A Tag
```
ALTER STable stb_name DROP TAG tag_name;
```
The tag will be removed automatically from all the sub tables created using the super table as template once the tag is removed from the super table.
### Change A Tag
```
ALTER STable stb_name CHANGE TAG old_tag_name new_tag_name;
```
The tag name will be changed automatically in all the sub tables created using the super table as template once the tag is renamed in the super table.
### Change Tag Length
```
ALTER STable stb_name MODIFY TAG tag_name data_type(length);
```
This command can be used to change (or increase, more specifically) the length of a tag of variable length types, like BINARY or NCHAR.
:::note
Changing tag values can be applied only to sub tables. All other tag operations, like adding or removing a tag, can be applied only to a STable. If a new tag is added to a STable, the tag is added with a NULL value for all its sub tables.
:::
---
title: Insert
---
## Syntax
```sql
INSERT INTO
tb_name
[USING stb_name [(tag1_name, ...)] TAGS (tag1_value, ...)]
[(field1_name, ...)]
VALUES (field1_value, ...) [(field1_value2, ...) ...] | FILE csv_file_path
[tb2_name
[USING stb_name [(tag1_name, ...)] TAGS (tag1_value, ...)]
[(field1_name, ...)]
VALUES (field1_value, ...) [(field1_value2, ...) ...] | FILE csv_file_path
...];
```
## Insert Single or Multiple Rows
A single row or multiple rows specified with VALUES can be inserted into a specific table. For example:
A single row is inserted using the statement below.
```sql
INSERT INTO d1001 VALUES (NOW, 10.2, 219, 0.32);
```
Two rows can be inserted using the statement below.
```sql
INSERT INTO d1001 VALUES ('2021-07-13 14:06:32.272', 10.2, 219, 0.32) (1626164208000, 10.15, 217, 0.33);
```
:::note
1. In the second example above, different formats are used in the two rows to be inserted. In the first row, the timestamp format is a date-and-time string, which is interpreted from the string value only. In the second row, the timestamp format is a long integer, which will be interpreted based on the database time precision.
2. When trying to insert multiple rows in a single statement, only the timestamp of one row can be set to NOW, otherwise there will be duplicate timestamps among the rows and the result may not be as expected, because NOW is interpreted as the time at which the statement is executed.
3. The oldest timestamp that is allowed is the current time minus the KEEP parameter.
4. The newest timestamp that is allowed is the current time plus the DAYS parameter.
:::
## Insert Into Specific Columns
Data can be inserted into specific columns, in either a single row or multiple rows, while the other columns will be set to NULL.
```
INSERT INTO d1001 (ts, current, phase) VALUES ('2021-07-13 14:06:33.196', 10.27, 0.31);
```
:::info
If no columns are explicitly specified, values must be provided for all columns; this is called "all column mode". The insert performance of all column mode is much better than specifying a subset of columns, so it's encouraged to use all column mode while providing NULL explicitly for the columns for which no actual value is available, as sketched below.
:::
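For example, assuming table `d1001` with columns (ts, current, voltage, phase), all column mode with an explicit NULL for an unknown voltage looks like this:
```sql
-- voltage is unknown, so NULL is provided explicitly in all column mode
INSERT INTO d1001 VALUES ('2021-07-13 14:06:33.196', 10.27, NULL, 0.31);
```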
## Insert Into Multiple Tables
One or multiple rows can be inserted into multiple tables in a single SQL statement, with or without specifying specific columns.
```sql
INSERT INTO d1001 VALUES ('2021-07-13 14:06:34.630', 10.2, 219, 0.32) ('2021-07-13 14:06:35.779', 10.15, 217, 0.33)
            d1002 (ts, current, phase) VALUES ('2021-07-13 14:06:34.255', 10.27, 0.31);
```
## Automatically Create Table When Inserting
If you are not sure whether the table already exists, the table can be created automatically while inserting, using the SQL statement below. To use this functionality, a STable must be used as the template and tag values must be provided.
```sql
INSERT INTO d21001 USING meters TAGS ('Beijing.Chaoyang', 2) VALUES ('2021-07-13 14:06:32.272', 10.2, 219, 0.32);
```
It's not necessary to provide values for all tags when creating tables automatically; the tags without values provided will be set to NULL.
```sql
INSERT INTO d21001 USING meters (groupId) TAGS (2) VALUES ('2021-07-13 14:06:33.196', 10.15, 217, 0.33);
```
Multiple rows can also be inserted into the same table in a single SQL statement using this method.
```sql
INSERT INTO d21001 USING meters TAGS ('Beijing.Chaoyang', 2) VALUES ('2021-07-13 14:06:34.630', 10.2, 219, 0.32) ('2021-07-13 14:06:35.779', 10.15, 217, 0.33)
d21002 USING meters (groupId) TAGS (2) VALUES ('2021-07-13 14:06:34.255', 10.15, 217, 0.33)
d21003 USING meters (groupId) TAGS (2) (ts, current, phase) VALUES ('2021-07-13 14:06:34.255', 10.27, 0.31);
```
:::info
Prior to version 2.0.20.5, when using `INSERT` to create a table automatically and specify the columns, the column names had to follow the table name immediately. From version 2.0.20.5, the column names can follow the table name immediately, or be put between `TAGS` and `VALUES`. In the same SQL statement, however, these two ways of specifying column names can't be mixed.
:::
## Insert Rows From A File
Besides using `VALUES` to insert one or multiple rows, the data to be inserted can also be prepared in a CSV file with commas as separators and each field value quoted with single quotes. A table definition is not required in the CSV file. For example, if the file "/tmp/csvfile.csv" contains the data below:
```
'2021-07-13 14:07:34.630', '10.2', '219', '0.32'
'2021-07-13 14:07:35.779', '10.15', '217', '0.33'
```
Then the data in this file can be inserted by the SQL statement below:
```sql
INSERT INTO d1001 FILE '/tmp/csvfile.csv';
```
## Create Tables Automatically and Insert Rows From File
From version 2.1.5.0, tables can be created automatically using a super table as template when inserting data from a CSV file, like below:
```sql
INSERT INTO d21001 USING meters TAGS ('Beijing.Chaoyang', 2) FILE '/tmp/csvfile.csv';
```
Multiple tables can be automatically created and inserted into in a single SQL statement, like below:
```sql
INSERT INTO d21001 USING meters TAGS ('Beijing.Chaoyang', 2) FILE '/tmp/csvfile_21001.csv'
d21002 USING meters (groupId) TAGS (2) FILE '/tmp/csvfile_21002.csv';
```
## More About Insert
For SQL statements like `insert`, a stream parsing strategy is applied: before an error is found and the execution is aborted, the part prior to the error point has already been executed. Below is an experiment to help understand the behavior.
Firstly, a super table is created.
```sql
CREATE TABLE meters(ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT) TAGS(location BINARY(30), groupId INT);
```
Using `SHOW STABLES` it can be verified that the super table has been created, while `SHOW TABLES` shows that no table exists yet.
```
taos> SHOW STABLES;
name | created_time | columns | tags | tables |
============================================================================================
meters | 2020-08-06 17:50:27.831 | 4 | 2 | 0 |
Query OK, 1 row(s) in set (0.001029s)
taos> SHOW TABLES;
Query OK, 0 row(s) in set (0.000946s)
```
Then, try to create table d1001 automatically when inserting data into it.
```sql
INSERT INTO d1001 USING meters TAGS('Beijing.Chaoyang', 2) VALUES('a');
```
The output shows the value to be inserted is invalid. But `SHOW TABLES` proves that the table has been created automatically by the `INSERT` statement.
```
DB error: invalid SQL: 'a' (invalid timestamp) (0.039494s)
taos> SHOW TABLES;
table_name | created_time | columns | STable_name |
======================================================================================================
d1001 | 2020-08-06 17:52:02.097 | 4 | meters |
Query OK, 1 row(s) in set (0.001091s)
```
From the above experiment, we can see that even though the value to be inserted is invalid, the table is still created.