Unverified commit 03105eb7, authored by wade zhang, committed by GitHub

Merge pull request #12419 from taosdata/docs/TD-15485

docs: English version of 04-develop chapter
```c title="Native Connection"
{{#include docs-examples/c/connect_example.c}}
```
```csharp title="Native Connection"
{{#include docs-examples/csharp/ConnectExample.cs}}
```
:::info
The C# connector currently supports only native connections.
:::
\ No newline at end of file
#### Unified Database Access Interface
```go title="Native Connection"
{{#include docs-examples/go/connect/cgoexample/main.go}}
```
```go title="REST Connection"
{{#include docs-examples/go/connect/restexample/main.go}}
```
#### Advanced Features
The `af` package of driver-go can also be used to establish a connection. This package wraps advanced TDengine features such as parameter binding and subscription.
```go title="Establish native connection using af package"
{{#include docs-examples/go/connect/afconn/main.go}}
```
......
```java title="Native Connection"
{{#include docs-examples/java/src/main/java/com/taos/example/JNIConnectExample.java}}
```
```java title="REST Connection"
{{#include docs-examples/java/src/main/java/com/taos/example/RESTConnectExample.java:main}}
```
When using a REST connection, bulk pulling can be enabled if the result set of a query is large.
```java title="Enable Bulk Pulling" {4}
{{#include docs-examples/java/src/main/java/com/taos/example/WSConnectExample.java:main}}
```
For more connection parameters, please refer to [Java Connector](/reference/connector/java).
\ No newline at end of file
```js title="Native Connection"
{{#include docs-examples/node/nativeexample/connect.js}}
```
```js title="REST Connection"
{{#include docs-examples/node/restexample/connect.js}}
```
\ No newline at end of file
```python title="Native Connection"
{{#include docs-examples/python/connect_exmaple.py}}
```
\ No newline at end of file
```r title="Native Connection"
{{#include docs-examples/R/connect_native.r:demo}}
```
\ No newline at end of file
```rust title="Native Connection/REST Connection"
{{#include docs-examples/rust/nativeexample/examples/connect.rs}}
```
:::note
For the Rust connector, the connection type depends only on the feature being used. If the "rest" feature is enabled, only the RESTful implementation is compiled in.
:::
---
sidebar_label: Connect
title: Connect to TDengine
description: "This document explains how to establish connections to TDengine and briefly introduces how to install and use TDengine connectors."
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import ConnJava from "./_connect_java.mdx";
import ConnGo from "./_connect_go.mdx";
import ConnRust from "./_connect_rust.mdx";
import ConnNode from "./_connect_node.mdx";
import ConnPythonNative from "./_connect_python.mdx";
import ConnCSNative from "./_connect_cs.mdx";
import ConnC from "./_connect_c.mdx";
import ConnR from "./_connect_r.mdx";
import InstallOnWindows from "../../14-reference/03-connector/_linux_install.mdx";
import InstallOnLinux from "../../14-reference/03-connector/_windows_install.mdx";
import VerifyLinux from "../../14-reference/03-connector/_verify_linux.mdx";
import VerifyWindows from "../../14-reference/03-connector/_verify_windows.mdx";
Any application running on any platform can access TDengine through the REST API provided by TDengine. For details, please refer to [REST API](/reference/rest-api/). In addition, applications can access TDengine through connectors for multiple programming languages, including C/C++, Java, Python, Go, Node.js, C#, and Rust. This chapter describes how to establish a connection to TDengine and briefly introduces how to install and use the connectors. For details about each connector, please refer to [Connectors](https://docs.taosdata.com/reference/connector/).
## Establish Connection
Connectors can establish a connection to TDengine in two ways:
1. Through the REST API provided by the taosAdapter component, which connects to taosd. This is called a "REST connection" hereinafter.
2. Through the client driver taosc, which connects directly to the server program taosd. This is called a "native connection" hereinafter.
Either way, the connectors provide the same or similar APIs to access the database and execute SQL statements. Only the way the connection is initialized differs slightly, so users should notice little difference in usage.
Key differences:
1. With a REST connection, there is no need to install the client driver taosc, which makes it easier to use across platforms, at the cost of about 30% lower performance.
2. With a native connection, the full capabilities of TDengine can be used, such as [Parameter Binding](/reference/connector/cpp#parameter-binding-api) and [Subscription](/reference/connector/cpp#Subscription).
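As a rough illustration of why a REST connection needs no client driver, the sketch below builds a plain HTTP request with the Python standard library only. The port `6041`, the `/rest/sql` path, and the `root`/`taosdata` credentials are assumed defaults of a local taosAdapter; the helper name `build_rest_request` is hypothetical. The request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_rest_request(sql, host="localhost", port=6041,
                       user="root", password="taosdata"):
    """Build (but do not send) an HTTP POST carrying one SQL statement.

    Host, port, path and credentials are assumptions about a default
    local installation; adjust them for a real deployment.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{host}:{port}/rest/sql",
        data=sql.encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_rest_request("SHOW DATABASES")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it, provided a taosAdapter instance is actually listening at that address.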
## Install Client Driver taosc
If you choose a native connection and the application does not run on the same server as TDengine, the client driver must be installed first. Otherwise this step can be skipped. To avoid incompatibility between the client driver and the server, use the same version for both.
### Install
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
...@@ -49,9 +50,9 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它 ...@@ -49,9 +50,9 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它
</TabItem>
</Tabs>
### Verify
After the installation and configuration above are done, and once the TDengine service is confirmed to be up and running, the TDengine command line program `taos` shipped in the installation package can be launched to log in.
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
...@@ -62,12 +63,12 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它 ...@@ -62,12 +63,12 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它
</TabItem>
</Tabs>
## Install Connectors
<Tabs groupId="lang">
<TabItem label="Java" value="java">
If `maven` is used to manage the project, simply add the dependency below to `pom.xml`.
```xml
<dependency>
...@@ -80,13 +81,13 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它 ...@@ -80,13 +81,13 @@ TDengine 提供 REST API,容许在任何平台的任何应用程序通过它
</TabItem>
<TabItem label="Python" value="python">
Install from PyPI using `pip`:
```
pip install taospy
```
Install from Git URL:
```
pip install git+https://github.com/taosdata/taos-connector-python.git
...@@ -95,7 +96,7 @@ pip install git+https://github.com/taosdata/taos-connector-python.git ...@@ -95,7 +96,7 @@ pip install git+https://github.com/taosdata/taos-connector-python.git
</TabItem>
<TabItem label="Go" value="go">
Add the `driver-go` dependency to `go.mod`.
```go-mod title=go.mod
module goexample
...@@ -106,14 +107,14 @@ require github.com/taosdata/driver-go/v2 develop ...@@ -106,14 +107,14 @@ require github.com/taosdata/driver-go/v2 develop
```
:::note
`driver-go` uses `cgo` to wrap the APIs provided by taosc, and `cgo` needs `gcc` to compile C source code, so please make sure `gcc` is available on your system.
:::
</TabItem>
<TabItem label="Rust" value="rust">
Add the `libtaos` dependency to `Cargo.toml`.
```toml title=Cargo.toml
[dependencies]
...@@ -121,7 +122,7 @@ libtaos = { version = "0.4.2"} ...@@ -121,7 +122,7 @@ libtaos = { version = "0.4.2"}
```
:::info
The Rust connector uses different features to distinguish between connection types. To establish a REST connection, enable the `rest` feature:
```toml
libtaos = { version = "*", features = ["rest"] }
...@@ -132,28 +133,28 @@ libtaos = { version = "*", features = ["rest"] } ...@@ -132,28 +133,28 @@ libtaos = { version = "*", features = ["rest"] }
</TabItem>
<TabItem label="Node.js" value="node">
The Node.js connector provides a separate package for each connection type.
1. Install the Node.js native connector
```
npm i td2.0-connector
```
:::note
The recommended Node version is `node-v12.8.0` or later and earlier than `node-v13.0.0`.
:::
2. Install the Node.js REST connector
```
npm i td2.0-rest-connector
```
</TabItem>
<TabItem label="C#" value="csharp">
Add a reference to [TDengine.Connector](https://www.nuget.org/packages/TDengine.Connector/) in the project configuration file:
```xml title=csharp.csproj {12}
<Project Sdk="Microsoft.NET.Sdk">
...@@ -173,22 +174,22 @@ Node.js 连接器通过不同的包提供不同的连接方式。 ...@@ -173,22 +174,22 @@ Node.js 连接器通过不同的包提供不同的连接方式。
</Project>
```
Alternatively, add it using the `dotnet` command:
```
dotnet add package TDengine.Connector
```
:::note
The sample code below is based on dotnet6.0 and may need to be adjusted if you use a different version.
:::
</TabItem>
<TabItem label="R" value="r">
1. Download [taos-jdbcdriver-version-dist.jar](https://repo1.maven.org/maven2/com/taosdata/jdbc/taos-jdbcdriver/2.0.38/).
2. Install the dependency package `RJDBC`:
```R
install.packages("RJDBC")
...@@ -197,15 +198,15 @@ install.packages("RJDBC") ...@@ -197,15 +198,15 @@ install.packages("RJDBC")
</TabItem>
<TabItem label="C" value="c">
If the TDengine server software or the client driver taosc is already installed, the C connector is already available and no additional step is needed.
<br/>
</TabItem>
</Tabs>
## Establish Connection
Before establishing a connection, please make sure that TDengine is running and accessible and that the FQDN of the server is configured correctly. The sample code below assumes that TDengine is running on the same host as the client program, with the default FQDN ("localhost") and the default serverPort ("6030").
<Tabs groupId="lang" defaultValue="java">
<TabItem label="Java" value="java">
...@@ -235,6 +236,6 @@ install.packages("RJDBC") ...@@ -235,6 +236,6 @@ install.packages("RJDBC")
</Tabs>
:::tip
If the connection fails, in most cases the cause is an incorrect FQDN or firewall configuration. Please refer to the section "Unable to establish connection" in the [FAQ](https://docs.taosdata.com/train-faq/faq) for troubleshooting.
:::
---
sidebar_label: Data Model
slug: /model
title: Data Model
---
TDengine uses a data model similar to the relational model, so databases and tables need to be created. For a specific use case, the design of databases, STables (short for super tables), and normal tables needs to be considered. This chapter explains the concepts without going into syntax details.
A [video training course](https://www.taosdata.com/blog/2020/11/11/1945.html) on data modeling is also available.
## Create Database
Data from different kinds of data collection points often has different characteristics, such as collection frequency, retention period, number of replicas, data block size, and whether data may be updated. For TDengine to work efficiently in every scenario, it is strongly suggested to create tables with different data characteristics in different databases, because a separate storage policy can be set for each database. When creating a database, many parameters can be configured in addition to the standard SQL options, such as the number of days to keep data, the number of replicas, the number of memory blocks, the time precision, the minimum and maximum number of rows in each file block, whether to compress data, and the number of days covered by a single data file. Below is an example of an SQL statement for creating a database.
```sql
CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
```
The SQL statement above creates a database named "power". Its data is kept for 365 days, which means data older than 365 days is deleted automatically. A new data file is created every 10 days, the number of memory blocks is 6, and data is allowed to be updated. For more details please refer to [Database](/taos-sql/database).
After a database is created, switch to it using the SQL command `USE`. For example, the statement below switches the current database to `power`. If no current database is specified, a table name must be prefixed with the corresponding database name, in the form `db_name.table_name`.
```sql
USE power;
```
:::note
- Any table or STable must belong to a database. The database must be created before a table or STable can be created in it.
- JOIN operations can't be performed on tables from two different databases.
- A timestamp needs to be specified when inserting rows or querying historical rows.
:::
## Create STable
A typical IoT system often contains many kinds of devices. For example, an electrical power system has meters, transformers, bus bars, switches, etc. To make aggregation across multiple tables easy, one STable needs to be created for each kind of data collection point. For example, for the meters in [Table 1](/tdinternal/arch#model_table1), the SQL statement below can be used to create the STable.
```sql
CREATE STABLE meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
```
:::note
If you are using a version prior to 2.0.15, the `STABLE` keyword needs to be replaced with `TABLE`.
:::
As with a normal table, a name (`meters` in the example) and a schema, i.e. the definition of the data columns, need to be provided when creating an STable. The first column must be a timestamp (`ts` in the example), and the other columns (`current`, `voltage` and `phase` in the example) are the collected measurements. The type of a column can be integer, floating point number, string, etc. In addition, the schema for tags needs to be provided, like `location` and `groupId` in the example. The type of a tag can be integer, floating point number, string, etc. The static properties of a data collection point can usually be defined as tags, such as the location, device type, device group ID, or manager ID. Tags in the schema can be added, removed or altered later. Please refer to [STable](/taos-sql/stable) for more details.
Each kind of data collection point needs its own STable, so an IoT system often has many STables. For an electrical power system, one STable is needed for each of meters, transformers, bus bars and switches. A single device may also have multiple kinds of data collection points. For example, a wind turbine may have some collection points for electrical data such as current and voltage and others for environmental data such as temperature, humidity and wind direction. Multiple STables are required for this kind of device.
At most 4096 columns (1024 prior to version 2.1.7.0) are allowed in an STable. If more than 4096 measurements are collected by a single data collection point, multiple STables are required for it. A system can have multiple databases, and a database can contain one or more STables.
## Create Table
A separate table needs to be created for each data collection point. As in a standard relational database, a table has a name and a schema, but it can also carry one or more tags. To create a table, an STable is used as the template and the concrete tag values are specified. For example, for the meters in [Table 1](/tdinternal/arch#model_table1), the table can be created using the SQL statement below.
```sql
CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);
```
In the SQL statement above, "d1001" is the table name and "meters" is the STable name, followed by the value of tag `location` and the value of tag `groupId`, which are "Beijing.Chaoyang" and 2 respectively. Although tag values must be specified when the table is created, they can be altered afterwards. Please refer to [Tables](/taos-sql/table) for details.
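When there are many data collection points, the per-device `CREATE TABLE` statements are usually generated rather than typed by hand. The following is a minimal sketch of such a generator; the helper names `format_tag` and `create_subtable_sql` are hypothetical, and the code only builds SQL text in the form shown above without talking to TDengine.

```python
def format_tag(value):
    """Render one tag value: strings are double-quoted, numbers are not."""
    if isinstance(value, str):
        if '"' in value:
            # Escaping rules are not covered here, so reject such values.
            raise ValueError("tag values containing quotes are not supported")
        return f'"{value}"'
    return str(value)


def create_subtable_sql(table, stable, tags):
    """Build a CREATE TABLE ... USING ... TAGS (...) statement."""
    tag_list = ", ".join(format_tag(t) for t in tags)
    return f"CREATE TABLE {table} USING {stable} TAGS ({tag_list});"


print(create_subtable_sql("d1001", "meters", ["Beijing.Chaoyang", 2]))
```

The tag values must be listed in the same order as the tags in the STable definition, here `location` followed by `groupId`.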
:::warning
TDengine currently does not technically prevent creating a table in one database (db2) using an STable from another database (db1) as the template. This usage will be prohibited in the future and is not recommended.
:::
:::tip
It's suggested to use the globally unique ID of a data collection point as the table name, for example the device serial number. If there is no such unique ID, multiple IDs that are not globally unique can be combined to form one. It's not recommended to use a globally unique ID as a tag value.
:::
### Create Table Automatically
In some circumstances, it is not known whether the table for a data collection point already exists when rows are inserted. In this case the table can be created automatically while inserting, using the syntax below. If the table already exists, no new table is created and the `USING` clause is ignored.
```sql
INSERT INTO d1001 USING meters TAGS ("Beijing.Chaoyang", 2) VALUES (now, 10.2, 219, 0.32);
```
The SQL statement above inserts a row with the values `(now, 10.2, 219, 0.32)` into table "d1001". If table "d1001" doesn't exist, it is created automatically using STable "meters" as the template, with the tag values `"Beijing.Chaoyang", 2`.
For more details please refer to [Create Table Automatically](/taos-sql/insert#automatically-create-table-when-inserting).
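As a sketch of how an application might use this syntax, the code below turns a stream of readings into auto-create `INSERT` statements, so the writer never has to check whether a device table exists. The `Reading` type and the function name `auto_create_insert_sql` are hypothetical; only SQL text is produced.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One measurement from a meter (hypothetical container type)."""
    table: str
    location: str
    group_id: int
    ts: int          # epoch timestamp in the database's time precision
    current: float
    voltage: int
    phase: float


def auto_create_insert_sql(r, stable="meters"):
    """Build an INSERT that creates the device table if it is missing."""
    return (
        f'INSERT INTO {r.table} USING {stable} '
        f'TAGS ("{r.location}", {r.group_id}) '
        f'VALUES ({r.ts}, {r.current}, {r.voltage}, {r.phase});'
    )


reading = Reading("d1001", "Beijing.Chaoyang", 2, 1538548685000, 10.2, 219, 0.32)
print(auto_create_insert_sql(reading))
```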
## Multiple Column vs Single Column Model
TDengine supports a multiple column data model. As long as multiple measurements are collected by the same data collection point at the same time, i.e. with identical timestamps, they can be stored as different columns in a single STable. There is also an extreme alternative, the single column data model, in which a table is created for each measurement, so a separate STable is required for each kind of measurement. For example, 3 STables are required for current, voltage and phase.
It's recommended to use the multiple column data model whenever possible because its insert and storage efficiency are higher. In some cases, however, the kinds of measurements collected by a data collection point change frequently, and with the multiple column model the STable schema would need to be changed just as often, which complicates the application. In such cases the single column data model is simpler to use.
---
sidebar_label: SQL
title: Insert Using SQL
---
import Tabs from "@theme/Tabs";
...@@ -19,52 +20,53 @@ import CsStmt from "./_cs_stmt.mdx"; ...@@ -19,52 +20,53 @@ import CsStmt from "./_cs_stmt.mdx";
import CSQL from "./_c_sql.mdx";
import CStmt from "./_c_stmt.mdx";
## Introduction
Applications can execute `INSERT` statements through connectors to insert rows. Users can also enter `INSERT` statements manually in the TAOS Shell to insert data.
### Insert Single Row
The SQL statement below inserts one row into table "d1001".
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
```
### Insert Multiple Rows
Multiple rows can be inserted in a single SQL statement. The example below inserts 2 rows into table "d1001".
```sql
INSERT INTO d1001 VALUES (1538548684000, 10.2, 220, 0.23) (1538548696650, 10.3, 218, 0.25);
```
### Insert into Multiple Tables
Data can be inserted into multiple tables in the same SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6, 218, 0.33) d1002 VALUES (1538548696800, 12.3, 221, 0.31);
```
For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).
:::info
- Inserting in batches gives better performance. Normally, the larger the batch, the better the performance. Please note that a single row can't exceed 16K bytes and a single SQL statement can't exceed 1M bytes.
- Inserting with multiple threads can improve performance too. However, depending on the system resources on the client side and the server side, once the number of inserting threads grows past a certain point, performance may drop instead of improving because of the overhead of frequent thread switching. The best number of threads needs to be found by testing in the specific environment.
:::
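The batching advice above can be sketched as follows: rows for several tables are packed into as few `INSERT` statements as possible while each statement stays under a size limit. The function name `batch_inserts` is hypothetical, the limit is assumed to be 1024 * 1024 bytes, and the sketch splits only between tables, so a single table whose rows alone exceed the limit would still need to be split by the caller.

```python
MAX_SQL_BYTES = 1024 * 1024  # assumed reading of the "1M bytes" limit


def batch_inserts(rows_by_table, max_bytes=MAX_SQL_BYTES):
    """Pack per-table rows into multi-table INSERT statements."""
    prefix = "INSERT INTO"
    statements = []
    current = prefix
    for table, rows in rows_by_table.items():
        values = " ".join(
            "(" + ", ".join(str(v) for v in row) + ")" for row in rows
        )
        clause = f" {table} VALUES {values}"
        # +1 accounts for the trailing semicolon added on flush.
        too_long = len((current + clause).encode("utf-8")) + 1 > max_bytes
        if current != prefix and too_long:
            statements.append(current + ";")
            current = prefix
        current += clause
    if current != prefix:
        statements.append(current + ";")
    return statements


rows = {
    "d1001": [(1538548685000, 10.3, 219, 0.31), (1538548695000, 12.6, 218, 0.33)],
    "d1002": [(1538548696800, 12.3, 221, 0.31)],
}
for sql in batch_inserts(rows):
    print(sql)
```

With the default limit, the sample data yields one statement identical to the multi-table example shown earlier. A smaller `max_bytes` forces one statement per table.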
:::warning
- If the timestamp of the row to be inserted already exists in the table, the behavior depends on the database parameter `UPDATE`. If it is 0 (the default), the new row is discarded, which means timestamps must be unique within a table. If an application generates rows automatically, it may well produce identical timestamps, in which case fewer rows are inserted than the application submitted. If the database was created with `UPDATE 1`, a new row with an existing timestamp overwrites the old row.
- The timestamp of an inserted row must be newer than the current time minus the parameter `KEEP`. If `KEEP` is set to 3650 days, data older than 3650 days can't be inserted. The timestamp also can't be newer than the current time plus the parameter `DAYS`. If `DAYS` is set to 2, data more than 2 days in the future can't be inserted.
:::
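The accepted timestamp window described above can be checked on the client before a row is sent. The sketch below is one way to express it; the function name `is_writable` is hypothetical, and whether the exact boundary values are accepted by the server is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_writable(ts, keep_days, days, now=None):
    """Return True if ts lies inside (now - KEEP, now + DAYS].

    Boundary handling is an assumption; the server is authoritative.
    """
    now = now or datetime.now(timezone.utc)
    oldest = now - timedelta(days=keep_days)
    newest = now + timedelta(days=days)
    return oldest < ts <= newest


now = datetime(2022, 5, 1, tzinfo=timezone.utc)
print(is_writable(now - timedelta(days=3651), 3650, 2, now))  # too old
print(is_writable(now + timedelta(days=1), 3650, 2, now))     # inside window
print(is_writable(now + timedelta(days=3), 3650, 2, now))     # too far ahead
```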
## Examples
### Insert Using SQL
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
...@@ -92,16 +94,16 @@ INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6, ...@@ -92,16 +94,16 @@ INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6,
:::note
1. The samples above work with either a native connection or a REST connection.
2. Please note that `use db` can't be used with a REST connection because the RESTful interface is stateless. For this reason the samples use `dbName.tbName` to specify the table name.
:::
### Insert with Parameter Binding
TDengine also provides a Prepare API that supports parameter binding. As in MySQL, only the question mark `?` can be used in this API to represent a parameter to bind. Starting from versions 2.1.1.0 and 2.1.2.0, parameter binding support for inserting data has been improved significantly. Inserting through parameter binding avoids the cost of parsing SQL statements, which noticeably improves insert performance in most cases.
Parameter binding is available only with native connections.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
...@@ -126,4 +128,3 @@ TDengine 也提供了支持参数绑定的 Prepare API,与 MySQL 类似,这 ...@@ -126,4 +128,3 @@ TDengine 也提供了支持参数绑定的 Prepare API,与 MySQL 类似,这
<CStmt />
</TabItem>
</Tabs>
---
sidebar_label: InfluxDB Line Protocol
title: InfluxDB Line Protocol
---
import Tabs from "@theme/Tabs";
...@@ -13,20 +13,20 @@ import NodeLine from "./_js_line.mdx"; ...@@ -13,20 +13,20 @@ import NodeLine from "./_js_line.mdx";
import CsLine from "./_cs_line.mdx";
import CLine from "./_c_line.mdx";
## Introduction
In the InfluxDB Line protocol, a single line of text represents one row of data. Each line contains 4 parts, as shown below.
```
measurement,tag_set field_set timestamp
```
- `measurement` is used as the STable name. It is separated from `tag_set` by a comma.
- `tag_set` is used as the tags, in the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`, i.e. multiple tags are separated by commas. It is separated from `field_set` by a single space.
- `field_set` is used as the data columns, in the format `<field_key>=<field_value>,<field_key>=<field_value>`, again with commas between columns. It is separated from `timestamp` by a single space.
- `timestamp` is the primary key timestamp of this row of data.
For example:
```
meters,location=Beijing.Haidian,groupid=2 current=13.4,voltage=223,phase=0.29 1648432611249500
...@@ -34,16 +34,16 @@ meters,location=Beijing.Haidian,groupid=2 current=13.4,voltage=223,phase=0.29 16 ...@@ -34,16 +34,16 @@ meters,location=Beijing.Haidian,groupid=2 current=13.4,voltage=223,phase=0.29 16
:::note :::note
- tag_set 中的所有的数据自动转化为 nchar 数据类型; - All the data in `tag_set` will be converted to nchar type automatically.
- field_set 中的每个数据项都需要对自身的数据类型进行描述, 比如 1.2f32 代表 float 类型的数值 1.2, 如果不带类型后缀会被当作 double 处理; - Each value in `field_set` must describe its own data type. For example, 1.2f32 means the value 1.2 of float type; a value without a type suffix is treated as double.
- timestamp 支持多种时间精度。写入数据的时候需要用参数指定时间精度,支持从小时到纳秒的 6 种时间精度。 - Multiple time precisions are supported for `timestamp`. The precision must be specified with a parameter when inserting data; 6 precisions are supported, from hour (h) to nanosecond (ns).
::: :::
要了解更多可参考:[InfluxDB Line 协议官方文档](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/) 和 [TDengine 无模式写入参考指南](/reference/schemaless/#无模式写入行协议) For more details please refer to [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/) and [TDengine Schemaless](/reference/schemaless/#Schemaless-Line-Protocol)
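The four-part layout described above can be sketched as a small helper that assembles one line from its parts. This is a minimal illustration only: the function name `to_influx_line` is hypothetical, and the sketch assumes keys and values need no escaping and carry no type suffixes.

```python
def to_influx_line(measurement, tags, fields, timestamp):
    """Build one row in the form: measurement,tag_set field_set timestamp."""
    # Tags and fields are both comma-separated key=value pairs.
    tag_set = ",".join(f"{key}={value}" for key, value in tags.items())
    field_set = ",".join(f"{key}={value}" for key, value in fields.items())
    # A comma joins measurement and tag_set; single spaces separate the rest.
    return f"{measurement},{tag_set} {field_set} {timestamp}"


line = to_influx_line(
    "meters",
    {"location": "Beijing.Haidian", "groupid": 2},
    {"current": 13.4, "voltage": 223, "phase": 0.29},
    1648432611249500,
)
print(line)
```

With these inputs the helper reproduces the example row shown earlier in this section.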
## 示例代码 ## Examples
<Tabs defaultValue="java" groupId="lang"> <Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java"> <TabItem label="Java" value="java">
......
--- ---
sidebar_label: OpenTSDB 行协议 sidebar_label: OpenTSDB Line Protocol
title: OpenTSDB 行协议 title: OpenTSDB Line Protocol
--- ---
import Tabs from "@theme/Tabs"; import Tabs from "@theme/Tabs";
...@@ -13,28 +13,28 @@ import NodeTelnet from "./_js_opts_telnet.mdx"; ...@@ -13,28 +13,28 @@ import NodeTelnet from "./_js_opts_telnet.mdx";
import CsTelnet from "./_cs_opts_telnet.mdx"; import CsTelnet from "./_cs_opts_telnet.mdx";
import CTelnet from "./_c_opts_telnet.mdx"; import CTelnet from "./_c_opts_telnet.mdx";
## 协议介绍 ## Introduction
OpenTSDB 行协议同样采用一行字符串来表示一行数据。OpenTSDB 采用的是单列模型,因此一行只能包含一个普通数据列。标签列依然可以有多个。分为四部分,具体格式约定如下: A single line of text is used in the OpenTSDB line protocol to represent one row of data. OpenTSDB employs a single-column data model, so one line can contain only one data column, although there can be multiple tags. Each line contains 4 parts, as below:
```txt ```
<metric> <timestamp> <value> <tagk_1>=<tagv_1>[ <tagk_n>=<tagv_n>] <metric> <timestamp> <value> <tagk_1>=<tagv_1>[ <tagk_n>=<tagv_n>]
``` ```
- metric 将作为超级表名。 - `metric` will be used as the stable name.
- timestamp 本行数据对应的时间戳。根据时间戳的长度自动识别时间精度。支持秒和毫秒两种时间精度 - `timestamp` is the timestamp of the current row of data. The time precision is determined automatically from the length of the timestamp. Second and millisecond precision are supported.
- value 度量值,必须为一个数值。对应的列名也是 “value”。 - `value` is the metric value, which must be numeric. The corresponding column name is also "value".
- 最后一部分是标签集, 用空格分隔不同标签, 所有标签自动转化为 nchar 数据类型; - The last part is the tag set, with tags separated by spaces. All tags are converted to nchar type automatically.
例如: For example:
```txt ```txt
meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3 meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3
``` ```
参考[OpenTSDB Telnet API文档](http://opentsdb.net/docs/build/html/api_telnet/put.html)。 Please refer to [OpenTSDB Telnet API](http://opentsdb.net/docs/build/html/api_telnet/put.html) for more details.
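To make the four parts concrete, here is a minimal parsing sketch. The function name `parse_telnet_line` is hypothetical, and the mapping from timestamp length to precision (10 digits for seconds, 13 digits for milliseconds) is an assumption made for illustration; the text above only says that precision is inferred from the length.

```python
def parse_telnet_line(line):
    """Split '<metric> <timestamp> <value> <tagk>=<tagv> ...' into its parts."""
    metric, timestamp, value, *tag_items = line.split()
    # Each tag is a key=value pair; tag values are kept as strings here.
    tags = dict(item.split("=", 1) for item in tag_items)
    # Assumption: 10 digits means seconds, 13 digits means milliseconds.
    precision = {10: "s", 13: "ms"}.get(len(timestamp), "unknown")
    return {
        "metric": metric,
        "timestamp": int(timestamp),
        "precision": precision,
        "value": float(value),
        "tags": tags,
    }


row = parse_telnet_line(
    "meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3"
)
print(row)
```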
## 示例代码 ## Examples
<Tabs defaultValue="java" groupId="lang"> <Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java"> <TabItem label="Java" value="java">
...@@ -60,7 +60,7 @@ meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3 ...@@ -60,7 +60,7 @@ meters.current 1648432611250 11.3 location=Beijing.Haidian groupid=3
</TabItem> </TabItem>
</Tabs> </Tabs>
以上示例代码会自动创建 2 个超级表, 每个超级表有 4 条数据。 The above sample code creates 2 stables automatically, each with 4 rows of data.
```cmd ```cmd
taos> use test; taos> use test;
......
--- ---
sidebar_label: OpenTSDB JSON 格式协议 sidebar_label: OpenTSDB JSON Protocol
title: OpenTSDB JSON 格式协议 title: OpenTSDB JSON Protocol
--- ---
import Tabs from "@theme/Tabs"; import Tabs from "@theme/Tabs";
...@@ -13,9 +13,9 @@ import NodeJson from "./_js_opts_json.mdx"; ...@@ -13,9 +13,9 @@ import NodeJson from "./_js_opts_json.mdx";
import CsJson from "./_cs_opts_json.mdx"; import CsJson from "./_cs_opts_json.mdx";
import CJson from "./_c_opts_json.mdx"; import CJson from "./_c_opts_json.mdx";
## 协议介绍 ## Introduction
OpenTSDB JSON 格式协议采用一个 JSON 字符串表示一行或多行数据。例如: In the OpenTSDB JSON protocol, a JSON string represents one or more rows of data, for example:
```json ```json
[ [
...@@ -40,18 +40,18 @@ OpenTSDB JSON 格式协议采用一个 JSON 字符串表示一行或多行数据 ...@@ -40,18 +40,18 @@ OpenTSDB JSON 格式协议采用一个 JSON 字符串表示一行或多行数据
] ]
``` ```
与 OpenTSDB 行协议类似, metric 将作为超级表名, timestamp 表示时间戳,value 表示度量值, tags 表示标签集。 Similar to the OpenTSDB line protocol, `metric` will be used as the stable name, `timestamp` is the timestamp, `value` is the metric value, and `tags` is the tag set.
参考[OpenTSDB HTTP API文档](http://opentsdb.net/docs/build/html/api_http/put.html)。 Please refer to [OpenTSDB HTTP API](http://opentsdb.net/docs/build/html/api_http/put.html) for more details.
:::note :::note
- 对于 JSON 格式协议,TDengine 并不会自动把所有标签转成 nchar 类型, 字符串将将转为 nchar 类型, 数值将同样转换为 double 类型。 - In the JSON protocol, TDengine does not convert all tags to nchar type automatically: strings are converted to nchar type and numeric values are converted to double type.
- TDengine 只接收 JSON **数组格式**的字符串,即使一行数据也需要转换成数组形式。 - Only JSON strings in **array format** are accepted; an array must be used even if there is only one row.
::: :::
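The array requirement in the note above can be sketched as follows. The helper name `to_opentsdb_json` is hypothetical; the field names `metric`, `timestamp`, `value` and `tags` are the ones described in this section, and the sample values are made up for illustration.

```python
import json


def to_opentsdb_json(rows):
    """Serialize one row (dict) or many rows (list of dicts) as a JSON array."""
    if isinstance(rows, dict):
        rows = [rows]  # a single row must still be wrapped in an array
    return json.dumps(rows)


row = {
    "metric": "meters.current",
    "timestamp": 1648432611250,
    "value": 11.3,
    "tags": {"location": "Beijing.Haidian", "groupid": 3},
}
payload = to_opentsdb_json(row)
print(payload)
```

Passing a list of rows leaves it unchanged, so the same helper covers both the single-row and multi-row cases.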
## 示例代码 ## Examples
<Tabs defaultValue="java" groupId="lang"> <Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java"> <TabItem label="Java" value="java">
...@@ -77,7 +77,7 @@ OpenTSDB JSON 格式协议采用一个 JSON 字符串表示一行或多行数据 ...@@ -77,7 +77,7 @@ OpenTSDB JSON 格式协议采用一个 JSON 字符串表示一行或多行数据
</TabItem> </TabItem>
</Tabs> </Tabs>
以上示例代码会自动创建 2 个超级表, 每个超级表有 2 条数据。 The above sample code will create 2 stables automatically, each with 2 rows of data.
```cmd ```cmd
taos> use test; taos> use test;
......
```c title=一次绑定一行 ```c title=Single Row Binding
{{#include docs-examples/c/stmt_example.c}} {{#include docs-examples/c/stmt_example.c}}
``` ```
```c title=一次绑定多行 72:117 ```c title=Multiple Row Binding 72:117
{{#include docs-examples/c/multi_bind_example.c}} {{#include docs-examples/c/multi_bind_example.c}}
``` ```
\ No newline at end of file
label: 写入数据 label: Insert
link: link:
type: generated-index type: generated-index
slug: /insert-data/ slug: /insert-data/
description: "TDengine 支持多种写入协议,包括 SQL,InfluxDB Line 协议, OpenTSDB Telnet 协议,OpenTSDB JSON 格式协议。数据可以单条插入,也可以批量插入,可以插入一个数据采集点的数据,也可以同时插入多个数据采集点的数据。同时,TDengine 支持多线程插入,支持时间乱序数据插入,也支持历史数据插入。InfluxDB Line 协议、OpenTSDB Telnet 协议和 OpenTSDB JSON 格式协议是 TDengine 支持的三种无模式写入协议。使用无模式方式写入无需提前创建超级表和子表,并且引擎能自适用数据对表结构做调整。" description: "TDengine supports multiple protocols of inserting data, including SQL, InfluxDB Line protocol, OpenTSDB Telnet protocol, OpenTSDB JSON protocol. Data can be inserted row by row, or in batch. Data from one or more collecting points can be inserted simultaneously. In the meantime, data can be inserted with multiple threads, out of order data and historical data can be inserted too. InfluxDB Line protocol, OpenTSDB Telnet protocol and OpenTSDB JSON protocol are the 3 kinds of schemaless insert protocols supported by TDengine. It's not necessary to create stable and table in advance if using schemaless protocols, and the schemas can be adjusted automatically according to the data to be inserted."
...@@ -3,6 +3,6 @@ ...@@ -3,6 +3,6 @@
``` ```
:::tip :::tip
driver-go 的模块 `github.com/taosdata/driver-go/v2/wrapper` 是 C 接口的底层封装。使用这个模块也可以实现参数绑定写入。 The `github.com/taosdata/driver-go/v2/wrapper` module in driver-go is a low-level wrapper for the C API. It can also be used to insert data with parameter binding.
::: :::
```js title=一次绑定一行 ```js title=Single Row Binding
{{#include docs-examples/node/nativeexample/param_bind_example.js}} {{#include docs-examples/node/nativeexample/param_bind_example.js}}
``` ```
```js title=一次绑定多行 ```js title=Multiple Row Binding
{{#include docs-examples/node/nativeexample/multi_bind_example.js:insertData}} {{#include docs-examples/node/nativeexample/multi_bind_example.js:insertData}}
``` ```
:::info :::info
一次绑定一行效率不如一次绑定多行,但支持非 INSERT 语句。一次绑定多行效率更高,但仅支持 INSERT 语句。 Multiple row binding is better in performance than single row binding, but it can only be used with `INSERT` statement while single row binding can be used for other SQL statements besides `INSERT`.
::: :::
```py title=一次绑定一行 ```py title=Single Row Binding
{{#include docs-examples/python/bind_param_example.py}} {{#include docs-examples/python/bind_param_example.py}}
``` ```
```py title=一次绑定多行 ```py title=Multiple Row Binding
{{#include docs-examples/python/multi_bind_example.py:bind_batch}} {{#include docs-examples/python/multi_bind_example.py:bind_batch}}
``` ```
:::info :::info
一次绑定一行效率不如一次绑定多行,但支持非 INSERT 语句。一次绑定多行效率更高,但仅支持 INSERT 语句。 Multiple row binding is better in performance than single row binding, but it can only be used with `INSERT` statement while single row binding can be used for other SQL statements besides `INSERT`.
::: :::
\ No newline at end of file
通过迭代逐行获取查询结果。 Result set is iterated row by row.
```py ```py
{{#include docs-examples/python/query_example.py:iter}} {{#include docs-examples/python/query_example.py:iter}}
``` ```
一次获取所有查询结果,并把每一行转化为一个字典返回。 Result set is retrieved as a whole, each row is converted to a dict and returned.
```py ```py
{{#include docs-examples/python/query_example.py:fetch_all}} {{#include docs-examples/python/query_example.py:fetch_all}}
``` ```
\ No newline at end of file
...@@ -3,6 +3,6 @@ ...@@ -3,6 +3,6 @@
``` ```
:::note :::note
这个示例程序,目前在 Windows 系统上还无法运行 This sample code can't be run on Windows system for now.
::: :::
--- ---
slug: /query-data slug: /query-data
title: 查询数据 sidebar_label: Select
description: "主要查询功能,通过连接器执行同步查询和异步查询" title: Select
description: "This chapter introduces major query functionalities and how to perform sync and async query using connectors."
--- ---
import Tabs from "@theme/Tabs"; import Tabs from "@theme/Tabs";
...@@ -18,20 +19,26 @@ import NodeAsync from "./_js_async.mdx"; ...@@ -18,20 +19,26 @@ import NodeAsync from "./_js_async.mdx";
import CsAsync from "./_cs_async.mdx"; import CsAsync from "./_cs_async.mdx";
import CAsync from "./_c_async.mdx"; import CAsync from "./_c_async.mdx";
## 主要查询功能 ## Introduction
TDengine 采用 SQL 作为查询语言。应用程序可以通过 REST API 或连接器发送 SQL 语句,用户还可以通过 TDengine 命令行工具 taos 手动执行 SQL 即席查询(Ad-Hoc Query)。TDengine 支持如下查询功能 SQL is the query language of TDengine. Application programs can send SQL statements to TDengine through the REST API or connectors, and the TDengine CLI `taos` can be used to execute ad hoc SQL queries manually. The major query functionalities supported by TDengine are:
- 单列、多列数据查询 - Query on single column or multiple columns
- 标签和数值的多种过滤条件:>, <, =, <\>, like 等 - Filter on tags or data columns:>, <, =, <\>, like
- 聚合结果的分组(Group by)、排序(Order by)、约束输出(Limit/Offset) - Grouping of results: `Group By`
- 数值列及聚合结果的四则运算 - Sorting of results: `Order By`
- 时间戳对齐的连接查询(Join Query: 隐式连接)操作 - Limit the number of results: `Limit/Offset`
- 多种聚合/计算函数: count, max, min, avg, sum, twa, stddev, leastsquares, top, bottom, first, last, percentile, apercentile, last_row, spread, diff 等 - Arithmetic on columns of numeric types or aggregate results
- Join query with timestamp alignment
- Aggregate functions: count, max, min, avg, sum, twa, stddev, leastsquares, top, bottom, first, last, percentile, apercentile, last_row, spread, diff
例如:在命令行工具 taos 中,从表 d1001 中查询出 voltage > 215 的记录,按时间降序排列,仅仅输出 2 条。 For example, the SQL statement below can be executed in the TDengine CLI `taos` to select the rows from table d1001 whose voltage is greater than 215, sort them by time in descending order, and limit the output to 2 rows.
```sql ```sql
select * from d1001 where voltage > 215 order by ts desc limit 2;
```
```title=Output
taos> select * from d1001 where voltage > 215 order by ts desc limit 2; taos> select * from d1001 where voltage > 215 order by ts desc limit 2;
ts | current | voltage | phase | ts | current | voltage | phase |
====================================================================================== ======================================================================================
...@@ -40,17 +47,17 @@ taos> select * from d1001 where voltage > 215 order by ts desc limit 2; ...@@ -40,17 +47,17 @@ taos> select * from d1001 where voltage > 215 order by ts desc limit 2;
Query OK, 2 row(s) in set (0.001100s) Query OK, 2 row(s) in set (0.001100s)
``` ```
为满足物联网场景的需求,TDengine 支持几个特殊的函数,比如 twa(时间加权平均),spread (最大值与最小值的差),last_row(最后一条记录)等,更多与物联网场景相关的函数将添加进来。TDengine 还支持连续查询。 To meet the requirements of IoT use cases, TDengine supports some special functions, for example `twa` (time-weighted average), `spread` (the difference between the maximum and the minimum), and `last_row` (the last row). More functions related to IoT use cases will be added. Furthermore, continuous query is also supported in TDengine.
具体的查询语法请看 [TAOS SQL 的数据查询](/taos-sql/select) 章节。 For detailed query syntax please refer to [Select](/taos-sql/select).
## 多表聚合查询 ## Aggregate Query on Multiple Tables
物联网场景中,往往同一个类型的数据采集点有多个。TDengine 采用超级表(STable)的概念来描述某一个类型的数据采集点,一张普通的表来描述一个具体的数据采集点。同时 TDengine 使用标签来描述数据采集点的静态属性,一个具体的数据采集点有具体的标签值。通过指定标签的过滤条件,TDengine 提供了一高效的方法将超级表(某一类型的数据采集点)所属的子表进行聚合查询。对普通表的聚合函数以及绝大部分操作都适用于超级表,语法完全一样。 In IoT use cases, there are usually multiple data collecting points of the same kind. TDengine uses the concept of STable (short for super table) to represent a kind of data collecting point, and a normal table to represent a specific data collecting point. Tags are used to represent the static properties of data collecting points, and each specific data collecting point has its own tag values. By specifying filter conditions on tags, aggregate queries can be performed efficiently over the sub tables belonging to the same stable, i.e. the same kind of data collecting points. Aggregate functions and most other operations applicable to normal tables can be used on stables with exactly the same syntax.
### 示例一 ### Example 1
在 TAOS Shell,查找北京所有智能电表采集的电压平均值,并按照 location 分组。 In the TDengine CLI `taos`, use the SQL below to get the average voltage of all the meters in Beijing, grouped by location.
``` ```
taos> SELECT AVG(voltage) FROM meters GROUP BY location; taos> SELECT AVG(voltage) FROM meters GROUP BY location;
...@@ -61,9 +68,9 @@ taos> SELECT AVG(voltage) FROM meters GROUP BY location; ...@@ -61,9 +68,9 @@ taos> SELECT AVG(voltage) FROM meters GROUP BY location;
Query OK, 2 row(s) in set (0.002136s) Query OK, 2 row(s) in set (0.002136s)
``` ```
### 示例二 ### Example 2
在 TAOS shell, 查找 groupId 为 2 的所有智能电表过去 24 小时的记录条数,电流的最大值。 In TDengine CLI `taos`, use below SQL to get the number of rows and the maximum current in the past 24 hours from meters whose groupId is 2.
``` ```
taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now - 24h; taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now - 24h;
...@@ -73,11 +80,11 @@ taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now - ...@@ -73,11 +80,11 @@ taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now -
Query OK, 1 row(s) in set (0.002136s) Query OK, 1 row(s) in set (0.002136s)
``` ```
TDengine 仅容许对属于同一个超级表的表之间进行聚合查询,不同超级表之间的聚合查询不支持。在 [TAOS SQL 的数据查询](/taos-sql/select) 一章,查询类操作都会注明是否支持超级表。 Aggregate queries are allowed only among tables belonging to the same stable; aggregation across different stables is not supported. In [Select](/taos-sql/select), each query operation is marked as to whether it supports stables.
## 降采样查询、插值 ## Down Sampling and Interpolation
物联网场景里,经常需要通过降采样(down sampling)将采集的数据按时间段进行聚合。TDengine 提供了一个简便的关键词 interval 让按照时间窗口的查询操作变得极为简单。比如,将智能电表 d1001 采集的电流值每 10 秒钟求和 In IoT use cases, down sampling is widely used to aggregate data by time range. The `INTERVAL` keyword in TDengine makes queries by time window very simple. For example, the SQL statement below gets the sum of current every 10 seconds from meter d1001.
``` ```
taos> SELECT sum(current) FROM d1001 INTERVAL(10s); taos> SELECT sum(current) FROM d1001 INTERVAL(10s);
...@@ -88,7 +95,7 @@ taos> SELECT sum(current) FROM d1001 INTERVAL(10s); ...@@ -88,7 +95,7 @@ taos> SELECT sum(current) FROM d1001 INTERVAL(10s);
Query OK, 2 row(s) in set (0.000883s) Query OK, 2 row(s) in set (0.000883s)
``` ```
降采样操作也适用于超级表,比如:将北京所有智能电表采集的电流值每秒钟求和 Down sampling can also be used on stables. For example, the SQL statement below gets the sum of current every second from all meters in Beijing.
``` ```
taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s); taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s);
...@@ -102,7 +109,7 @@ taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s ...@@ -102,7 +109,7 @@ taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s
Query OK, 5 row(s) in set (0.001538s) Query OK, 5 row(s) in set (0.001538s)
``` ```
降采样操作也支持时间偏移,比如:将所有智能电表采集的电流值每秒钟求和,但要求每个时间窗口从 500 毫秒开始 Down sampling also supports a time offset. For example, the SQL statement below gets the sum of current every second from all meters, with each time window starting at an offset of 500 milliseconds.
``` ```
taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a); taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a);
...@@ -116,17 +123,17 @@ taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a); ...@@ -116,17 +123,17 @@ taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a);
Query OK, 5 row(s) in set (0.001521s) Query OK, 5 row(s) in set (0.001521s)
``` ```
物联网场景里,每个数据采集点采集数据的时间是难同步的,但很多分析算法(比如 FFT)需要把采集的数据严格按照时间等间隔的对齐,在很多系统里,需要应用自己写程序来处理,但使用 TDengine 的降采样操作就轻松解决。 In IoT use cases, it's hard to synchronize the timestamps of data collected by different collecting points. However, many algorithms, such as FFT, require the data to be strictly aligned at equal time intervals. In many systems the application has to handle this by itself, but in TDengine the alignment is easily achieved with down sampling.
如果一个时间间隔里,没有采集的数据,TDengine 还提供插值计算的功能。 Interpolation can be performed in TDengine if there is no data in a time range.
语法规则细节请见 [TAOS SQL 的按时间窗口切分聚合](/taos-sql/interval) 章节。 For more details please refer to [Aggregate by Window](/taos-sql/interval).
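The effect of a time window with an optional offset can be sketched outside the database as follows. This is only a conceptual model, not how TDengine implements `INTERVAL`: the function `downsample_sum` is hypothetical, timestamps are plain integers in milliseconds, and windows are assumed to start at multiples of the interval plus the offset.

```python
from collections import defaultdict


def downsample_sum(points, interval_ms, offset_ms=0):
    """Sum values per window; each window starts at k * interval_ms + offset_ms."""
    windows = defaultdict(float)
    for ts, value in points:
        # Floor the shifted timestamp to find the start of its window.
        start = (ts - offset_ms) // interval_ms * interval_ms + offset_ms
        windows[start] += value
    return dict(sorted(windows.items()))


points = [(0, 1.0), (400, 2.0), (600, 3.0), (1200, 4.0), (1600, 5.0)]
print(downsample_sum(points, 1000))       # windows start at 0, 1000, ...
print(downsample_sum(points, 1000, 500))  # windows start at -500, 500, 1500, ...
```

Shifting the offset changes which points share a window, which is why the two calls group the same data differently.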
## 示例代码 ## Examples
### 查询数据 ### Query
在 [SQL 写入](/develop/insert-data/sql-writing) 一章,我们创建了 power 数据库,并向 meters 表写入了一些数据,以下示例代码展示如何查询这个表的数据。 In the section describing [Insert](/develop/insert-data/sql-writing), a database named `power` is created and some data are inserted into stable `meters`. Below sample code demonstrates how to query the data in this stable.
<Tabs defaultValue="java" groupId="lang"> <Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java"> <TabItem label="Java" value="java">
...@@ -154,16 +161,16 @@ Query OK, 5 row(s) in set (0.001521s) ...@@ -154,16 +161,16 @@ Query OK, 5 row(s) in set (0.001521s)
:::note :::note
1. 无论是使用 REST 连接还是原生连接的连接器,以上示例代码都能正常工作。 1. The above sample code works with both REST connection and native connection.
2. 唯一需要注意的是:由于 RESTful 接口无状态, 不能使用 `use db` 语句来切换数据库。 2. Note that `use db` can't be used to switch databases with REST connection because the RESTful interface is stateless.
::: :::
### 异步查询 ### Asynchronous Query
除同步查询 API 之外,TDengine 还提供性能更高的异步调用 API 处理数据插入、查询操作。在软硬件环境相同的情况下,异步 API 处理数据插入的速度比同步 API 快 2-4 倍。异步 API 采用非阻塞式的调用方式,在系统真正完成某个具体数据库操作前,立即返回。调用的线程可以去处理其他工作,从而可以提升整个应用的性能。异步 API 在网络延迟严重的情况下,优点尤为突出。 Besides the synchronous query API, TDengine also provides a higher-performance asynchronous API for inserting and querying data. With the same hardware and software environment, the async API inserts data 2 to 4 times faster than the sync API. The async API works in non-blocking mode: a call returns immediately, before the database operation has actually finished, so the calling thread can switch to other work, which improves the performance of the whole application. The advantage of the async API is especially pronounced when network latency is high.
需要注意的是,只有使用原生连接的连接器,才能使用异步查询功能。 Note that async query can only be used with native connection.
<Tabs defaultValue="python" groupId="lang"> <Tabs defaultValue="python" groupId="lang">
<TabItem label="Python" value="python"> <TabItem label="Python" value="python">
......
--- ---
sidebar_label: 连续查询 sidebar_label: Continuous Query
description: "连续查询是一个按照预设频率自动执行的查询功能,提供按照时间窗口的聚合查询能力,是一种简化的时间驱动流式计算。" description: "Continuous query is a query that's executed automatically according to predefined frequency to provide aggregate query capability by time window, it's actually a simplified time driven stream computing."
title: "连续查询(Continuous Query)" title: "Continuous Query"
--- ---
连续查询是 TDengine 定期自动执行的查询,采用滑动窗口的方式进行计算,是一种简化的时间驱动的流式计算。针对库中的表或超级表,TDengine 可提供定期自动执行的连续查询,用户可让 TDengine 推送查询的结果,也可以将结果再写回到 TDengine 中。每次执行的查询是一个时间窗口,时间窗口随着时间流动向前滑动。在定义连续查询的时候需要指定时间窗口(time window, 参数 interval)大小和每次前向增量时间(forward sliding times, 参数 sliding)。 Continuous query is a query that's executed automatically according to predefined frequency to provide aggregate query capability by time window, it's actually a simplified time driven stream computing. Continuous query can be performed on a table or stable in TDengine. The result of continuous query can be pushed to client or written back to TDengine. Each query is executed on a time window, which moves forward with time. The size of time window and the forward sliding time need to be specified with parameter `INTERVAL` and `SLIDING` respectively.
TDengine 的连续查询采用时间驱动模式,可以直接使用 TAOS SQL 进行定义,不需要额外的操作。使用连续查询,可以方便快捷地按照时间窗口生成结果,从而对原始采集数据进行降采样(down sampling)。用户通过 TAOS SQL 定义连续查询以后,TDengine 自动在最后的一个完整的时间周期末端拉起查询,并将计算获得的结果推送给用户或者写回 TDengine。 Continuous query in TDengine is time driven, and can be defined using TAOS SQL directly without any extra operations. With continuous query, the result can be generated according to time window to achieve down sampling of original data. Once a continuous query is defined using TAOS SQL, the query is automatically executed at the end of each time window and the result is pushed back to client or written to TDengine.
TDengine 提供的连续查询与普通流计算中的时间窗口计算具有以下区别 There are some differences between continuous query in TDengine and time window computation in stream computing:
- 不同于流计算的实时反馈计算结果,连续查询只在时间窗口关闭以后才开始计算。例如时间周期是 1 天,那么当天的结果只会在 23:59:59 以后才会生成。 - In stream computing, the computation is performed and the result is returned in real time, but in continuous query the computation starts only after a time window closes. For example, if the time window is 1 day, the result for a day is generated only after 23:59:59.
- 如果有历史记录写入到已经计算完成的时间区间,连续查询并不会重新进行计算,也不会重新将结果推送给用户。对于写回 TDengine 的模式,也不会更新已经存在的计算结果。 - If a historical data row is written into a time window for which the computation has already finished, the computation will not be performed again and the result will not be pushed to the client again either. If the result has been written back into TDengine, the existing result will not be updated.
- 使用连续查询推送结果的模式,服务端并不缓存客户端计算状态,也不提供 Exactly-Once 的语义保证。如果用户的应用端崩溃,再次拉起的连续查询将只会从再次拉起的时间开始重新计算最近的一个完整的时间窗口。如果使用写回模式,TDengine 可确保数据写回的有效性和连续性。 - In continuous query, if the result is pushed to client, the client status is not cached on the server side and Exactly-once is not guaranteed by the server either. If the client program crashes, a new time window will be generated from the time where the continuous query is restarted. If the result is written into TDengine, the data written into TDengine can be guaranteed as valid and continuous.
## 连续查询语法 ## Syntax
```sql ```sql
[CREATE TABLE AS] SELECT select_expr [, select_expr ...] [CREATE TABLE AS] SELECT select_expr [, select_expr ...]
...@@ -24,40 +24,39 @@ TDengine 提供的连续查询与普通流计算中的时间窗口计算具有 ...@@ -24,40 +24,39 @@ TDengine 提供的连续查询与普通流计算中的时间窗口计算具有
``` ```
INTERVAL: 连续查询作用的时间窗口 INTERVAL: The time window for which continuous query is performed
SLIDING: 连续查询的时间窗口向前滑动的时间间隔 SLIDING: The time step for which the time window moves forward each time
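How `INTERVAL` and `SLIDING` interact can be sketched with plain numbers. This is a conceptual model only, not TDengine's implementation: the function `sliding_averages` is hypothetical, timestamps are integers in seconds, and each window is assumed to cover the half-open range from its start up to, but not including, its start plus the interval.

```python
def sliding_averages(points, interval, sliding, first_start, last_start):
    """Average the values in each window [start, start + interval)."""
    results = []
    start = first_start
    while start <= last_start:
        values = [v for ts, v in points if start <= ts < start + interval]
        if values:
            results.append((start, sum(values) / len(values)))
        start += sliding  # the window moves forward by the sliding step
    return results


# Voltage samples every 30 seconds; 60-second windows sliding by 30 seconds.
points = [(0, 220), (30, 222), (60, 224), (90, 226)]
print(sliding_averages(points, 60, 30, 0, 60))
```

Because the sliding step is smaller than the interval, consecutive windows overlap and each sample contributes to more than one result.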
## 使用连续查询 ## How to Use
下面以智能电表场景为例介绍连续查询的具体使用方法。假设我们通过下列 SQL 语句创建了超级表和子表: In this section the use case of meters will be used to introduce how to use continuous query. Assume the stable and sub tables have been created using below SQL statement.
```sql ```sql
create table meters (ts timestamp, current float, voltage int, phase float) tags (location binary(64), groupId int); create table meters (ts timestamp, current float, voltage int, phase float) tags (location binary(64), groupId int);
create table D1001 using meters tags ("Beijing.Chaoyang", 2); create table D1001 using meters tags ("Beijing.Chaoyang", 2);
create table D1002 using meters tags ("Beijing.Haidian", 2); create table D1002 using meters tags ("Beijing.Haidian", 2);
...
``` ```
可以通过下面这条 SQL 语句以一分钟为时间窗口、30 秒为前向增量统计这些电表的平均电压。 The average voltage for each time window of one minute with 30 seconds as the length of moving forward can be retrieved using below SQL statement.
```sql ```sql
select avg(voltage) from meters interval(1m) sliding(30s); select avg(voltage) from meters interval(1m) sliding(30s);
``` ```
每次执行这条语句,都会重新计算所有数据。 如果需要每隔 30 秒执行一次来增量计算最近一分钟的数据,可以把上面的语句改进成下面的样子,每次使用不同的 `startTime` 并定期执行: Whenever the above SQL statement is executed, all the existing data will be computed again. If the computation needs to be performed every 30 seconds automatically to compute on the data in the past one minute, the above SQL statement needs to be revised as below, in which `{startTime}` stands for the beginning timestamp in the latest time window.
```sql ```sql
select avg(voltage) from meters where ts > {startTime} interval(1m) sliding(30s); select avg(voltage) from meters where ts > {startTime} interval(1m) sliding(30s);
``` ```
这样做没有问题,但 TDengine 提供了更简单的方法,只要在最初的查询语句前面加上 `create table {tableName} as` 就可以了,例如: This works, but TDengine provides an easier way: just prepend `create table {tableName} as` to the original query, for example:
```sql ```sql
create table avg_vol as select avg(voltage) from meters interval(1m) sliding(30s); create table avg_vol as select avg(voltage) from meters interval(1m) sliding(30s);
``` ```
会自动创建一个名为 `avg_vol` 的新表,然后每隔 30 秒,TDengine 会增量执行 `as` 后面的 SQL 语句,并将查询结果写入这个表中,用户程序后续只要从 `avg_vol` 中查询数据即可。例如: A table named `avg_vol` will be created automatically. Then, every 30 seconds, the `select` statement will be executed automatically on the data in the past 1 minute, i.e. the latest time window, and the result will be written into table `avg_vol`. The client program just needs to query from table `avg_vol`. For example:
```sql ```sql
taos> select * from avg_vol; taos> select * from avg_vol;
...@@ -69,16 +68,16 @@ taos> select * from avg_vol; ...@@ -69,16 +68,16 @@ taos> select * from avg_vol;
2020-07-29 13:39:00.000 | 223.0800000 | 2020-07-29 13:39:00.000 | 223.0800000 |
``` ```
需要注意,查询时间窗口的最小值是 10 毫秒,没有时间窗口范围的上限。 Note that the minimum allowed time window is 10 milliseconds, and there is no upper limit.
此外,TDengine 还支持用户指定连续查询的起止时间。如果不输入开始时间,连续查询将从第一条原始数据所在的时间窗口开始;如果没有输入结束时间,连续查询将永久运行;如果用户指定了结束时间,连续查询在系统时间达到指定的时间以后停止运行。比如使用下面的 SQL 创建的连续查询将运行一小时,之后会自动停止。 Besides, the start and end time of a continuous query can be specified. If the start time is not specified, the continuous query starts from the time window containing the first original row; if the end time is not specified, the continuous query runs indefinitely, otherwise it stops once the system time reaches the end time. For example, the continuous query created by the SQL statement below runs for one hour and then stops automatically.
```sql ```sql
create table avg_vol as select avg(voltage) from meters where ts > now and ts <= now + 1h interval(1m) sliding(30s); create table avg_vol as select avg(voltage) from meters where ts > now and ts <= now + 1h interval(1m) sliding(30s);
``` ```
需要说明的是,上面例子中的 `now` 是指创建连续查询的时间,而不是查询执行的时间,否则,查询就无法自动停止了。另外,为了尽量避免原始数据延迟写入导致的问题,TDengine 中连续查询的计算有一定的延迟。也就是说,一个时间窗口过去后,TDengine 并不会立即计算这个窗口的数据,所以要稍等一会(一般不会超过 1 分钟)才能查到计算结果。 `now` in the above SQL statement stands for the time when the continuous query is created, not the time when the computation is actually performed; otherwise the query could never stop automatically. Besides, to avoid as much as possible the problems caused by delayed writes of original data, the computation in continuous query is started with some delay. That is, once a time window closes, the computation does not start immediately, so the result becomes available a little later, normally within one minute after the time window closes.
## 管理连续查询 ## How to Manage
用户可在控制台中通过 `show streams` 命令来查看系统中全部运行的连续查询,并可以通过 `kill stream` 命令杀掉对应的连续查询。后续版本会提供更细粒度和便捷的连续查询管理命令。 `show streams` command can be used in TDengine CLI `taos` to show all the continuous queries in the system, and `kill stream` can be used to terminate a continuous query.
--- ---
sidebar_label: 数据订阅 sidebar_label: Subscription
description: "轻量级的数据订阅与推送服务。连续写入到 TDengine 中的时序数据能够被自动推送到订阅客户端。" description: "Lightweight service for data subscription and pushing, the time series data inserted into TDengine continuously can be pushed automatically to the subscribing clients."
title: 数据订阅 title: Data Subscription
--- ---
import Tabs from "@theme/Tabs"; import Tabs from "@theme/Tabs";
...@@ -14,13 +14,13 @@ import Node from "./_sub_node.mdx"; ...@@ -14,13 +14,13 @@ import Node from "./_sub_node.mdx";
import CSharp from "./_sub_cs.mdx"; import CSharp from "./_sub_cs.mdx";
import CDemo from "./_sub_c.mdx"; import CDemo from "./_sub_c.mdx";
基于数据天然的时间序列特性,TDengine 的数据写入(insert)与消息系统的数据发布(pub)逻辑上一致,均可视为系统中插入一条带时间戳的新记录。同时,TDengine 在内部严格按照数据时间序列单调递增的方式保存数据。本质上来说,TDengine 中每一张表均可视为一个标准的消息队列。 ## Introduction
TDengine 内嵌支持轻量级的消息订阅与推送服务。使用系统提供的 API,用户可使用普通查询语句订阅数据库中的一张或多张表。订阅的逻辑和操作状态的维护均是由客户端完成,客户端定时轮询服务器是否有新的记录到达,有新的记录到达就会将结果反馈到客户。 Because of the time series nature of the data, inserting data into TDengine is logically similar to publishing data in a message system: both can be considered as inserting a new data record with a timestamp into the system. Data is stored in ascending order of timestamp inside TDengine, so essentially each table in TDengine can be considered as a message queue.
TDengine 的订阅与推送服务的状态是由客户端维持,TDengine 服务端并不维持。因此如果应用重启,从哪个时间点开始获取最新数据,由应用决定。 A lightweight service for data subscription and pushing is built into TDengine. With the API provided by TDengine, client programs can use a `select` statement to subscribe to the data from one or more tables. The subscription logic and state maintenance are handled on the client side: the client program polls the server periodically to check whether there is new data, and if so the new data is returned to the client. The server does not maintain the subscription state, so if the client program is restarted, it is up to the client to decide from which point in time to start retrieving new data.
TDengine 的 API 中,与订阅相关的主要有以下三个: There are 3 major APIs related to subscription provided in the TDengine client driver.
```c ```c
taos_subscribe taos_subscribe
...@@ -28,9 +28,11 @@ taos_consume ...@@ -28,9 +28,11 @@ taos_consume
taos_unsubscribe taos_unsubscribe
``` ```
这些 API 的文档请见 [C/C++ Connector](/reference/connector/cpp),下面仍以智能电表场景为例介绍一下它们的具体用法(超级表和子表结构请参考上一节“连续查询”),完整的示例代码可以在 [这里](https://github.com/taosdata/TDengine/blob/master/examples/c/subscribe.c) 找到。 For more details about these API please refer to [C/C++ Connector](/reference/connector/cpp). Their usage will be introduced below using the use case of meters, in which the schema of stable and sub tables please refer to the previous section "continuous query". Full sample code can be found [here](https://github.com/taosdata/TDengine/blob/master/examples/c/subscribe.c).
Suppose we want to be notified and take some action when the current of a meter exceeds a threshold, such as 10A. There are two ways to do this.
The first way is to query each sub table separately and record the timestamp of the last row returned, then query only the data later than the recorded timestamp, repeating this process. The SQL statements for this approach are as follows:
```sql
select * from D1001 where ts > {last_timestamp1} and current > 10;
select * from D1002 where ts > {last_timestamp2} and current > 10;
...
```
This approach works, but the number of `select` statements grows with the number of meters. The performance of both the client and the server degrades, and once the number of meters is large enough, the system can no longer cope.
A better way is to query the super table, so that a single `select` is enough regardless of the number of meters:
```sql
select * from meters where ts > {last_timestamp} and current > 10;
```
However, choosing `last_timestamp` then becomes a new problem. Firstly, the time at which data is generated (the data timestamp) usually differs from the time at which it is inserted into the database, sometimes by a large margin. Secondly, data from different meters may arrive at the database at different times. If the timestamp of the "slowest" meter is used as `last_timestamp`, data from other meters may be selected repeatedly; if the timestamp of the "fastest" meter is used, data from other meters may be missed.
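The two failure modes described here can be illustrated with a small simulation. This is only a sketch: the `Reading` struct, the `count_selected` helper, and the sample timestamps used in the note below are hypothetical and are not part of the TDengine API. It models a poll as returning every row that has already arrived and whose timestamp is later than `last_timestamp`.

```c
#include <stddef.h>

/* One row as seen by a polling client. Field names are illustrative only. */
typedef struct {
    long ts;       /* timestamp assigned when the meter generated the row */
    long arrived;  /* time at which the row actually reached the database */
} Reading;

/* Count the rows that a poll issued at `poll_time` would return for the
 * condition "ts > last_timestamp". A row is visible only once it has arrived. */
size_t count_selected(const Reading *rows, size_t n,
                      long last_timestamp, long poll_time) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].arrived <= poll_time && rows[i].ts > last_timestamp) {
            hits++;
        }
    }
    return hits;
}
```

Take three rows given as (generated, arrived): `{80, 85}`, `{100, 101}`, and `{90, 120}`. A first poll at time 110 returns the rows with timestamps 80 and 100. If the next poll at time 130 uses `last_timestamp = 100` (the fastest meter), it returns nothing and the late row with timestamp 90 is missed. If it uses `last_timestamp = 80` (the slowest meter), it returns two rows, one of which (timestamp 100) was already returned by the first poll.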
The subscription feature of TDengine solves this problem thoroughly.
The first step is to create a subscription using `taos_subscribe`:
```c
TAOS_SUB* tsub = NULL;
...@@ -63,31 +65,31 @@ if (async) {
}
```
A subscription in TDengine can be either synchronous or asynchronous. The sample code above chooses the mode according to the value of `async`, which is taken from the command line. In synchronous mode, the client program invokes `taos_consume` directly to retrieve data. In asynchronous mode, the API invokes `taos_consume` from another internal thread and passes the retrieved data to the callback function `subscribe_callback` for processing. Note that time-consuming operations should be avoided in `subscribe_callback`; otherwise the client may block or run into other hard-to-control problems.
The parameter `taos` is an established database connection. There is no special requirement in synchronous mode. In asynchronous mode, the connection must not be used by any other thread; otherwise unpredictable errors may occur, because the callback function is invoked from an internal thread of the API and some TDengine APIs are not thread-safe.
The parameter `sql` is a `select` statement in which a `where` clause can be used to specify filter conditions. In our example, to subscribe only to data whose current exceeds 10A, the statement can be written as follows:
```sql
select * from meters where current > 10;
```
Note that no start time is specified here, so data from all times will be read. To subscribe only to data from the last day onward, without older historical data, a time condition can be added:
```sql
select * from meters where ts > now - 1d and current > 10;
```
The parameter `topic` is the name of the subscription. Because subscription is implemented in the client-side API, the name does not need to be globally unique, but it must be unique on a client machine.
If the subscription named `topic` does not exist, the parameter `restart` has no effect. If the client program created the subscription and then exited, then when it starts again and reuses the same `topic`, `restart` determines whether data is read from the beginning or from the position where the previous run stopped. If `restart` is **true** (a non-zero value), all data is read from the beginning. If the subscription already exists and has consumed some data, and `restart` is **false** (**0**), the data already consumed is not read again.
The last parameter of `taos_subscribe` is the polling interval in milliseconds. In synchronous mode, if the time between two consecutive invocations of `taos_consume` is shorter than this interval, `taos_consume` blocks until the interval has elapsed. In asynchronous mode, this interval is the minimum time between two invocations of the callback function.
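The blocking rule for synchronous mode comes down to simple arithmetic, sketched below. The helper `remaining_wait_ms` is hypothetical (it is not a TDengine function) and only models the behavior described in the text: a call made before the interval has elapsed waits for the remainder, and a call made afterwards does not wait.

```c
/* Hypothetical helper, not part of the TDengine API.
 * Returns how many milliseconds a consume call issued at `now_ms` would
 * still have to wait, given the time of the previous call and the polling
 * interval passed when the subscription was created. */
long remaining_wait_ms(long last_call_ms, long now_ms, long interval_ms) {
    long elapsed = now_ms - last_call_ms;
    if (elapsed >= interval_ms) {
        return 0;                   /* interval already elapsed: no blocking */
    }
    return interval_ms - elapsed;   /* block for the rest of the interval */
}
```

For example, with a 1000 ms interval, a call made 300 ms after the previous one would wait another 700 ms under this model.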
The second-to-last parameter of `taos_subscribe` is used to pass an additional argument to the callback function. The subscription API does not process it and simply passes it through to the callback. This parameter is ignored in synchronous mode.
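To illustrate how an opaque argument travels unchanged from the caller to a callback, here is a minimal sketch in plain C. It does not use the TDengine API: `RowCallback`, `deliver_rows`, `AlarmState`, and `count_alarm` are hypothetical names, and the real callback signature is that of `subscribe_callback` in the sample code.

```c
#include <stddef.h>

/* Hypothetical callback type: receives one value plus the caller's argument. */
typedef void (*RowCallback)(double current, void *param);

/* Stand-in for the delivering side: it never inspects `param`,
 * it only hands the pointer to the callback as received. */
void deliver_rows(const double *currents, size_t n, RowCallback cb, void *param) {
    for (size_t i = 0; i < n; i++) {
        cb(currents[i], param);
    }
}

/* State owned by the caller and reached through `param`. */
typedef struct {
    double limit;   /* alarm threshold in amperes */
    size_t alarms;  /* number of readings above the threshold */
} AlarmState;

/* Callback that recovers the caller's state from the opaque pointer. */
void count_alarm(double current, void *param) {
    AlarmState *state = (AlarmState *)param;
    if (current > state->limit) {
        state->alarms++;
    }
}
```

Passing a pointer to a caller-owned struct in this way avoids global variables when the callback needs context such as a threshold or a counter.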
After a subscription is created, its data can be consumed. In synchronous mode, the sample code is the `else` branch of `if (async)` below:
```c
if (async) {
...@@ -104,7 +106,7 @@ if (async) {
}
```
This is a **while** loop: each time the user presses Enter, `taos_consume` is invoked. The return value of `taos_consume` is the result set of the query, exactly the same as the return value of `taos_use_result`. In the sample, the code that uses this result set is the function `print_result`:
```c
void print_result(TAOS_RES* res, int blockFetch) {
...@@ -131,7 +133,9 @@ void print_result(TAOS_RES* res, int blockFetch) {
}
```
Here, `taos_print_row` is used to process the subscribed data; in our example, it prints all matching rows.
In asynchronous mode, consuming the subscribed data is even simpler:
```c
void subscribe_callback(TAOS_SUB* tsub, TAOS_RES *res, void* param, int code) {
...@@ -139,44 +143,43 @@ void subscribe_callback(TAOS_SUB* tsub, TAOS_RES *res, void* param, int code) {
}
```
To end a subscription, invoke `taos_unsubscribe`:
```c
taos_unsubscribe(tsub, keep);
```
The second parameter `keep` specifies whether to keep the subscription progress on the client side. If it is **false** (**0**), the subscription can only start from the beginning the next time `taos_subscribe` is invoked, regardless of the value of `restart`. The progress information is stored in the directory _{DataDir}/subscribe/_, where each subscription has a file with the same name as its `topic`. Deleting that file also causes the corresponding subscription to start from the beginning the next time it is created.
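The interaction between the progress file and `restart` can be summarized in a few lines of C. This is a sketch under stated assumptions: `progress_file_path` and `resumes_from_saved_progress` are hypothetical helpers, not TDengine functions, and the data directory passed in is whatever `{DataDir}` resolves to on the client.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds "<data_dir>/subscribe/<topic>" into `out`.
 * Returns 0 on success, or -1 if the buffer is too small. */
int progress_file_path(char *out, size_t out_size,
                       const char *data_dir, const char *topic) {
    int n = snprintf(out, out_size, "%s/subscribe/%s", data_dir, topic);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper modelling the rule in the text: a subscription
 * continues from saved progress only if the progress file still exists
 * (it was kept on unsubscribe and not deleted) and `restart` is 0. */
bool resumes_from_saved_progress(bool progress_file_exists, int restart) {
    return progress_file_exists && restart == 0;
}
```

Under this model, unsubscribing with `keep` set to 0 and deleting the progress file by hand lead to the same result: the next subscription with that `topic` starts from the beginning.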
Now let's see the sample code in action, assuming the following prerequisites are met:
- The sample code has been downloaded to the local system
- TDengine has been installed and started on the same system
- The database, super table, and sub tables required by the sample code have been created
Then run the following commands in the directory where the sample code resides to compile and start the program:
```bash
make
./subscribe -sql='select * from meters where current > 10;'
```
After the program starts, open another terminal and launch the TDengine CLI `taos`, then insert a row whose current is 12A into table **D1001**:
```sql
use test;
insert into D1001 values(now, 12, 220, 1);
```
Because the current exceeds 10A, the example program prints this row on the first terminal. You can insert more data and observe the output of the example program.
## Examples
The example programs below demonstrate how to use the connectors to subscribe to all rows whose current exceeds 10A.
### Prepare Data
```bash
# create database "power"
taos> create database power;
# use "power" as the database in following operations
...@@ -200,20 +203,21 @@ taos> select * from meters where current > 10;
2020-08-15 12:20:00.000 | 12.20000 | 220 | 1 | Beijing.Chaoyang | 2 |
Query OK, 5 row(s) in set (0.004896s)
```
### Example Programs
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<Java />
</TabItem>
<TabItem label="Python" value="Python">
<Python />
</TabItem>
{/* <TabItem label="Go" value="go">
<Go/>
</TabItem> */}
<TabItem label="Rust" value="rust">
<Rust />
</TabItem>
{/* <TabItem label="Node.js" value="nodejs">
<Node/>
...@@ -222,13 +226,13 @@ Query OK, 5 row(s) in set (0.004896s)
<CSharp/>
</TabItem> */}
<TabItem label="C" value="c">
<CDemo />
</TabItem>
</Tabs>
### Run the Examples
The example programs first consume all historical data matching the criteria:
```bash
ts: 1597464000000 current: 12.0 voltage: 220 phase: 1 location: Beijing.Chaoyang groupid : 2
...@@ -238,7 +242,7 @@ ts: 1597464600000 current: 10.3 voltage: 220 phase: 1 location: Beijing.Haidian
ts: 1597465200000 current: 11.2 voltage: 220 phase: 1 location: Beijing.Haidian groupid : 2
```
Next, use the TDengine CLI to insert a new row:
```
# taos
...@@ -246,7 +250,7 @@ taos> use power;
taos> insert into d1001 values(now, 12.4, 220, 1);
```
Because the current in the inserted row exceeds 10A, the example program consumes it:
```
ts: 1651146662805 current: 12.4 voltage: 220 phase: 1 location: Beijing.Chaoyang groupid: 2
```
......
---
sidebar_label: Cache
title: Cache
description: "A write-driven cache management mechanism keeps the most recently written row of each table in cache, providing high-performance queries of the latest state."
---
TDengine uses a time-driven cache management policy, First-In-First-Out (FIFO), also known as write-driven cache management. Unlike read-driven caching such as Least-Recently-Used (LRU), it keeps the most recently written data in the cache and, when cache usage reaches a threshold, writes the oldest data to disk in batches. In IoT use cases, users generally care most about the latest data, that is, the current state. TDengine takes advantage of this by keeping the most recently arrived data in the cache.
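The write-driven idea can be sketched with a tiny ring buffer. This is a simplified illustration, not TDengine's implementation: `FifoCache`, `CACHE_CAPACITY`, and the helper functions are hypothetical, rows are reduced to timestamps, and the oldest row is flushed one at a time rather than in batches.

```c
#include <stddef.h>

#define CACHE_CAPACITY 4  /* hypothetical, tiny capacity for illustration */

typedef struct {
    long rows[CACHE_CAPACITY];  /* cached row timestamps in a ring buffer */
    size_t head;                /* index of the oldest cached row */
    size_t count;               /* number of rows currently cached */
    size_t flushed;             /* rows written out to "disk" so far */
} FifoCache;

void cache_init(FifoCache *c) {
    c->head = 0;
    c->count = 0;
    c->flushed = 0;
}

/* Insert a row. Writes drive eviction: when the cache is full,
 * the oldest row is flushed to make room for the new one. */
void cache_insert(FifoCache *c, long ts) {
    if (c->count == CACHE_CAPACITY) {
        c->head = (c->head + 1) % CACHE_CAPACITY;
        c->count--;
        c->flushed++;
    }
    c->rows[(c->head + c->count) % CACHE_CAPACITY] = ts;
    c->count++;
}

/* The most recently written row is always in the cache.
 * Returns 0 and stores it in *ts_out, or -1 if the cache is empty. */
int cache_last_row(const FifoCache *c, long *ts_out) {
    if (c->count == 0) {
        return -1;
    }
    *ts_out = c->rows[(c->head + c->count - 1) % CACHE_CAPACITY];
    return 0;
}
```

Reads never reorder or evict anything in this model, which is the key difference from an LRU cache: the cache content depends only on write order.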
Because the latest data is kept in the cache, TDengine can respond to queries on the most recent row or batch of rows within milliseconds. With suitable configuration, TDengine can therefore be used as a data cache without deploying a separate caching system, which simplifies the system architecture and reduces operating costs. Note that the cache is emptied when TDengine restarts: all cached data is written to disk, and, unlike a dedicated key-value caching system, TDengine does not reload previously cached data into the cache.
TDengine allocates a fixed-size memory space as its cache, which can be configured according to application requirements and hardware resources. With an appropriately sized cache, TDengine can deliver very high write and query performance. Each vnode (virtual node) is allocated its own cache pool when it is created and manages that pool independently; cache pools are not shared between vnodes. All tables belonging to a vnode share the vnode's cache pool.
TDengine manages the memory pool in blocks, and data is stored in rows within each block. The memory pool of a vnode is allocated in blocks when the vnode is created, and the blocks are managed on a FIFO basis. The size of each block is determined by the configuration parameter `cache`, and the number of blocks in each vnode by the parameter `blocks`. The total cache size of a vnode is therefore `cache * blocks`. For efficiency, a cache block should be able to hold at least tens of rows for each table.
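The sizing rule can be checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in the text: that `cache` is expressed in megabytes, and that the tables of a vnode share a block evenly. Both helper functions are hypothetical and exist only for estimation.

```c
/* Total cache of one vnode: block size (assumed to be in MB) times block count. */
long vnode_cache_mb(long cache_mb, long blocks) {
    return cache_mb * blocks;
}

/* Rough estimate of how many rows each table can keep in a single block,
 * assuming the block is shared evenly by all tables in the vnode.
 * Returns 0 for invalid input. */
long rows_per_table_per_block(long cache_mb, long tables_in_vnode, long row_bytes) {
    if (tables_in_vnode <= 0 || row_bytes <= 0) {
        return 0;
    }
    long block_bytes = cache_mb * 1024L * 1024L;
    return block_bytes / (tables_in_vnode * row_bytes);
}
```

For example, under these assumptions a 16 MB block shared by 1000 tables with 100-byte rows holds about 167 rows per table, comfortably above the "tens of rows" guideline, and 6 such blocks give the vnode 96 MB of cache.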
The `last_row()` function can be used to quickly retrieve the last row of a table or a super table, which makes it easy to show the real-time state or latest readings of devices on a monitoring screen. For example, the SQL statement below retrieves the latest recorded voltage of all meters in the Chaoyang district of Beijing:
```sql
select last_row(voltage) from meters where location='Beijing.Chaoyang';
```
...@@ -2,4 +2,4 @@ label: Develop
link:
  type: generated-index
  slug: /develop
  description: "This guide helps developers quickly learn how to use TDengine, covering basic features such as data modeling, inserting data, and querying, as well as advanced features such as data subscription and continuous query. For each topic, sample code for connectors in multiple programming languages is provided to help developers get started quickly. For more details on each connector, please read the connector reference guide."
...@@ -2,6 +2,6 @@
{{#include docs-examples/java/src/main/java/com/taos/example/SubscribeDemo.java}}
```
:::note
For now, the Java connector does not provide an asynchronous subscription mode, but client programs can achieve a similar effect using `TimerTask` or similar mechanisms.
:::
\ No newline at end of file