提交 34852ab1 编写于 作者: S StoneT2000

Uploaded markdown docs for testing syncing with web docs

上级 a89b2804
#与其他工具的连接
## Telegraf
TDengine能够与开源数据采集系统[Telegraf](https://www.influxdata.com/time-series-platform/telegraf/)快速集成,整个过程无需任何代码开发。
###安装Telegraf
目前TDengine支持Telegraf 1.7.4以上的版本。用户可以根据当前的操作系统,到Telegraf官网下载安装包,并执行安装。下载地址如下:https://portal.influxdata.com/downloads
###配置Telegraf
修改Telegraf配置文件/etc/telegraf/telegraf.conf中与TDengine有关的配置项。
在output plugins部分,增加[[outputs.http]]配置项:
- url:http://ip:6020/telegraf/udb,其中ip为TDengine集群的中任意一台服务器的IP地址,6020为TDengine RESTful接口的端口号,telegraf为固定关键字,udb为用于存储采集数据的数据库名称,可预先创建。
- method: "POST"
- username: 登录TDengine的用户名
- password: 登录TDengine的密码
- data_format: "json"
- json_timestamp_units: "1ms"
在agent部分:
- hostname: 区分不同采集设备的机器名称,需确保其唯一性
- metric_batch_size: 30,允许Telegraf每批次写入记录最大数量,增大其数量可以降低Telegraf的请求发送频率,但对于TDengine,该数值不能超过50
关于如何使用Telegraf采集数据以及更多有关使用Telegraf的信息,请参考Telegraf官方的[文档](https://docs.influxdata.com/telegraf/v1.11/)
## Grafana
TDengine能够与开源数据可视化系统[Grafana](https://www.grafana.com/)快速集成搭建数据监测报警系统,整个过程无需任何代码开发,TDengine中数据表中内容可以在仪表盘(DashBoard)上进行可视化展现。
###安装Grafana
目前TDengine支持Grafana 5.2.4以上的版本。用户可以根据当前的操作系统,到Grafana官网下载安装包,并执行安装。下载地址如下:https://grafana.com/grafana/download
###配置Grafana
TDengine的Grafana插件在安装包的/usr/local/taos/connector/grafana目录下。
以CentOS 7.2操作系统为例,将tdengine目录拷贝到/var/lib/grafana/plugins目录下,重新启动grafana即可。
###使用Grafana
用户可以直接通过localhost:3000的网址,登录Grafana服务器(用户名/密码:admin/admin),配置TDengine数据源,如下图所示,此时可以在下拉列表中看到TDengine数据源。
![img](../assets/clip_image001.png)
TDengine数据源中的HTTP配置里面的Host地址要设置为TDengine集群的中任意一台服务器的IP地址与TDengine RESTful接口的端口号(6020)。假设TDengine数据库与Grafana部署在同一机器,那么应输入:http://localhost:6020。
此外,还需配置登录TDengine的用户名与密码,然后点击下图中的Save&Test按钮保存。
![img](../assets/clip_image001-2474914.png)
然后,就可以在Grafana的数据源列表中看到刚创建好的TDengine的数据源:
![img](../assets/clip_image001-2474939.png)
基于上面的步骤,就可以在创建Dashboard的时候使用TDengine数据源,如下图所示:
![img](../assets/clip_image001-2474961.png)
然后,可以点击Add Query按钮增加一个新查询。
在INPUT SQL输入框中输入查询SQL语句,该SQL语句的结果集应为两行多列的曲线数据,例如SELECT count(*) FROM sys.cpu WHERE ts>=from and ts<​to interval(interval)。其中,from、to和interval为TDengine插件的内置变量,表示从Grafana插件面板获取的查询范围和时间间隔。
ALIAS BY输入框为查询的别名,点击GENERATE SQL 按钮可以获取发送给TDengine的SQL语句。如下图所示:
![img](../assets/clip_image001-2474987.png)
关于如何使用Grafana创建相应的监测界面以及更多有关使用Grafana的信息,请参考Grafana官方的[文档](https://grafana.com/docs/)
## Matlab
MatLab可以通过安装包内提供的JDBC Driver直接连接到TDengine获取数据到本地工作空间。
###MatLab的JDBC接口适配
MatLab的适配有下面几个步骤,下面以Windows10上适配MatLab2017a为例:
- 将TDengine安装包内的驱动程序JDBCDriver-1.0.0-dist.jar拷贝到${matlab_root}\MATLAB\R2017a\java\jar\toolbox
- 将TDengine安装包内的taos.lib文件拷贝至${matlab_ root _dir}\MATLAB\R2017a\lib\win64
- 将新添加的驱动jar包加入MatLab的classpath。在${matlab_ root _dir}\MATLAB\R2017a\toolbox\local\classpath.txt文件中添加下面一行
`$matlabroot/java/jar/toolbox/JDBCDriver-1.0.0-dist.jar`
- 在${user_home}\AppData\Roaming\MathWorks\MATLAB\R2017a\下添加一个文件javalibrarypath.txt, 并在该文件中添加taos.dll的路径,比如您的taos.dll是在安装时拷贝到了C:\Windows\System32下,那么就应该在javalibrarypath.txt中添加如下一行:
`C:\Windows\System32`
###在MatLab中连接TDengine获取数据
在成功进行了上述配置后,打开MatLab。
- 创建一个连接:
`conn = database(‘db’, ‘root’, ‘taosdata’, ‘com.taosdata.jdbc.TSDBDriver’, ‘jdbc:TSDB://127.0.0.1:0/’)`
- 执行一次查询:
`sql0 = [‘select * from tb’]`
`data = select(conn, sql0);`
- 插入一条记录:
`sql1 = [‘insert into tb values (now, 1)’]`
`exec(conn, sql1)`
更多例子细节请参考安装包内examples\Matlab\TDengineDemo.m文件。
## R
R语言支持通过JDBC接口来连接TDengine数据库。首先需要安装R语言的JDBC包。启动R语言环境,然后执行以下命令安装R语言的JDBC支持库:
```R
install.packages('rJDBC', repos='http://cran.us.r-project.org')
```
安装完成以后,通过执行`library('RJDBC')`命令加载 _RJDBC_ 包:
然后加载TDengine的JDBC驱动:
```R
drv<-JDBC("com.taosdata.jdbc.TSDBDriver","JDBCDriver-1.0.0-dist.jar", identifier.quote="\"")
```
如果执行成功,不会出现任何错误信息。之后通过以下命令尝试连接数据库:
```R
conn<-dbConnect(drv,"jdbc:TSDB://192.168.0.1:0/?user=root&password=taosdata","root","taosdata")
```
注意将上述命令中的IP地址替换成正确的IP地址。如果没有任务错误的信息,则连接数据库成功,否则需要根据错误提示调整连接的命令。TDengine支持以下的 _RJDBC_ 包中函数:
- dbWriteTable(conn, "test", iris, overwrite=FALSE, append=TRUE):将数据框iris写入表test中,overwrite必须设置为false,append必须设为TRUE,且数据框iris要与表test的结构一致。
- dbGetQuery(conn, "select count(*) from test"):查询语句
- dbSendUpdate(conn, "use db"):执行任何非查询sql语句。例如dbSendUpdate(conn, "use db"), 写入数据dbSendUpdate(conn, "insert into t1 values(now, 99)")等。
- dbReadTable(conn, "test"):读取表test中数据
- dbDisconnect(conn):关闭连接
- dbRemoveTable(conn, "test"):删除表test
TDengine客户端暂不支持如下函数:
- dbExistsTable(conn, "test"):是否存在表test
- dbListTables(conn):显示连接中的所有表
# Connect with other tools
## Telegraf
TDengine is easy to integrate with [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/), an open-source server agent for collecting and sending metrics and events, without more development.
### Install Telegraf
At present, TDengine supports Telegraf newer than version 1.7.4. Users can go to the [download link] and choose the proper package to install on your system.
### Configure Telegraf
Telegraf is configured by changing items in the configuration file */etc/telegraf/telegraf.conf*.
In **output plugins** section,add _[[outputs.http]]_ iterm:
- _url_: http://ip:6020/telegraf/udb, in which _ip_ is the IP address of any node in TDengine cluster. Port 6020 is the RESTful APT port used by TDengine. _udb_ is the name of the database to save data, which needs to create beforehand.
- _method_: "POST"
- _username_: username to login TDengine
- _password_: password to login TDengine
- _data_format_: "json"
- _json_timestamp_units_: "1ms"
In **agent** part:
- hostname: used to distinguish different machines. Need to be unique.
- metric_batch_size: 30,the maximum number of records allowed to write in Telegraf. The larger the value is, the less frequent requests are sent. For TDengine, the value should be less than 50.
Please refer to the [Telegraf docs](https://docs.influxdata.com/telegraf/v1.11/) for more information.
## Grafana
[Grafana] is an open-source system for time-series data display. It is easy to integrate TDengine and Grafana to build a monitor system. Data saved in TDengine can be fetched and shown on the Grafana dashboard.
### Install Grafana
For now, TDengine only supports Grafana newer than version 5.2.4. Users can go to the [Grafana download page] for the proper package to download.
### Configure Grafana
TDengine Grafana plugin is in the _/usr/local/taos/connector/grafana_ directory.
Taking Centos 7.2 as an example, just copy TDengine directory to _/var/lib/grafana/plugins_ directory and restart Grafana.
### Use Grafana
Users can log in the Grafana server (username/password:admin/admin) through localhost:3000 to configure TDengine as the data source. As is shown in the picture below, TDengine as a data source option is shown in the box:
![img](../assets/clip_image001.png)
When choosing TDengine as the data source, the Host in HTTP configuration should be configured as the IP address of any node of a TDengine cluster. The port should be set as 6020. For example, when TDengine and Grafana are on the same machine, it should be configured as _http://localhost:6020.
Besides, users also should set the username and password used to log into TDengine. Then click _Save&Test_ button to save.
![img](../assets/clip_image001-2474914.png)
Then, TDengine as a data source should show in the Grafana data source list.
![img](../assets/clip_image001-2474939.png)
Then, users can create Dashboards in Grafana using TDengine as the data source:
![img](../assets/clip_image001-2474961.png)
Click _Add Query_ button to add a query and input the SQL command you want to run in the _INPUT SQL_ text box. The SQL command should expect a two-row, multi-column result, such as _SELECT count(*) FROM sys.cpu WHERE ts>=from and ts<​to interval(interval)_, in which, _from_, _to_ and _inteval_ are TDengine inner variables representing query time range and time interval.
_ALIAS BY_ field is to set the query alias. Click _GENERATE SQL_ to send the command to TDengine:
![img](../assets/clip_image001-2474987.png)
Please refer to the [Grafana official document] for more information about Grafana.
## Matlab
Matlab can connect to and retrieve data from TDengine by TDengine JDBC Driver.
### MatLab and TDengine JDBC adaptation
Several steps are required to adapt Matlab to TDengine. Taking adapting Matlab2017a on Windows10 as an example:
1. Copy the file _JDBCDriver-1.0.0-dist.jar_ in TDengine package to the directory _${matlab_root}\MATLAB\R2017a\java\jar\toolbox_
2. Copy the file _taos.lib_ in TDengine package to _${matlab_ root _dir}\MATLAB\R2017a\lib\win64_
3. Add the .jar package just copied to the Matlab classpath. Append the line below as the end of the file of _${matlab_ root _dir}\MATLAB\R2017a\toolbox\local\classpath.txt_
`$matlabroot/java/jar/toolbox/JDBCDriver-1.0.0-dist.jar`
4. Create a file called _javalibrarypath.txt_ in directory _${user_home}\AppData\Roaming\MathWorks\MATLAB\R2017a\_, and add the _taos.dll_ path in the file. For example, if the file _taos.dll_ is in the directory of _C:\Windows\System32_,then add the following line in file *javalibrarypath.txt*:
`C:\Windows\System32`
### TDengine operations in Matlab
After correct configuration, open Matlab:
- build a connection:
`conn = database(‘db’, ‘root’, ‘taosdata’, ‘com.taosdata.jdbc.TSDBDriver’, ‘jdbc:TSDB://127.0.0.1:0/’)`
- Query:
`sql0 = [‘select * from tb’]`
`data = select(conn, sql0);`
- Insert a record:
`sql1 = [‘insert into tb values (now, 1)’]`
`exec(conn, sql1)`
Please refer to the file _examples\Matlab\TDengineDemo.m_ for more information.
## R
Users can use R language to access the TDengine server with the JDBC interface. At first, install JDBC package in R:
```R
install.packages('rJDBC', repos='http://cran.us.r-project.org')
```
Then use _library_ function to load the package:
```R
library('RJDBC')
```
Then load the TDengine JDBC driver:
```R
drv<-JDBC("com.taosdata.jdbc.TSDBDriver","JDBCDriver-1.0.0-dist.jar", identifier.quote="\"")
```
If succeed, no error message will display. Then use the following command to try a database connection:
```R
conn<-dbConnect(drv,"jdbc:TSDB://192.168.0.1:0/?user=root&password=taosdata","root","taosdata")
```
Please replace the IP address in the command above to the correct one. If no error message is shown, then the connection is established successfully. TDengine supports below functions in _RJDBC_ package:
- _dbWriteTable(conn, "test", iris, overwrite=FALSE, append=TRUE)_: write the data in a data frame _iris_ to the table _test_ in the TDengine server. Parameter _overwrite_ must be _false_. _append_ must be _TRUE_ and the schema of the data frame _iris_ should be the same as the table _test_.
- _dbGetQuery(conn, "select count(*) from test")_: run a query command
- _dbSendUpdate(conn, "use db")_: run any non-query command.
- _dbReadTable(conn, "test"_): read all the data in table _test_
- _dbDisconnect(conn)_: close a connection
- _dbRemoveTable(conn, "test")_: remove table _test_
Below functions are **not supported** currently:
- _dbExistsTable(conn, "test")_: if talbe _test_ exists
- _dbListTables(conn)_: list all tables in the connection
[Telegraf]: www.taosdata.com
[download link]: https://portal.influxdata.com/downloads
[Telegraf document]: www.taosdata.com
[Grafana]: https://grafana.com
[Grafana download page]: https://grafana.com/grafana/download
[Grafana official document]: https://grafana.com/docs/
此差异已折叠。
# TaosData Contributor License Agreement
This TaosData Contributor License Agreement (CLA) applies to any contribution you make to any TaosData projects. If you are representing your employing organization to sign this agreement, please warrant that you have the authority to grant the agreement.
## Terms
**"TaosData"**, **"we"**, **"our"** and **"us"** means TaosData, inc.
**"You"** and **"your"** means you or the organization you are on behalf of to sign this agreement.
**"Contribution"** means any original work you, or the organization you represent submit to TaosData for any project in any manner.
## Copyright License
All rights of your Contribution submitted to TaosData in any manner are granted to TaosData and recipients of software distributed by TaosData. You waive any rights that my affect our ownership of the copyright and grant to us a perpetual, worldwide, transferable, non-exclusive, no-charge, royalty-free, irrevocable, and sublicensable license to use, reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Contributions and any derivative work created based on a Contribution.
## Patent License
With respect to any patents you own or that you can license without payment to any third party, you grant to us and to any recipient of software distributed by us, a perpetual, worldwide, transferable, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, have make, use, sell, offer to sell, import, and otherwise transfer the Contribution in whole or in part, alone or included in any product under any patent you own, or license from a third party, that is necessarily infringed by the Contribution or by combination of the Contribution with any Work.
## Your Representations and Warranties
You represent and warrant that:
- the Contribution you submit is an original work that you can legally grant the rights set out in this agreement.
- the Contribution you submit and licenses you granted does not and will not, infringe the rights of any third party.
- you are not aware of any pending or threatened claims, suits, actions, or charges pertaining to the contributions. You also warrant to notify TaosData immediately if you become aware of any such actual or potential claims, suits, actions, allegations or charges.
## Support
You are not obligated to support your Contribution except you volunteer to provide support. If you want, you can provide for a fee.
**I agree and accept on behalf of myself and behalf of my organization:**
\ No newline at end of file
# 数据模型和设计
## 数据模型
### 物联网典型场景
在典型的物联网、车联网、运维监测场景中,往往有多种不同类型的数据采集设备,采集一个到多个不同的物理量。而同一种采集设备类型,往往又有多个具体的采集设备分布在不同的地点。大数据处理系统就是要将各种采集的数据汇总,然后进行计算和分析。对于同一类设备,其采集的数据类似如下的表格:
| Device ID | Time Stamp | Value 1 | Value 2 | Value 3 | Tag 1 | Tag 2 |
| :-------: | :-----------: | :-----: | :-----: | :-----: | :---: | :---: |
| D1001 | 1538548685000 | 10.3 | 219 | 0.31 | Red | Tesla |
| D1002 | 1538548684000 | 10.2 | 220 | 0.23 | Blue | BMW |
| D1003 | 1538548686500 | 11.5 | 221 | 0.35 | Black | Honda |
| D1004 | 1538548685500 | 13.4 | 223 | 0.29 | Red | Volvo |
| D1001 | 1538548695000 | 12.6 | 218 | 0.33 | Red | Tesla |
| D1004 | 1538548696600 | 11.8 | 221 | 0.28 | Black | Honda |
每一条记录都有设备ID,时间戳,采集的物理量,还有与每个设备相关的静态标签。每个设备是受外界的触发,或按照设定的周期采集数据。采集的数据点是时序的,是一个数据流。
### 数据特征
除时序特征外,仔细研究发现,物联网、车联网、运维监测类数据还具有很多其他明显的特征。
1. 数据是结构化的;
2. 数据极少有更新或删除操作;
3. 无需传统数据库的事务处理;
4. 相对互联网应用,写多读少;
5. 流量平稳,根据设备数量和采集频次,可以预测出来;
6. 用户关注的是一段时间的趋势,而不是某一特点时间点的值;
7. 数据是有保留期限的;
8. 数据的查询分析一定是基于时间段和地理区域的;
9. 除存储查询外,还往往需要各种统计和实时计算操作;
10. 数据量巨大,一天采集的数据就可以超过100亿条。
充分利用上述特征,TDengine采取了一特殊的优化的存储和计算设计来处理时序数据,能将系统处理能力显著提高。
### 关系型数据库模型
因为采集的数据一般是结构化数据,而且为降低学习门槛,TDengine采用传统的关系型数据库模型管理数据。因此用户需要先创建库,然后创建表,之后才能插入或查询数据。
### 一个设备一张表
为充分利用其数据的时序性和其他数据特点,TDengine要求**对每个数据采集点单独建表**(比如有一千万个智能电表,就需创建一千万张表,上述表格中的D1001, D1002, D1003, D1004都需单独建表),用来存储这个采集点所采集的时序数据。这种设计能保证一个采集点的数据在存储介质上是一块一块连续的,大幅减少随机读取操作,成数量级的提升读取和查询速度。而且由于不同数据采集设备产生数据的过程完全独立,每个设备只产生属于自己的数据,一张表也就只有一个写入者。这样每个表就可以采用无锁方式来写,写入速度就能大幅提升。同时,对于一个数据采集点而言,其产生的数据是时序的,因此写的操作可用追加的方式实现,进一步大幅提高数据写入速度。
### 数据建模最佳实践
**表(Table)**:TDengine 建议用数据采集点的名字(如上表中的D1001)来做表名。每个数据采集点可能同时采集多个物理量(如上表中的value1, value2, value3),每个物理量对应一张表中的一列,数据类型可以是整型、浮点型、字符串等。除此之外,表的第一列必须是时间戳,即数据类型为 timestamp。有的设备有多组采集量,每一组的采集频次是不一样的,这是需要对同一个设备建多张表。对采集的数据,TDengine将自动按照时间戳建立索引,但对采集的物理量不建任何索引。数据是用列式存储方式保存。
**超级表(Super Table)**:对于同一类型的采集点,为保证Schema的一致性,而且为便于聚合统计操作,可以先定义超级表STable(详见第10章),然后再定义表。每个采集点往往还有静态标签信息(如上表中的Tag 1, Tag 2),比如设备型号、颜色等,这些静态信息不会保存在存储采集数据的数据节点中,而是通过超级表保存在元数据节点中。这些静态标签信息将作为过滤条件,用于采集点之间的数据聚合统计操作。
**库(DataBase)**:不同的数据采集点往往具有不同的数据特征,包括数据采集频率高低,数据保留时间长短,备份数目,单个字段大小等等。为让各种场景下TDengine都能最大效率的工作,TDengine建议将不同数据特征的表创建在不同的库里。创建一个库时,除SQL标准的选项外,应用还可以指定保留时长、数据备份的份数、cache大小、文件块大小、是否压缩等多种参数(详见第19章)。
**Schemaless vs Schema**: 与NoSQL的各种引擎相比,由于应用需要定义schema,插入数据的灵活性降低。但对于物联网、金融这些典型的时序数据场景,schema会很少变更,因此这个灵活性不够的设计就不成问题。相反,TDengine采用结构化数据来进行处理的方式将让查询、分析的性能成数量级的提升。
TDengine对库的数量、超级表的数量以及表的数量没有做任何限制,而且其多少不会对性能产生影响,应用按照自己的场景创建即可。
## 主要模块
如图所示,TDengine服务主要包含两大模块:**管理节点模块(MGMT)****数据节点模块(DNODE)**。整个TDengine还包含**客户端模块**
<center> <img src="../assets/structure.png"> </center>
<center> 图 1 TDengine架构示意图 </center>
### 管理节点模块
管理节点模块主要负责元数据的存储和查询等工作,其中包括用户信息的管理、数据库和表信息的创建、删除以及查询等。应用连接TDengine时会首先连接到管理节点。在创建/删除数据库和表时,请求也会首先发送请求到管理节点模块。由管理节点模块首先创建/删除元数据信息,然后发送请求到数据节点模块进行分配/删除所需要的资源。在数据写入和查询时,应用同样会首先访问管理节点模块,获取元数据信息。然后根据元数据管理信息访问数据节点模块。
### 数据节点模块
写入数据的存储和查询工作是由数据节点模块负责。 为了更高效地利用资源,以及方便将来进行水平扩展,TDengine内部对数据节点进行了虚拟化,引入了虚拟节点(virtual node, 简称vnode)的概念,作为存储、资源分配以及数据备份的单元。如图2所示,在一个dnode上,通过虚拟化,可以将该dnode视为多个虚拟节点的集合。
创建一个库时,系统会自动分配vnode。每个vnode存储一定数量的表中的数据,但一个表只会存在于一个vnode里,不会跨vnode。一个vnode只会属于一个库,但一个库会有一到多个vnode。不同的vnode之间资源互不共享。每个虚拟节点都有自己的缓存,在硬盘上也有自己的存储目录。而同一vnode内部无论是缓存还是硬盘的存储都是共享的。通过虚拟化,TDengine可以将dnode上有限的物理资源合理地分配给不同的vnode,大大提高资源的利用率和并发度。一台物理机器上的虚拟节点个数可以根据其硬件资源进行配置。
<center> <img src="../assets/vnode.png"> </center>
<center> 图 2 TDengine虚拟化 </center>
### 客户端模块
TDengine客户端模块主要负责将应用传来的请求(SQL语句)进行解析,转化为内部结构体再发送到服务端。TDengine的各种接口都是基于TDengine的客户端模块进行开发的。客户端模块与管理模块使用TCP/UDP通讯,端口号由系统参数mgmtShellPort配置, 缺省值为6030。客户端与数据节点模块也是使用TCP/UDP通讯,端口号由系统参数vnodeShellPort配置, 缺省值为6035。两个端口号均可通过<a href="../administrator/#Configuration-on-Server">系统配置文件taos.cfg</a>进行个性化设置。
## 写入流程
TDengine的完整写入流程如图3所示。为了保证写入数据的安全性和完整性,TDengine在写入数据时采用[预写日志算法]。客户端发来的数据在经过验证以后,首先会写入预写日志中,以保证TDengine能够在断电等因素导致的服务重启时从预写日志中恢复数据,避免数据的丢失。写入预写日志后,数据会被写到对应的vnode的缓存中。随后,服务端会发送确认信息给客户端表示写入成功。TDengine中存在两种机制可以促使缓存中的数据写入到硬盘上进行持久化存储:
<center> <img src="../assets/write_process.png"> </center>
<center> 图 3 TDengine写入流程 </center>
1. **时间驱动的落盘**:TDengine服务会定时将vnode缓存中的数据写入到硬盘上,默认为一个小时落一次盘。落盘间隔可在配置文件taos.cfg中通过参数commitTime配置。
2. **数据驱动的落盘**:当vnode中缓存的数据达到一定规模时,为了不阻塞后续数据的写入,TDengine也会拉起落盘线程将缓存中的数据清空。数据驱动的落盘会刷新定时落盘的时间。
TDengine在数据落盘时会打开新的预写日志文件,在落盘后则会删除老的预写日志文件,避免日志文件无限制的增长。TDengine对缓存按照先进先出的原则进行管理,以保证每个表的最新数据都在缓存中。
## 数据存储
TDengine将所有数据存储在/var/lib/taos/目录下,您可以通过系统配置参数dataDir进行个性化配置。
TDengine中的元数据信息包括TDengine中的数据库、表、用户等信息。每个超级表、以及每个表的标签数据也存放在这里。为提高访问速度,元数据全部有缓存。
TDengine中写入的数据在硬盘上是按时间维度进行分片的。同一个vnode中的表在同一时间范围内的数据都存放在同一文件组中。这一数据分片方式可以大大简化数据在时间维度的查询,提高查询速度。在默认配置下,硬盘上的每个数据文件存放10天数据。用户可根据需要修改系统配置参数daysPerFile进行个性化配置。
表中的数据都有保存时间,一旦超过保存时间(缺省是3650天),数据将被系统自动删除。您可以通过系统配置参数daysToKeep进行个性化设置。
数据在文件中是按块存储的。每个数据块只包含一张表的数据,且数据是按照时间主键递增排列的。数据在数据块中按列存储,这样使得同列的数据存放在一起,对于不同的数据类型还采用不同的压缩方法,大大提高压缩的比例,节省存储空间。
数据文件总共有三类文件,一类是data文件,它存放了真实的数据块,该文件只进行追加操作;一类文件是head文件, 它存放了其对应的data文件中数据块的索引信息;第三类是last文件,专门存储最后写入的数据,每次落盘操作时,这部分数据会与内存里的数据合并,并决定是否写入data文件还是last文件。
\ No newline at end of file
# Data Model and Architecture
## Data Model
### A Typical IoT Scenario
In a typical IoT scenario, there are many types of devices. Each device is collecting one or multiple metrics. For a specific type of device, the collected data looks like the table below:
| Device ID | Time Stamp | Value 1 | Value 2 | Value 3 | Tag 1 | Tag 2 |
| :-------: | :-----------: | :-----: | :-----: | :-----: | :---: | :---: |
| D1001 | 1538548685000 | 10.3 | 219 | 0.31 | Red | Tesla |
| D1002 | 1538548684000 | 10.2 | 220 | 0.23 | Blue | BMW |
| D1003 | 1538548686500 | 11.5 | 221 | 0.35 | Black | Honda |
| D1004 | 1538548685500 | 13.4 | 223 | 0.29 | Red | Volvo |
| D1001 | 1538548695000 | 12.6 | 218 | 0.33 | Red | Tesla |
| D1004 | 1538548696600 | 11.8 | 221 | 0.28 | Black | Honda |
Each data record has device ID, timestamp, the collected metrics, and static tags associated with the device. Each device generates a data record in a pre-defined timer or triggered by an event. It is a sequence of data points, like a stream.
### Data Characteristics
Being a series of data points over time, data points generated by devices, sensors, servers, or applications have strong common characteristics.
1. metric is always structured data;
2. there are rarely delete/update operations on collected data;
3. there is only one single data source for one device or sensor;
4. ratio of read/write is much lower than typical Internet application;
5. the user pays attention to the trend of data, not the specific value at a specific time;
6. there is always a data retention policy;
7. the data query is always executed in a given time range and a subset of devices;
8. real-time aggregation or analytics is mandatory;
9. traffic is predictable based on the number of devices and sampling frequency;
10. data volume is huge, a system may generate 10 billion data points in a day.
By utilizing the above characteristics, TDengine designs the storage and computing engine in a special and optimized way for time-series data. The system efficiency is improved significantly.
### Relational Database Model
Since time-series data is more likely to be structured data, TDengine adopts the traditional relational database model to process them. You need to create a database, create tables with schema definition, then insert data points and execute queries to explore the data. Standard SQL is used, there is no learning curve.
### One Table for One Device
Due to different network latency, the data points from different devices may arrive at the server out of order. But for the same device, data points will arrive at the server in order if system is designed well. To utilize this special feature, TDengine requires the user to create a table for each device (time-stream). For example, if there are over 10,000 smart meters, 10,000 tables shall be created. For the table above, 4 tables shall be created for device D1001, D1002, D1003 and D1004, to store the data collected.
This strong requirement can guarantee the data points from a device can be saved in a continuous memory/hard disk space block by block. If queries are applied only on one device in a time range, this design will reduce the read latency significantly since a whole block is owned by one single device. Also, write latency can be significantly reduced too, since the data points generated by the same device will arrive in order, the new data point will be simply appended to a block. Cache block size and the rows of records in a file block can be configured to fit the scenarios.
### Best Practices
**Table**: TDengine suggests to use device ID as the table name (like D1001 in the above diagram). Each device may collect one or more metrics (like value1, valu2, valu3 in the diagram). Each metric has a column in the table, the metric name can be used as the column name. The data type for a column can be int, float, double, tinyint, bigint, bool or binary. Sometimes, a device may have multiple metric group, each group have different sampling period, you shall create a table for each group for each device. The first column in the table must be time stamp. TDengine uses time stamp as the index, and won’t build the index on any metrics stored.
**Tags:** to support aggregation over multiple tables efficiently, [STable(Super Table)](../super-table) concept is introduced by TDengine. A STable is used to represent the same type of device. The schema is used to define the collected metrics(like value1, value2, value3 in the diagram), and tags are used to define the static attributes for each table or device(like tag1, tag2 in the diagram). A table is created via STable with a specific tag value. All or a subset of tables in a STable can be aggregated by filtering tag values.
**Database:** different types of devices may generate data points in different patterns and shall be processed differently. For example, sampling frequency, data retention policy, replication number, cache size, record size, the compression algorithm may be different. To make the system more efficient, TDengine suggests creating a different database with unique configurations for different scenarios
**Schemaless vs Schema:** compared with NoSQL database, since a table with schema definition shall be created before the data points can be inserted, flexibilities are not that good, especially when the schema is changed. But in most IoT scenarios, the schema is well defined and is rarely changed, the loss of flexibilities won’t be a big pain to developers or the administrator. TDengine allows the application to change the schema in a second even there is a huge amount of historical data when schema has to be changed.
TDengine does not impose a limitation on the number of tables, [STables](../super-table), or databases. You can create any number of STable or databases to fit the scenarios.
## Architecture
There are two main modules in TDengine server as shown in Picture 1: **Management Module (MGMT)** and **Data Module(DNODE)**. The whole TDengine architecture also includes a **TDengine Client Module**.
<center> <img src="../assets/structure.png"> </center>
<center> Picture 1 TDengine Architecture </center>
### MGMT Module
The MGMT module deals with the storage and querying on metadata, which includes information about users, databases, and tables. Applications will connect to the MGMT module at first when connecting the TDengine server. When creating/dropping databases/tables, The request is sent to the MGMT module at first to create/delete metadata. Then the MGMT module will send requests to the data module to allocate/free resources required. In the case of writing or querying, applications still need to visit MGMT module to get meta data, according to which, then access the DNODE module.
### DNODE Module
The DNODE module is responsible for storing and querying data. For the sake of future scaling and high-efficient resource usage, TDengine applies virtualization on resources it uses. TDengine introduces the concept of virtual node (vnode), which is the unit of storage, resource allocation and data replication (enterprise edition). As is shown in Picture 2, TDengine treats each data node as an aggregation of vnodes.
When a DB is created, the system will allocate a vnode. Each vnode contains multiple tables, but a table belongs to only one vnode. Each DB has one or mode vnodes, but one vnode belongs to only one DB. Each vnode contains all the data in a set of tables. Vnodes have their own cache, directory to store data. Resources between different vnodes are exclusive with each other, no matter cache or file directory. However, resources in the same vnode are shared between all the tables in it. By virtualization, TDengine can distribute resources reasonably to each vnode and improve resource usage and concurrency. The number of vnodes on a dnode is configurable according to its hardware resources.
<center> <img src="../assets/vnode.png"> </center>
<center> Picture 2 TDengine Virtualization </center>
### Client Module
TDengine client module accepts requests (mainly in SQL form) from applications and converts the requests to internal representations and sends to the server side. TDengine supports multiple interfaces, which are all built on top of TDengine client module.
For the communication between client and MGMT module, TCP/UDP is used, the port is set by the parameter mgmtShellPort in system configuration file taos.cfg, default is 6030. For the communication between client and DNODE module, TCP/UDP is used, the port is set by the parameter vnodeShellPort in the system configuration file, default is 6035.
## Writing Process
Picture 3 shows the full writing process of TDengine. TDengine uses [Writing Ahead Log] (WAL) strategy to assure data security and integrity. Data received from the client is written to the commit log at first. When TDengine recovers from crashes caused by power lose or other situations, the commit log is used to recover data. After writting to commit log, data will be wrtten to the corresponding vnode cache, then an acknowledgment is sent to the application. There are two mechanisms that can flush data in cache to disk for persistent storage:
1. **Flush driven by timer**: There is a backend timer which flushes data in cache periodically to disks. The period is configurable via parameter commitTime in system configuration file taos.cfg.
2. **Flush driven by data**: Data in the cache is also flushed to disks when the left buffer size is below a threshold. Flush driven by data can reset the timer of flush driven by the timer.
<center> <img src="../assets/write_process.png"> </center>
<center> Picture 3 TDengine Writting Process </center>
New commit log file will be opened when the committing process begins. When the committing process finishes, the old commit file will be removed.
## Data Storage
TDengine data are saved in _/var/lib/taos_ directory by default. It can be changed to other directories by setting the parameter dataDir in system configuration file taos.cfg.
TDengine's metadata includes the database, table, user, super table and tag information. To reduce the latency, metadata are all buffered in the cache.
Data records saved in tables are sharded according to the time range. Data of tables in the same vnode in a certain time range are saved in the same file group. This sharding strategy can effectively improve data searching speed. By default, one group of files contain data in 10 days, which can be configured by *daysPerFile* in the configuration file or by *DAYS* keyword in *CREATE DATABASE* clause.
Data records are removed automatically once their lifetime is passed. The lifetime is configurable via parameter daysToKeep in the system configuration file. The default value is 3650 days.
Data in files are blockwise. A data block only contains one table's data. Records in the same data block are sorted according to the primary timestamp. To improve the compression ratio, records are stored column by column, and the different compression algorithm is applied based on each column's data type.
\ No newline at end of file
# TDengine System Architecture
## Storage Design
TDengine data mainly include **metadata** and **data** that we will introduce in the following sections.
### Metadata Storage
Metadata include the information of databases, tables, etc. Metadata files are saved in _/var/lib/taos/mgmt/_ directory by default. The directory tree is as below:
```
/var/lib/taos/
+--mgmt/
+--db.db
+--meters.db
+--user.db
+--vgroups.db
```
A metadata structure (database, table, etc.) is saved as a record in a metadata file. All metadata files are appended only, and even a drop operation adds a deletion record at the end of the file.
### Data storage
Data in TDengine are sharded according to the time range. Data of tables in the same vnode in a certain time range are saved in the same filegroup, such as files v0f1804*. This sharding strategy can effectively improve data searching speed. By default, a group of files contains data in 10 days, which can be configured by *daysPerFile* in the configuration file or by *DAYS* keyword in *CREATE DATABASE* clause. Data in files are blockwised. A data block only contains one table's data. Records in the same data block are sorted according to the primary timestamp, which helps to improve the compression rate and save storage. The compression algorithms used in TDengine include simple8B, delta-of-delta, RLE, LZ4, etc.
By default, TDengine data are saved in */var/lib/taos/data/* directory. _/var/lib/taos/tsdb/_ directory contains vnode informations and data file linkes.
```
/var/lib/taos/
+--tsdb/
| +--vnode0
| +--meterObj.v0
| +--db/
| +--v0f1804.head->/var/lib/taos/data/vnode0/v0f1804.head1
| +--v0f1804.data->/var/lib/taos/data/vnode0/v0f1804.data
| +--v0f1804.last->/var/lib/taos/data/vnode0/v0f1804.last1
| +--v0f1805.head->/var/lib/taos/data/vnode0/v0f1805.head1
| +--v0f1805.data->/var/lib/taos/data/vnode0/v0f1805.data
| +--v0f1805.last->/var/lib/taos/data/vnode0/v0f1805.last1
| :
+--data/
+--vnode0/
+--v0f1804.head1
+--v0f1804.data
+--v0f1804.last1
+--v0f1805.head1
+--v0f1805.data
+--v0f1805.last1
:
```
#### meterObj file
There are only one meterObj file in a vnode. Informations bout the vnode, such as created time, configuration information, vnode statistic informations are saved in this file. It has the structure like below:
```
<start_of_file>
[file_header]
[table_record1_offset&length]
[table_record2_offset&length]
...
[table_recordN_offset&length]
[table_record1]
[table_record2]
...
[table_recordN]
<end_of_file>
```
The file header takes 512 bytes, which mainly contains informations about the vnode. Each table record is the representation of a table on disk.
#### head file
The _head_ files contain the index of data blocks in the _data_ file. The inner organization is as below:
```
<start_of_file>
[file_header]
[table1_offset]
[table2_offset]
...
[tableN_offset]
[table1_index_block]
[table2_index_block]
...
[tableN_index_block]
<end_of_file>
```
The table offset array in the _head_ file saves the information about the offsets of each table index block. Indices on data blocks in the same table are saved continuously. This also makes it efficient to load data indices on the same table. The data index block has a structure like:
```
[index_block_info]
[block1_index]
[block2_index]
...
[blockN_index]
```
The index block info part contains the information about the index block such as the number of index blocks, etc. Each block index corresponds to a real data block in the _data_ file or _last_ file. Information about the location of the real data block, the primary timestamp range of the data block, etc. are all saved in the block index part. The block indices are sorted in ascending order according to the primary timestamp. So we can apply algorithms such as the binary search on the data to efficiently search blocks according to time.
#### data file
The _data_ files store the real data block. They are append-only. The organization is as:
```
<start_of_file>
[file_header]
[block1]
[block2]
...
[blockN]
<end_of_file>
```
A data block in _data_ files only belongs to a table in the vnode and the records in a data block are sorted in ascending order according to the primary timestamp key. Data blocks are column-oriented. Data in the same column are stored contiguously, which improves reading speed and compression rate because of their similarity. A data block has the following organization:
```
[column1_info]
[column2_info]
...
[columnN_info]
[column1_data]
[column2_data]
...
[columnN_data]
```
The column info part includes information about column types, column compression algorithm, column data offset and length in the _data_ file, etc. Besides, pre-calculated results of the column data in the block are also in the column info part, which helps to improve reading speed by avoiding loading data block necessarily.
#### last file
To avoid storage fragment and to import query speed and compression rate, TDengine introduces an extra file, the _last_ file. When the number of records in a data block is lower than a threshold, TDengine will flush the block to the _last_ file for temporary storage. When new data comes, the data in the _last_ file will be merged with the new data and form a larger data block and written to the _data_ file. The organization of the _last_ file is similar to the _data_ file.
### Summary
The innovation in architecture and storage design of TDengine improves resource usage. On the one hand, the virtualization makes it easy to distribute resources between different vnodes and for future scaling. On the other hand, sorted and column-oriented storage makes TDengine have a great advantage in writing, querying and compression.
## Query Design
#### Introduction
TDengine provides a variety of query functions for both tables and super tables. In addition to regular aggregate queries, it also provides time window based query and statistical aggregation for time series data. TDengine's query processing requires the client app, management node, and data node to work together. The functions and modules involved in query processing included in each component are as follows:
Client (Client App). The client development kit, embed in a client application, consists of TAOS SQL parser and query executor, the second-stage aggregator (Result Merger), continuous query manager and other major functional modules. The SQL parser is responsible for parsing and verifying the SQL statement and converting it into an abstract syntax tree. The query executor is responsible for transforming the abstract syntax tree into the query execution logic and creates the metadata query according to the query condition of the SQL statement. Since TAOS SQL does not currently include complex nested queries and pipeline query processing mechanism, there is no longer need for query plan optimization and physical query plan conversions. The second-stage aggregator is responsible for performing the aggregation of the independent results returned by query involved data nodes at the client side to generate final results. The continuous query manager is dedicated to managing the continuous queries created by users, including issuing fixed-interval query requests and writing the results back to TDengine or returning to the client application as needed. Also, the client is also responsible for retrying after the query fails, canceling the query request, and maintaining the connection heartbeat and reporting the query status to the management node.
Management Node. The management node keeps the metadata of all the data of the entire cluster system, provides the metadata of the data required for the query from the client node, and divides the query request according to the load condition of the cluster. The super table contains information about all the tables created according to the super table, so the query processor (Query Executor) of the management node is responsible for the query processing of the tags of tables and returns the table information satisfying the tag query. Besides, the management node maintains the query status of the cluster in the Query Status Manager component, in which the metadata of all queries that are currently executing are temporarily stored in-memory buffer. When the client issues *show queries* command to management node, current running queries information is returned to the client.
Data Node. The data node, responsible for storing all data of the database, consists of query executor, query processing scheduler, query task queue, and other related components. Once the query requests from the client received, they are put into query task queue and waiting to be processed by query executor. The query executor extracts the query request from the query task queue and invokes the query optimizer to perform the basic optimization for the query execution plan. And then query executor scans the qualified data blocks in both cache and disk to obtain qualified data and return the calculated results. Besides, the data node also needs to respond to management information and commands from the management node. For example, after the *kill query* received from the management node, the query task needs to be stopped immediately.
<center> <img src="../assets/fig1.png"> </center>
<center>Fig 1. System query processing architecture diagram (only query related components)</center>
#### Query Process Design
The client, the management node, and the data node cooperate to complete the entire query processing of TDengine. Let's take a concrete SQL query as an example to illustrate the whole query processing flow. The SQL statement is to query on super table *FOO_SUPER_TABLE* to get the total number of records generated on January 12, 2019, from the table, of which TAG_LOC equals to 'beijing'. The SQL statement is as follows:
```sql
SELECT COUNT(*)
FROM FOO_SUPER_TABLE
WHERE TAG_LOC = 'beijing' AND TS >= '2019-01-12 00:00:00' AND TS < '2019-01-13 00:00:00'
```
First, the client invokes the TAOS SQL parser to parse and validate the SQL statement, then generates a syntax tree, and extracts the object of the query - the super table *FOO_SUPER_TABLE*, and then the parser sends requests with filtering information (TAG_LOC='beijing') to management node to get the corresponding metadata about *FOO_SUPER_TABLE*.
Once the management node receives the request for metadata acquisition, first finds the super table *FOO_SUPER_TABLE* basic information, and then applies the query condition (TAG_LOC='beijing') to filter all the related tables created according to it. And finally, the query executor returns the metadata information that satisfies the query request to the client.
After the client obtains the metadata information of *FOO_SUPER_TABLE*, the query executor initiates a query request with timestamp range filtering condition (TS >= '2019- 01-12 00:00:00' AND TS < '2019-01-13 00:00:00') to all nodes that hold the corresponding data according to the information about data distribution in metadata.
The data node receives the query sent from the client, converts it into an internal structure and puts it into the query task queue to be executed by query executor after optimizing the execution plan. When the query result is obtained, the query result is returned to the client. It should be noted that the data nodes perform the query process independently of each other, and rely solely on their data and content for processing.
When all data nodes involved in the query return results, the client aggregates the result sets from each data node. In this case, all results are accumulated to generate the final query result. The second stage of aggregation is not always required for all queries. For example, a column selection query does not require a second-stage aggregation at all.
#### REST Query Process
In addition to C/C++, Python, and JDBC interface, TDengine also provides a REST interface based on the HTTP protocol, which is different from using the client application programming interface. When the user uses the REST interface, all the query processing is completed on the server-side, and the user's application is not involved in query processing anymore. After the query processing is completed, the result is returned to the client through the HTTP JSON string.
<center> <img src="../assets/fig2.png"> </center>
<center>Fig. 2 REST query architecture</center>
When a client uses an HTTP-based REST query interface, the client first establishes a connection with the HTTP connector at the data node and then uses the token to ensure the reliability of the request through the REST signature mechanism. For the data node, after receiving the request, the HTTP connector invokes the embedded client program to initiate a query processing, and then the embedded client parses the SQL statement from the HTTP connector and requests the management node to get metadata as needed. After that, the embedded client sends query requests to the same data node or other nodes in the cluster and aggregates the calculation results on demand. Finally, you also need to convert the result of the query into a JSON format string and return it to the client via an HTTP response. After the HTTP connector receives the request SQL, the subsequent process processing is completely consistent with the query processing using the client application development kit.
It should be noted that during the entire processing, the client application is no longer involved in, and is only responsible for sending SQL requests through the HTTP protocol and receiving the results in JSON format. Besides, each data node is embedded with an HTTP connector and a client, so any data node in the cluster received requests from a client, the data node can initiate the query and return the result to the client through the HTTP protocol, with transfer the request to other data nodes.
#### Technology
Because TDengine stores data and tags value separately, the tag value is kept in the management node and directly associated with each table instead of records, resulting in a great reduction of the data storage. Therefore, the tag value can be managed by a fully in-memory structure. First, the filtering of the tag data can drastically reduce the data size involved in the second phase of the query. The query processing for the data is performed at the data node. TDengine takes advantage of the immutable characteristics of IoT data by calculating the maximum, minimum, and other statistics of the data in one data block on each saved data block, to effectively improve the performance of query processing. If the query process involves all the data of the entire data block, the pre-computed result is used directly, and the content of the data block is no longer needed. Since the size of disk space required to store the pre-computation result is much smaller than the size of the specific data, the pre-computation result can greatly reduce the disk IO and speed up the query processing.
TDengine employs column-oriented data storage techniques. When the data block is involved to be loaded from the disk for calculation, only the required column is read according to the query condition, and the read overhead can be minimized. The data of one column is stored in a contiguous memory block and therefore can make full use of the CPU L2 cache to greatly speed up the data scanning. Besides, TDengine utilizes the eagerly responding mechanism and returns a partial result before the complete result is acquired. For example, when the first batch of results is obtained, the data node immediately returns it directly to the client in case of a column select query.
\ No newline at end of file
# 超级表STable:多表聚合
TDengine要求每个数据采集点单独建表,这样能极大提高数据的插入/查询性能,但是导致系统中表的数量猛增,让应用对表的维护以及聚合、统计操作难度加大。为降低应用的开发难度,TDengine引入了超级表STable (Super Table)的概念。
## 什么是超级表
STable是同一类型数据采集点的抽象,是同类型采集实例的集合,包含多张数据结构一样的子表。每个STable为其子表定义了表结构和一组标签:表结构即表中记录的数据列及其数据类型;标签名和数据类型由STable定义,标签值记录着每个子表的静态信息,用以对子表进行分组过滤。子表本质上就是普通的表,由一个时间戳主键和若干个数据列组成,每行记录着具体的数据,数据查询操作与普通表完全相同;但子表与普通表的区别在于每个子表从属于一张超级表,并带有一组由STable定义的标签值。每种类型的采集设备可以定义一个STable。数据模型定义表的每列数据的类型,如温度、压力、电压、电流、GPS实时位置等,而标签信息属于Meta Data,如采集设备的序列号、型号、位置等,是静态的,是表的元数据。用户在创建表(数据采集点)时指定STable(采集类型)外,还可以指定标签的值,也可事后增加或修改。
TDengine扩展标准SQL语法用于定义STable,使用关键词tags指定标签信息。语法如下:
```mysql
CREATE TABLE <stable_name> (<field_name> TIMESTAMP, field_name1 field_type,…) TAGS(tag_name tag_type, …)
```
其中tag_name是标签名,tag_type是标签的数据类型。标签可以使用时间戳之外的其他TDengine支持的数据类型,标签的个数最多为6个,名字不能与系统关键词相同,也不能与其他列名相同。如:
```mysql
create table thermometer (ts timestamp, degree float)
tags (location binary(20), type int)
```
上述SQL创建了一个名为thermometer的STable,带有标签location和标签type。
为某个采集点创建表时,可以指定其所属的STable以及标签的值,语法如下:
```mysql
CREATE TABLE <tb_name> USING <stb_name> TAGS (tag_value1,...)
```
沿用上面温度计的例子,使用超级表thermometer建立单个温度计数据表的语句如下:
```mysql
create table t1 using thermometer tags (‘beijing’, 10)
```
上述SQL以thermometer为模板,创建了名为t1的表,这张表的Schema就是thermometer的Schema,但标签location值为‘beijing’,标签type值为10。
用户可以使用一个STable创建数量无上限的具有不同标签的表,从这个意义上理解,STable就是若干具有相同数据模型,不同标签的表的集合。与普通表一样,用户可以创建、删除、查看超级表STable,大部分适用于普通表的查询操作都可运用到STable上,包括各种聚合和投影选择函数。除此之外,可以设置标签的过滤条件,仅对STbale中部分表进行聚合查询,大大简化应用的开发。
TDengine对表的主键(时间戳)建立索引,暂时不提供针对数据模型中其他采集量(比如温度、压力值)的索引。每个数据采集点会采集若干数据记录,但每个采集点的标签仅仅是一条记录,因此数据标签在存储上没有冗余,且整体数据规模有限。TDengine将标签数据与采集的动态数据完全分离存储,而且针对STable的标签建立了高性能内存索引结构,为标签提供全方位的快速操作支持。用户可按照需求对其进行增删改查(Create,Retrieve,Update,Delete,CRUD)操作。
STable从属于库,一个STable只属于一个库,但一个库可以有一到多个STable, 一个STable可有多个子表。
## 超级表管理
- 创建超级表
```mysql
CREATE TABLE <stable_name> (<field_name> TIMESTAMP, field_name1 field_type,…) TAGS(tag_name tag_type, …)
```
与创建表的SQL语法相似。但需指定TAGS字段的名称和类型。
说明:
1. TAGS列总长度不能超过512 bytes;
2. TAGS列的数据类型不能是timestamp和nchar类型;
3. TAGS列名不能与其他列名相同;
4. TAGS列名不能为预留关键字.
- 显示已创建的超级表
```mysql
show stables;
```
查看数据库内全部STable,及其相关信息,包括STable的名称、创建时间、列数量、标签(TAG)数量、通过该STable建表的数量。
- 删除超级表
```mysql
DROP TABLE <stable_name>
```
Note: 删除STable不会级联删除通过STable创建的表;相反删除STable时要求通过该STable创建的表都已经被删除。
- 查看属于某STable并满足查询条件的表
```mysql
SELECT TBNAME,[TAG_NAME,…] FROM <stable_name> WHERE <tag_name> <[=|=<|>=|<>] values..> ([AND|OR] …)
```
查看属于某STable并满足查询条件的表。说明:TBNAME为关键词,显示通过STable建立的子表表名,查询过程中可以使用针对标签的条件。
```mysql
SELECT COUNT(TBNAME) FROM <stable_name> WHERE <tag_name> <[=|=<|>=|<>] values..> ([AND|OR] …)
```
统计属于某个STable并满足查询条件的子表的数量
## 写数据时自动建子表
在某些特殊场景中,用户在写数据时并不确定某个设备的表是否存在,此时可使用自动建表语法来实现写入数据时里用超级表定义的表结构自动创建不存在的子表,若该表已存在则不会建立新表。注意:自动建表语句只能自动建立子表而不能建立超级表,这就要求超级表已经被事先定义好。自动建表语法跟insert/import语法非常相似,唯一区别是语句中增加了超级表和标签信息。具体语法如下:
```mysql
INSERT INTO <tb_name> USING <stb_name> TAGS (<tag1_value>, ...) VALUES (field_value, ...) (field_value, ...) ...;
```
向表tb_name中插入一条或多条记录,如果tb_name这张表不存在,则会用超级表stb_name定义的表结构以及用户指定的标签值(即tag1_value…)来创建名为tb_name新表,并将用户指定的值写入表中。如果tb_name已经存在,则建表过程会被忽略,系统也不会检查tb_name的标签是否与用户指定的标签值一致,也即不会更新已存在表的标签。
```mysql
INSERT INTO <tb1_name> USING <stb1_name> TAGS (<tag1_value1>, ...) VALUES (<field1_value1>, ...) (<field1_value2>, ...) ... <tb_name2> USING <stb_name2> TAGS(<tag1_value2>, ...) VALUES (<field1_value1>, ...) ...;
```
向多张表tb1_name,tb2_name等插入一条或多条记录,并分别指定各自的超级表进行自动建表。
## STable中TAG管理
除了更新标签的值的操作是针对子表进行,其他所有的标签操作(添加标签、删除标签等)均只能作用于STable,不能对单个子表操作。对STable添加标签以后,依托于该STable建立的所有表将自动增加了一个标签,对于数值型的标签,新增加的标签的默认值是0.
- 添加新的标签
```mysql
ALTER TABLE <stable_name> ADD TAG <new_tag_name> <TYPE>
```
为STable增加一个新的标签,并指定新标签的类型。标签总数不能超过6个。
- 删除标签
```mysql
ALTER TABLE <stable_name> DROP TAG <tag_name>
```
删除超级表的一个标签,从超级表删除某个标签后,该超级表下的所有子表也会自动删除该标签。
说明:第一列标签不能删除,至少需要为STable保留一个标签。
- 修改标签名
```mysql
ALTER TABLE <stable_name> CHANGE TAG <old_tag_name> <new_tag_name>
```
修改超级表的标签名,从超级表修改某个标签名后,该超级表下的所有子表也会自动更新该标签名。
- 修改子表的标签值
```mysql
ALTER TABLE <table_name> SET TAG <tag_name>=<new_tag_value>
```
## STable多表聚合
针对所有的通过STable创建的子表进行多表聚合查询,支持按照全部的TAG值进行条件过滤,并可将结果按照TAGS中的值进行聚合,暂不支持针对binary类型的模糊匹配过滤。语法如下:
```mysql
SELECT function<field_name>,…
FROM <stable_name>
WHERE <tag_name> <[=|<=|>=|<>] values..> ([AND|OR] …)
INTERVAL (<time range>)
GROUP BY <tag_name>, <tag_name>…
ORDER BY <tag_name> <asc|desc>
SLIMIT <group_limit>
SOFFSET <group_offset>
LIMIT <record_limit>
OFFSET <record_offset>
```
**说明**
超级表聚合查询,TDengine目前支持以下聚合\选择函数:sum、count、avg、first、last、min、max、top、bottom,以及针对全部或部分列的投影操作,使用方式与单表查询的计算过程相同。暂不支持其他类型的聚合计算和四则运算。当前所有的函数及计算过程均不支持嵌套的方式进行执行。
不使用GROUP BY的查询将会对超级表下所有满足筛选条件的表按时间进行聚合,结果输出默认是按照时间戳单调递增输出,用户可以使用ORDER BY _c0 ASC|DESC选择查询结果时间戳的升降排序;使用GROUP BY <tag_name> 的聚合查询会按照tags进行分组,并对每个组内的数据分别进行聚合,输出结果为各个组的聚合结果,组间的排序可以由ORDER BY <tag_name> 语句指定,每个分组内部,时间序列是单调递增的。
使用SLIMIT/SOFFSET语句指定组间分页,即指定结果集中输出的最大组数以及对组起始的位置。使用LIMIT/OFFSET语句指定组内分页,即指定结果集中每个组内最多输出多少条记录以及记录起始的位置。
## STable使用示例
以温度传感器采集时序数据作为例,示范STable的使用。 在这个例子中,对每个温度计都会建立一张表,表名为温度计的ID,温度计读数的时刻记为ts,采集的值记为degree。通过tags给每个采集器打上不同的标签,其中记录温度计的地区和类型,以方便我们后面的查询。所有温度计的采集量都一样,因此我们用STable来定义表结构。
###定义STable表结构并使用它创建子表
创建STable语句如下:
```mysql
CREATE TABLE thermometer (ts timestamp, degree double)
TAGS(location binary(20), type int)
```
假设有北京,天津和上海三个地区的采集器共4个,温度采集器有3种类型,我们就可以对每个采集器建表如下:
```mysql
CREATE TABLE therm1 USING thermometer TAGS (’beijing’, 1);
CREATE TABLE therm2 USING thermometer TAGS (’beijing’, 2);
CREATE TABLE therm3 USING thermometer TAGS (’tianjin’, 1);
CREATE TABLE therm4 USING thermometer TAGS (’shanghai’, 3);
```
其中therm1,therm2,therm3,therm4是超级表thermometer四个具体的子表,也即普通的Table。以therm1为例,它表示采集器therm1的数据,表结构完全由thermometer定义,标签location=”beijing”, type=1表示therm1的地区是北京,类型是第1类的温度计。
###写入数据
注意,写入数据时不能直接对STable操作,而是要对每张子表进行操作。我们分别向四张表therm1,therm2, therm3, therm4写入一条数据,写入语句如下:
```mysql
INSERT INTO therm1 VALUES (’2018-01-01 00:00:00.000’, 20);
INSERT INTO therm2 VALUES (’2018-01-01 00:00:00.000’, 21);
INSERT INTO therm3 VALUES (’2018-01-01 00:00:00.000’, 24);
INSERT INTO therm4 VALUES (’2018-01-01 00:00:00.000’, 23);
```
###按标签聚合查询
查询位于北京(beijing)和天津(tianjing)两个地区的温度传感器采样值的数量count(*)、平均温度avg(degree)、最高温度max(degree)、最低温度min(degree),并将结果按所处地域(location)和传感器类型(type)进行聚合。
```mysql
SELECT COUNT(*), AVG(degree), MAX(degree), MIN(degree)
FROM thermometer
WHERE location=’beijing’ or location=’tianjing’
GROUP BY location, type
```
###按时间周期聚合查询
查询仅位于北京以外地区的温度传感器最近24小时(24h)采样值的数量count(*)、平均温度avg(degree)、最高温度max(degree)和最低温度min(degree),将采集结果按照10分钟为周期进行聚合,并将结果按所处地域(location)和传感器类型(type)再次进行聚合。
```mysql
SELECT COUNT(*), AVG(degree), MAX(degree), MIN(degree)
FROM thermometer
WHERE name<>’beijing’ and ts>=now-1d
INTERVAL(10M)
GROUP BY location, type
```
\ No newline at end of file
# STable: Super Table
"One Table for One Device" design can improve the insert/query performance significantly for a single device. But it has a side effect, the aggregation of multiple tables becomes hard. To reduce the complexity and improve the efficiency, TDengine introduced a new concept: STable (Super Table).
## What is a Super Table
STable is an abstract and a template for a type of device. A STable contains a set of devices (tables) that have the same schema or data structure. Besides the shared schema, a STable has a set of tags, like the model, serial number and so on. Tags are used to record the static attributes for the devices and are used to group a set of devices (tables) for aggregation. Tags are metadata of a table and can be added, deleted or changed.
TDengine does not save tags as a part of the data points collected. Instead, tags are saved as metadata. Each table has a set of tags. To improve query performance, tags are all cached and indexed. One table can only belong to one STable, but one STable may contain many tables.
Like a table, you can create, show, delete and describe STables. Most query operations on tables can be applied to STable too, including the aggregation and selector functions. For queries on a STable, if no tags filter, the operations are applied to all the tables created via this STable. If there is a tag filter, the operations are applied only to a subset of the tables which satisfy the tag filter conditions. It will be very convenient to use tags to put devices into different groups for aggregation.
##Create a STable
Similiar to creating a standard table, syntax is:
```mysql
CREATE TABLE <stable_name> (<field_name> TIMESTAMP, field_name1 field_type,…) TAGS(tag_name tag_type, …)
```
New keyword "tags" is introduced, where tag_name is the tag name, and tag_type is the associated data type.
Note:
1. The bytes of all tags together shall be less than 512
2. Tag's data type can not be time stamp or nchar
3. Tag name shall be different from the field name
4. Tag name shall not be the same as system keywords
5. Maximum number of tags is 6
For example:
```mysql
create table thermometer (ts timestamp, degree float)
tags (location binary(20), type int)
```
The above statement creates a STable thermometer with two tag "location" and "type"
##Create a Table via STable
To create a table for a device, you can use a STable as its template and assign the tag values. The syntax is:
```mysql
CREATE TABLE <tb_name> USING <stb_name> TAGS (tag_value1,...)
```
You can create any number of tables via a STable, and each table may have different tag values. For example, you create five tables via STable thermometer below:
```mysql
create table t1 using thermometer tags (‘beijing’, 10);
create table t2 using thermometer tags (‘beijing’, 20);
create table t3 using thermometer tags (‘shanghai’, 10);
create table t4 using thermometer tags (‘shanghai’, 20);
create table t5 using thermometer tags (‘new york’, 10);
```
## Aggregate Tables via STable
You can group a set of tables together by specifying the tags filter condition, then apply the aggregation operations. The result set can be grouped and ordered based on tag value. Syntax is:
```mysql
SELECT function<field_name>,…
FROM <stable_name>
WHERE <tag_name> <[=|<=|>=|<>] values..> ([AND|OR] …)
INTERVAL (<time range>)
GROUP BY <tag_name>, <tag_name>…
ORDER BY <tag_name> <asc|desc>
SLIMIT <group_limit>
SOFFSET <group_offset>
LIMIT <record_limit>
OFFSET <record_offset>
```
For the time being, STable supports only the following aggregation/selection functions: *sum, count, avg, first, last, min, max, top, bottom*, and the projection operations, the same syntax as a standard table. Arithmetic operations are not supported, embedded queries not either.
*INTERVAL* is used for the aggregation over a time range.
If *GROUP BY* is not used, the aggregation is applied to all the selected tables, and the result set is output in ascending order of the timestamp, but you can use "*ORDER BY _c0 ASC|DESC*" to specify the order you like.
If *GROUP BY <tag_name>* is used, the aggregation is applied to groups based on tags. Each group is aggregated independently. Result set is a group of aggregation results. The group order is decided by *ORDER BY <tag_name>*. Inside each group, the result set is in the ascending order of the time stamp.
*SLIMIT/SOFFSET* are used to limit the number of groups and starting group number.
*LIMIT/OFFSET* are used to limit the number of records in a group and the starting rows.
###Example 1:
Check the average, maximum, and minimum temperatures of Beijing and Shanghai, and group the result set by location and type. The SQL statement shall be:
```mysql
SELECT COUNT(*), AVG(degree), MAX(degree), MIN(degree)
FROM thermometer
WHERE location=’beijing’ or location=’tianjing’
GROUP BY location, type
```
### Example 2:
List the number of records, average, maximum, and minimum temperature every 10 minutes for the past 24 hours for all the thermometers located in Beijing with type 10. The SQL statement shall be:
```mysql
SELECT COUNT(*), AVG(degree), MAX(degree), MIN(degree)
FROM thermometer
WHERE name=’beijing’ and type=10 and ts>=now-1d
INTERVAL(10M)
```
## Create Table Automatically
Insert operation will fail if the table is not created yet. But for STable, TDengine can create the table automatically if the application provides the STable name, table name and tags' value when inserting data points. The syntax is:
```mysql
INSERT INTO <tb_name> USING <stb_name> TAGS (<tag1_value>, ...) VALUES (field_value, ...) (field_value, ...) ... <tb_name2> USING <stb_name2> TAGS(<tag1_value2>, ...) VALUES (<field1_value1>, ...) ...;
```
When inserting data points into table tb_name, the system will check if table tb_name is created or not. If it is already created, the data points will be inserted as usual. But if the table is not created yet, the system will create the table tb_bame using STable stb_name as the template with the tags. Multiple tables can be specified in the SQL statement.
## Management of STables
After you can create a STable, you can describe, delete, change STables. This section lists all the supported operations.
### Show STables in current DB
```mysql
show stables;
```
It lists all STables in current DB, including the name, created time, number of fileds, number of tags, and number of tables which are created via this STable.
### Describe a STable
```mysql
DESCRIBE <stable_name>
```
It lists the STable's schema and tags
### Drop a STable
```mysql
DROP TABLE <stable_name>
```
To delete a STable, all the tables created via this STable shall be deleted first, otherwise, it will fail.
### List the Associated Tables of a STable
```mysql
SELECT TBNAME,[TAG_NAME,…] FROM <stable_name> WHERE <tag_name> <[=|=<|>=|<>] values..> ([AND|OR] …)
```
It will list all the tables which satisfy the tag filter conditions. The tables are all created from this specific STable. TBNAME is a new keyword introduced, it is the table name associated with the STable.
```mysql
SELECT COUNT(TBNAME) FROM <stable_name> WHERE <tag_name> <[=|=<|>=|<>] values..> ([AND|OR] …)
```
The above SQL statement will list the number of tables in a STable, which satisfy the filter condition.
## Management of Tags
You can add, delete and change the tags for a STable, and you can change the tag value of a table. The SQL commands are listed below.
###Add a Tag
```mysql
ALTER TABLE <stable_name> ADD TAG <new_tag_name> <TYPE>
```
It adds a new tag to the STable with a data type. The maximum number of tags is 6.
###Drop a Tag
```mysql
ALTER TABLE <stable_name> DROP TAG <tag_name>
```
It drops a tag from a STable. The first tag could not be deleted, and there must be at least one tag.
###Change a Tag's Name
```mysql
ALTER TABLE <stable_name> CHANGE TAG <old_tag_name> <new_tag_name>
```
It changes the name of a tag from old to new.
###Change the Tag's Value
```mysql
ALTER TABLE <table_name> SET TAG <tag_name>=<new_tag_value>
```
It changes a table's tag value to a new one.
此差异已折叠。
此差异已折叠。
#系统管理
## 文件目录结构
安装TDengine后,默认会在操作系统中生成下列目录或文件:
| 目录/文件 | 说明 |
| ---------------------- | :------------------------------------------------|
| /etc/taos/taos.cfg | TDengine默认[配置文件] |
| /usr/local/taos/driver | TDengine动态链接库目录 |
| /var/lib/taos | TDengine默认数据文件目录,可通过[配置文件]修改位置. |
| /var/log/taos | TDengine默认日志文件目录,可通过[配置文件]修改位置 |
| /usr/local/taos/bin | TDengine可执行文件目录 |
### 可执行文件
TDengine的所有可执行文件默认存放在 _/usr/local/taos/bin_ 目录下。其中包括:
- _taosd_:TDengine服务端可执行文件
- _taos_: TDengine Shell可执行文件
- _taosdump_:数据导出工具
- *rmtaos*: 一个卸载TDengine的脚本, 请谨慎执行
您可以通过修改系统配置文件taos.cfg来配置不同的数据目录和日志目录
## 服务端配置
TDengine系统后台服务由taosd提供,可以在配置文件taos.cfg里修改配置参数,以满足不同场景的需求。配置文件的缺省位置在/etc/taos目录,可以通过taosd命令行执行参数-c指定配置文件目录。比如taosd -c /home/user来指定配置文件位于/home/user这个目录。
下面仅仅列出一些重要的配置参数,更多的参数请看配置文件里的说明。各个参数的详细介绍及作用请看前述章节。**注意:配置修改后,需要重启*taosd*服务才能生效。**
- internalIp: 对外提供服务的IP地址,默认取第一个IP地址
- mgmtShellPort:管理节点与客户端通信使用的TCP/UDP端口号(默认值是6030)。此端口号在内向后连续的5个端口都会被UDP通信占用,即UDP占用[6030-6034],同时TCP通信也会使用端口[6030]。
- vnodeShellPort:数据节点与客户端通信使用的TCP/UDP端口号(默认值是6035)。此端口号在内向后连续的5个端口都会被UDP通信占用,即UDP占用[6035-6039],同时TCP通信也会使用端口[6035]
- httpPort:数据节点对外提供RESTful服务使用TCP,端口号[6020]
- dataDir: 数据文件目录,缺省是/var/lib/taos
- maxUsers:用户的最大数量
- maxDbs:数据库的最大数量
- maxTables:数据表的最大数量
- enableMonitor: 系统监测标志位,0:关闭,1:打开
- logDir: 日志文件目录,缺省是/var/log/taos
- numOfLogLines:日志文件的最大行数
- debugFlag: 系统debug日志开关,131:仅错误和报警信息,135:所有
不同应用场景的数据往往具有不同的数据特征,比如保留天数、副本数、采集频次、记录大小、采集点的数量、压缩等都可完全不同。为获得在存储上的最高效率,TDengine提供如下存储相关的系统配置参数:
- days:一个数据文件覆盖的时间长度,单位为天
- keep:数据库中数据保留的天数
- rows: 文件块中记录条数
- comp: 文件压缩标志位,0:关闭,1:一阶段压缩,2:两阶段压缩
- ctime:数据从写入内存到写入硬盘的最长时间间隔,单位为秒
- clog:数据提交日志(WAL)的标志位,0为关闭,1为打开
- tables:每个vnode允许创建表的最大数目
- cache: 内存块的大小(字节数)
- tblocks: 每张表最大的内存块数
- ablocks: 每张表平均的内存块数
- precision:时间戳为微秒的标志位,ms表示毫秒,us表示微秒
对于一个应用场景,可能有多种数据特征的数据并存,最佳的设计是将具有相同数据特征的表放在一个库里,这样一个应用有多个库,而每个库可以配置不同的存储参数,从而保证系统有最优的性能。TDengine容许应用在创建库时指定上述存储参数,如果指定,该参数就将覆盖对应的系统配置参数。举例,有下述SQL:
```
create database demo days 10 cache 16000 ablocks 4
```
该SQL创建了一个库demo, 每个数据文件保留10天数据,内存块为16000字节,每个表平均占用4个内存块,而其他参数与系统配置完全一致。
## 客户端配置
TDengine系统的前台交互客户端应用程序为taos,它与taosd共享同一个配置文件taos.cfg。运行taos时,使用参数-c指定配置文件目录,如taos -c /home/cfg,表示使用/home/cfg/目录下的taos.cfg配置文件中的参数,缺省目录是/etc/taos。更多taos的使用方法请见[Shell命令行程序](#_TDengine_Shell命令行程序)。本节主要讲解taos客户端应用在配置文件taos.cfg文件中使用到的参数。
客户端配置参数列表及解释
- masterIP:客户端默认发起请求的服务器的IP地址
- charset:指明客户端所使用的字符集,默认值为UTF-8。TDengine存储nchar类型数据时使用的是unicode存储,因此客户端需要告知服务自己所使用的字符集,也即客户端所在系统的字符集。
- locale:设置系统语言环境。Linux上客户端与服务端共享
- defaultUser:默认登录用户,默认值root
- defaultPass:默认登录密码,默认值taosdata
TCP/UDP端口,以及日志的配置参数,与server的配置参数完全一样。
启动taos时,你也可以从命令行指定IP地址、端口号,用户名和密码,否则就从taos.cfg读取。
## 用户管理
系统管理员可以在CLI界面里添加、删除用户,也可以修改密码。CLI里SQL语法如下:
```
CREATE USER user_name PASS ‘password’
```
创建用户,并制定用户名和密码,密码需要用单引号引起来
```
DROP USER user_name
```
删除用户,限root用户使用
```
ALTER USER user_name PASS ‘password’
```
修改用户密码, 为避免被转换为小写,密码需要用单引号引用
```
SHOW USERS
```
显示所有用户
## 数据导入
TDengine提供两种方便的数据导入功能,一种按脚本文件导入,一种按数据文件导入。
**按脚本文件导入**
TDengine的shell支持source filename命令,用于批量运行文件中的SQL语句。用户可将建库、建表、写数据等SQL命令写在同一个文件中,每条命令单独一行,在shell中运行source命令,即可按顺序批量运行文件中的SQL语句。以‘#’开头的SQL语句被认为是注释,shell将自动忽略。
**按数据文件导入**
TDengine也支持在shell对已存在的表从CSV文件中进行数据导入。每个CSV文件只属于一张表且CSV文件中的数据格式需与要导入表的结构相同。其语法如下
```mysql
insert into tb1 file a.csv b.csv tb2 c.csv …
import into tb1 file a.csv b.csv tb2 c.csv …
```
## 数据导出
为方便数据导出,TDengine提供了两种导出方式,分别是按表导出和用taosdump导出。
**按表导出CSV文件**
如果用户需要导出一个表或一个STable中的数据,可在shell中运行
```
select * from <tb_name> >> a.csv
```
这样,表tb中的数据就会按照CSV格式导出到文件a.csv中。
**用taosdump导出数据**
TDengine提供了方便的数据库导出工具taosdump。用户可以根据需要选择导出所有数据库、一个数据库或者数据库中的一张表,所有数据或一时间段的数据,甚至仅仅表的定义。其用法如下:
- 导出数据库中的一张或多张表:taosdump [OPTION…] dbname tbname …
- 导出一个或多个数据库: taosdump [OPTION…] --databases dbname…
- 导出所有数据库(不含监控数据库):taosdump [OPTION…] --all-databases
用户可通过运行taosdump --help获得更详细的用法说明
## 系统连接、任务查询管理
系统管理员可以从CLI查询系统的连接、正在进行的查询、流式计算,并且可以关闭连接、停止正在进行的查询和流式计算。CLI里SQL语法如下:
```
SHOW CONNECTIONS
```
显示数据库的连接,其中一列显示ip:port, 为连接的IP地址和端口号。
```
KILL CONNECTION <connection-id>
```
强制关闭数据库连接,其中的connection-id是SHOW CONNECTIONS中显示的 ip:port字串,如“192.168.0.1:42198”,拷贝粘贴即可。
```
SHOW QUERIES
```
显示数据查询,其中一列显示ip:port:id, 为发起该query应用的IP地址,端口号,以及系统分配的ID。
```
KILL QUERY <query-id>
```
强制关闭数据查询,其中query-id是SHOW QUERIES中显示的ip:port:id字串,如“192.168.0.1:42198:11”,拷贝粘贴即可。
```
SHOW STREAMS
```
显示流式计算,其中一列显示ip:port:id, 为启动该stream的IP地址、端口和系统分配的ID。
```
KILL STREAM <stream-id>
```
强制关闭流式计算,其中的中stream-id是SHOW STREAMS中显示的ip:port:id字串,如“192.168.0.1:42198:18”,拷贝粘贴即可。
## 系统监控
TDengine启动后,会自动创建一个监测数据库SYS,并自动将服务器的CPU、内存、硬盘空间、带宽、请求数、磁盘读写速度、慢查询等信息定时写入该数据库。TDengine还将重要的系统操作(比如登录、创建、删除数据库等)日志以及各种错误报警信息记录下来存放在SYS库里。系统管理员可以从CLI直接查看这个数据库,也可以在WEB通过图形化界面查看这些监测信息。
这些监测信息的采集缺省是打开的,但可以修改配置文件里的选项enableMonitor将其关闭或打开。
\ No newline at end of file
#Administrator
## Directory and Files
After TDengine is installed, by default, the following directories will be created:
| Directory/File | Description |
| ---------------------- | :------------------------------ |
| /etc/taos/taos.cfg | TDengine configuration file |
| /usr/local/taos/driver | TDengine dynamic link library |
| /var/lib/taos | TDengine default data directory |
| /var/log/taos | TDengine default log directory |
| /usr/local/taos/bin. | TDengine executables |
### Executables
All TDengine executables are located at _/usr/local/taos/bin_ , including:
- `taosd`:TDengine server
- `taos`: TDengine Shell, the command line interface.
- `taosdump`:TDengine data export tool
- `rmtaos`: a script to uninstall TDengine
You can change the data directory and log directory setting through the system configuration file
## Configuration on Server
`taosd` is running on the server side, you can change the system configuration file taos.cfg to customize its behavior. By default, taos.cfg is located at /etc/taos, but you can specify the path to configuration file via the command line parameter -c. For example: `taosd -c /home/user` means the configuration file will be read from directory /home/user.
This section lists only the most important configuration parameters. Please check taos.cfg to find all the configurable parameters. **Note: to make your new configurations work, you have to restart taosd after you change taos.cfg**.
- mgmtShellPort: TCP and UDP port between client and TDengine mgmt (default: 6030). Note: 5 successive UDP ports (6030-6034) starting from this number will be used.
- vnodeShellPort: TCP and UDP port between client and TDengine vnode (default: 6035). Note: 5 successive UDP ports (6035-6039) starting from this number will be used.
- httpPort: TCP port for RESTful service (default: 6020)
- dataDir: data directory, default is /var/lib/taos
- maxUsers: maximum number of users allowed
- maxDbs: maximum number of databases allowed
- maxTables: maximum number of tables allowed
- enableMonitor: turn on/off system monitoring, 0: off, 1: on
- logDir: log directory, default is /var/log/taos
- numOfLogLines: maximum number of lines in the log file
- debugFlag: log level, 131: only error and warnings, 135: all
In different scenarios, data characteristics are different. For example, the retention policy, data sampling period, record size, the number of devices, and data compression may be different. To gain the best performance, you can change the following configurations related to storage:
- days: number of days to cover for a data file
- keep: number of days to keep the data
- rows: number of rows of records in a block in data file.
- comp: compression algorithm, 0: off, 1: standard; 2: maximum compression
- ctime: period (seconds) to flush data to disk
- clog: flag to turn on/off Write Ahead Log, 0: off, 1: on
- tables: maximum number of tables allowed in a vnode
- cache: cache block size (bytes)
- tblocks: maximum number of cache blocks for a table
- abloks: average number of cache blocks for a table
- precision: timestamp precision, us: microsecond ms: millisecond, default is ms
For an application, there may be multiple data scenarios. The best design is to put all data with the same characteristics into one database. One application may have multiple databases, and every database has its own configuration to maximize the system performance. You can specify the above configurations related to storage when you create a database. For example:
```mysql
CREATE DATABASE demo DAYS 10 CACHE 16000 ROWS 2000
```
The above SQL statement will create a database demo, with 10 days for each data file, 16000 bytes for a cache block, and 2000 rows in a file block.
The configuration provided when creating a database will overwrite the configuration in taos.cfg.
## Configuration on Client
*taos* is the TDengine shell and is a client that connects to taosd. TDengine uses the same configuration file taos.cfg for the client, with default location at /etc/taos. You can change it by specifying command line parameter -c when you run taos. For example, *taos -c /home/user*, it will read the configuration file taos.cfg from directory /home/user.
The parameters related to client configuration are listed below:
- masterIP: IP address of TDengine server
- charset: character set, default is the system . For data type nchar, TDengine uses unicode to store the data. Thus, the client needs to tell its character set.
- locale: system language setting
- defaultUser: default login user, default is root
- defaultPass: default password, default is taosdata
For TCP/UDP port, and system debug/log configuration, it is the same as the server side.
For server IP, user name, password, you can always specify them in the command line when you run taos. If they are not specified, they will be read from the taos.cfg
## User Management
System administrator (user root) can add, remove a user, or change the password from the TDengine shell. Commands are listed below:
Create a user, password shall be quoted with the single quote.
```mysql
CREATE USER user_name PASS ‘password’
```
Remove a user
```mysql
DROP USER user_name
```
Change the password for a user
```mysql
ALTER USER user_name PASS ‘password’
```
List all users
```mysql
SHOW USERS
```
## Import Data
Inside the TDengine shell, you can import data into TDengine from either a script or CSV file
**Import from Script**
```
source <filename>
```
Inside the file, you can put all SQL statements there. Each SQL statement has a line. If a line starts with "#", it means comments, it will be skipped. The system will execute the SQL statements line by line automatically until the ends
**Import from CVS**
```mysql
insert into tb1 file a.csv b.csv tb2 c.csv …
import into tb1 file a.csv b.csv tb2 c.csv …
```
Each csv file contains records for only one table, and the data structure shall be the same as the defined schema for the table.
## Export Data
You can export data either from TDengine shell or from tool taosdump.
**Export from TDengine Shell**
```mysql
select * from <tb_name> >> a.csv
```
The above SQL statement will dump the query result set into a csv file.
**Export Using taosdump**
TDengine provides a data dumping tool taosdump. You can choose to dump a database, a table, all data or data only a time range, even only the metadata. For example:
- Export one or more tables in a DB: taosdump [OPTION…] dbname tbname …
- Export one or more DBs: taosdump [OPTION…] --databases dbname…
- Export all DBs (excluding system DB): taosdump [OPTION…] --all-databases
run *taosdump —help* to get a full list of the options
## Management of Connections, Streams, Queries
The system administrator can check, kill the ongoing connections, streams, or queries.
```
SHOW CONNECTIONS
```
It lists all connections, one column shows ip:port from the client.
```
KILL CONNECTION <connection-id>
```
It kills the connection, where connection-id is the ip:port showed by "SHOW CONNECTIONS". You can copy and paste it.
```
SHOW QUERIES
```
It shows the ongoing queries, one column ip:port:id shows the ip:port from the client, and id assigned by the system
```
KILL QUERY <query-id>
```
It kills the query, where query-id is the ip:port:id showed by "SHOW QUERIES". You can copy and paste it.
```
SHOW STREAMS
```
It shows the continuous queries, one column shows the ip:port:id, where ip:port is the connection from the client, and id assigned by the system.
```
KILL STREAM <stream-id>
```
It kills the continuous query, where stream-id is the ip:port:id showed by "SHOW STREAMS". You can copy and paste it.
## System Monitor
TDengine runs a system monitor in the background. Once it is started, it will create a database sys automatically. System monitor collects the metric like CPU, memory, network, disk, number of requests periodically, and writes them into database sys. Also, TDengine will log all important actions, like login, logout, create database, drop database and so on, and write them into database sys.
You can check all the saved monitor information from database sys. By default, system monitor is turned on. But you can turn it off by changing the parameter in the configuration file.
#高级功能
##连续查询(Continuous Query)
连续查询是TDengine定期自动执行的查询,采用滑动窗口的方式进行计算,是一种简化的时间驱动的流式计算。针对库中的表或超级表,TDengine可提供定期自动执行的连续查询,用户可让TDengine推送查询的结果,也可以将结果再写回到TDengine中。每次执行的查询是一个时间窗口,时间窗口随着时间流动向前滑动。在定义连续查询的时候需要指定时间窗口(time window, 参数interval )大小和每次前向增量时间(forward sliding times, 参数sliding)。
TDengine的连续查询采用时间驱动模式,可以直接使用TAOS SQL进行定义,不需要额外的操作。使用连续查询,可以方便快捷地按照时间窗口生成结果,从而对原始采集数据进行降采样(down sampling)。用户通过TAOS SQL定义连续查询以后,TDengine自动在最后的一个完整的时间周期末端拉起查询,并将计算获得的结果推送给用户或者写回TDengine。
TDengine提供的连续查询与普通流计算中的时间窗口计算具有以下区别:
- 不同于流计算的实时反馈计算结果,连续查询只在时间窗口关闭以后才开始计算。例如时间周期是1天,那么当天的结果只会在23:59:59以后才会生成。
- 如果有历史记录写入到已经计算完成的时间区间,连续查询并不会重新进行计算,也不会重新将结果推送给用户。对于写回TDengine的模式,也不会更新已经存在的计算结果。
- 使用连续查询推送结果的模式,服务端并不缓存客户端计算状态,也不提供Exactly-Once的语意保证。如果用户的应用端崩溃,再次拉起的连续查询将只会从再次拉起的时间开始重新计算最近的一个完整的时间窗口。如果使用写回模式,TDengine可确保数据写回的有效性和连续性。
####使用连续查询
使用TAOS SQL定义连续查询的过程,需要调用API taos_stream在应用端启动连续查询。例如要对统计表FOO_TABLE 每1分钟统计一次记录数量,前向滑动的时间是30秒,SQL语句如下:
```sql
SELECT COUNT(*)
FROM FOO_TABLE
INTERVAL(1M) SLIDING(30S)
```
其中查询的时间窗口(time window)是1分钟,前向增量(forward sliding time)时间是30秒。也可以不使用sliding来指定前向滑动时间,此时系统将自动向前滑动一个查询时间窗口再开始下一次计算,即时间窗口长度等于前向滑动时间。
```sql
SELECT COUNT(*)
FROM FOO_TABLE
INTERVAL(1M)
```
如果需要将连续查询的计算结果写回到数据库中,可以使用如下的SQL语句
```sql
CREATE TABLE QUERY_RES
AS
SELECT COUNT(*)
FROM FOO_TABLE
INTERVAL(1M) SLIDING(30S)
```
此时系统将自动创建表QUERY_RES,然后将连续查询的结果写入到该表。需要注意的是,前向滑动时间不能大于时间窗口的范围。如果用户指定的前向滑动时间超过时间窗口范围,系统将自动将其设置为时间窗口的范围值。如上所示SQL语句,如果用户设置前向滑动时间超过1分钟,系统将强制将其设置为1分钟。
此外,TDengine还支持用户指定连续查询的结束时间,默认如果不输入结束时间,连续查询将永久运行,如果用户指定了结束时间,连续查询在系统时间达到指定的时间以后停止运行。如SQL所示,连续查询将运行1个小时,1小时之后连续查询自动停止。
```sql
CREATE TABLE QUERY_RES
AS
SELECT COUNT(*)
FROM FOO_TABLE
WHERE TS > NOW AND TS <= NOW + 1H
INTERVAL(1M) SLIDING(30S)
```
此外,还需要注意的是查询时间窗口的最小值是10毫秒,没有时间窗口范围的上限。
####管理连续查询
用户可在控制台中通过 *show streams* 命令来查看系统中全部运行的连续查询,并可以通过 *kill stream* 命令杀掉对应的连续查询。在写回模式中,如果用户可以直接将写回的表删除,此时连续查询也会自动停止并关闭。后续版本会提供更细粒度和便捷的连续查询管理命令。
##数据订阅(Publisher/Subscriber)
基于数据天然的时间序列特性,TDengine的数据写入(insert)与消息系统的数据发布(pub)逻辑上一致,均可视为系统中插入一条带时间戳的新记录。同时,TDengine在内部严格按照数据时间序列单调递增的方式保存数据。本质上来说,TDengine中里每一张表均可视为一个标准的消息队列。
TDengine内嵌支持轻量级的消息订阅与推送服务。使用系统提供的API,用户可订阅数据库中的某一张表(或超级表)。订阅的逻辑和操作状态的维护均是由客户端完成,客户端定时轮询服务器是否有新的记录到达,有新的记录到达就会将结果反馈到客户。
TDengine的订阅与推送服务的状态是客户端维持,TDengine服务器并不维持。因此如果应用重启,从哪个时间点开始获取最新数据,由应用决定。
####API说明
使用订阅的功能,主要API如下:
<ul>
<li><p><code>TAOS_SUB *taos_subscribe(char *host, char *user, char *pass, char *db, char *table, int64_t time, int mseconds)</code></p><p>该函数负责启动订阅服务。其中参数说明:</p></li><ul>
<li><p>host:主机IP地址</p></li>
<li><p>user:数据库登录用户名</p></li>
<li><p>pass:密码</p></li>
<li><p>db:数据库名称</p></li>
<li><p>table:(超级) 表的名称</p></li>
<li><p>time:启动时间,Unix Epoch时间,单位为毫秒。从1970年1月1日起计算的毫秒数。如果设为0,表示从当前时间开始订阅</p></li>
<li><p>mseconds:查询数据库更新的时间间隔,单位为毫秒。一般设置为1000毫秒。返回值为指向TDengine_SUB 结构的指针,如果返回为空,表示失败。</p></li>
</ul><li><p><code>TAOS_ROW taos_consume(TAOS_SUB *tsub)</code>
</p><p>该函数用来获取订阅的结果,用户应用程序将其置于一个无限循环语句。如果数据库有新记录到达,该API将返回该最新的记录。如果没有新的记录,该API将阻塞。如果返回值为空,说明系统出错。参数说明:</p></li><ul><li><p>tsub:taos_subscribe的结构体指针。</p></li></ul><li><p><code>void taos_unsubscribe(TAOS_SUB *tsub)</code></p><p>取消订阅。应用程序退出时,务必调用该函数以避免资源泄露。</p></li>
<li><p><code>int taos_num_subfields(TAOS_SUB *tsub)</code></p><p>获取返回的一行记录中数据包含多少列。</p></li>
<li><p><code>TAOS_FIELD *taos_fetch_subfields(TAOS_SUB *tsub)</code></p><p>获取每列数据的属性(数据类型、名字、长度),与taos_num_subfileds配合使用,可解析返回的每行数据。</p></li></ul>
示例代码:请看安装包中的的示范程序
##缓存 (Cache)
TDengine采用时间驱动缓存管理策略(First-In-First-Out,FIFO),又称为写驱动的缓存管理机制。这种策略有别于读驱动的数据缓存模式(Least-Recent-Use,LRU),直接将最近写入的数据保存在系统的缓存中。当缓存达到临界值的时候,将最早的数据批量写入磁盘。一般意义上来说,对于物联网数据的使用,用户最为关心最近产生的数据,即当前状态。TDengine充分利用了这一特性,将最近到达的(当前状态)数据保存在缓存中。
TDengine通过查询函数向用户提供毫秒级的数据获取能力。直接将最近到达的数据保存在缓存中,可以更加快速地响应用户针对最近一条或一批数据的查询分析,整体上提供更快的数据库查询响应能力。从这个意义上来说,可通过设置合适的配置参数将TDengine作为数据缓存来使用,而不需要再部署额外的缓存系统,可有效地简化系统架构,降低运维的成本。需要注意的是,TDengine重启以后系统的缓存将被清空,之前缓存的数据均会被批量写入磁盘,缓存的数据将不会像专门的Key-value缓存系统再将之前缓存的数据重新加载到缓存中。
TDengine分配固定大小的内存空间作为缓存空间,缓存空间可根据应用的需求和硬件资源配置。通过适当的设置缓存空间,TDengine可以提供极高性能的写入和查询的支持。TDengine中每个虚拟节点(virtual node)创建时分配独立的缓存池。每个虚拟节点管理自己的缓存池,不同虚拟节点间不共享缓存池。每个虚拟节点内部所属的全部表共享该虚拟节点的缓存池。
一个缓存池了有很多个缓存块,缓存的大小由缓存块的个数以及缓存块的大小决定。参数cacheBlockSize决定每个缓存块的大小,参数cacheNumOfBlocks决定每个虚拟节点可用缓存块数量。因此单个虚拟节点总缓存开销为cacheBlockSize x cacheNumOfBlocks。参数numOfBlocksPerMeter决定每张表可用缓存块的数量,TDengine要求每张表至少有2个缓存块可供使用,因此cacheNumOfBlocks的数值不应该小于虚拟节点中所包含的表数量的两倍,即cacheNumOfBlocks ≤ sessionsPerVnode x 2。一般情况下cacheBlockSize可以不用调整,使用系统默认值即可,缓存块需要存储至少几十条记录才能确保TDengine更有效率地进行数据写入。
你可以通过函数last快速获取一张表或一张超级表的最后一条记录,这样很便于在大屏显示各设备的实时状态或采集值。例如:
```mysql
select degree from thermometer where location='beijing';
```
该SQL语句将获取所有位于北京的传感器最后记录的温度值。
\ No newline at end of file
#Advanced Features
##Continuous Query
Continuous Query is a query executed by TDengine periodically with a sliding window, it is a simplified stream computing driven by timers, not by events. Continuous query can be applied to a table or a STable, and the result set can be passed to the application directly via call back function, or written into a new table in TDengine. The query is always executed on a specified time window (window size is specified by parameter interval), and this window slides forward while time flows (the sliding period is specified by parameter sliding).
Continuous query is defined by TAOS SQL, there is nothing special. One of the best applications is downsampling. Once it is defined, at the end of each cycle, the system will execute the query, pass the result to the application or write it to a database.
If historical data pints are inserted into the stream, the query won't be re-executed, and the result set won't be updated. If the result set is passed to the application, the application needs to keep the status of continuous query, the server won't maintain it. If application re-starts, it needs to decide the time where the stream computing shall be started.
####How to use continuous query
- Pass result set to application
Application shall use API taos_stream (details in connector section) to start the stream computing. Inside the API, the SQL syntax is:
```sql
SELECT aggregation FROM [table_name | stable_name]
INTERVAL(window_size) SLIDING(period)
```
where the new keyword INTERVAL specifies the window size, and SLIDING specifies the sliding period. If parameter sliding is not specified, the sliding period will be the same as window size. The minimum window size is 10ms. The sliding period shall not be larger than the window size. If you set a value larger than the window size, the system will adjust it to window size automatically.
For example:
```sql
SELECT COUNT(*) FROM FOO_TABLE
INTERVAL(1M) SLIDING(30S)
```
The above SQL statement will count the number of records for the past 1-minute window every 30 seconds.
- Save the result into a database
If you want to save the result set of stream computing into a new table, the SQL shall be:
```sql
CREATE TABLE table_name AS
SELECT aggregation from [table_name | stable_name]
INTERVAL(window_size) SLIDING(period)
```
Also, you can set the time range to execute the continuous query. If no range is specified, the continuous query will be executed forever. For example, the following continuous query will be executed from now and will stop in one hour.
```sql
CREATE TABLE QUERY_RES AS
SELECT COUNT(*) FROM FOO_TABLE
WHERE TS > NOW AND TS <= NOW + 1H
INTERVAL(1M) SLIDING(30S)
```
###Manage the Continuous Query
Inside TDengine shell, you can use the command "show streams" to list the ongoing continuous queries, the command "kill stream" to kill a specific continuous query.
If you drop a table generated by the continuous query, the query will be removed too.
##Publisher/Subscriber
Time series data is a sequence of data points over time. Inside a table, the data points are stored in order of timestamp. Also, there is a data retention policy, the data points will be removed once their lifetime is passed. From another view, a table in DTengine is just a standard message queue.
To reduce the development complexity and improve data consistency, TDengine provides the pub/sub functionality. To publish a message, you simply insert a record into a table. Compared with popular messaging tool Kafka, you subscribe to a table or a SQL query statement, instead of a topic. Once new data points arrive, TDengine will notify the application. The process is just like Kafka.
The detailed API will be introduced in the [connectors](https://www.taosdata.com/en/documentation/advanced-features/) section.
##Caching
TDengine allocates a fixed-size buffer in memory, the newly arrived data will be written into the buffer first. Every device or table gets one or more memory blocks. For typical IoT scenarios, the hot data shall always be newly arrived data, they are more important for timely analysis. Based on this observation, TDengine manages the cache blocks in First-In-First-Out strategy. If no enough space in the buffer, the oldest data will be saved into hard disk first, then be overwritten by newly arrived data. TDengine also guarantees every device can keep at least one block of data in the buffer.
By this design, the application can retrieve the latest data from each device super-fast, since they are all available in memory. You can use last or last_row function to return the last data record. If the super table is used, it can be used to return the last data records of all or a subset of devices. For example, to retrieve the latest temperature from thermometers in located Beijing, execute the following SQL
```mysql
select last(*) from thermometers where location=’beijing’
```
By this design, caching tool, like Redis, is not needed in the system. It will reduce the complexity of the system.
TDengine creates one or more virtual nodes(vnode) in each data node. Each vnode contains data for multiple tables and has its own buffer. The buffer of a vnode is fully separated from the buffer of another vnode, not shared. But the tables in a vnode share the same buffer.
System configuration parameter cacheBlockSize configures the cache block size in bytes, and another parameter cacheNumOfBlocks configures the number of cache blocks. The total memory for the buffer of a vnode is $cacheBlockSize \times cacheNumOfBlocks$. Another system parameter numOfBlocksPerMeter configures the maximum number of cache blocks a table can use. When you create a database, you can specify these parameters.
\ No newline at end of file
# 连接器
TDengine提供了丰富的应用程序开发接口,其中包括C/C++、JAVA、Python、RESTful、Go等,便于用户快速开发应用。
## C/C++ Connector
C/C++的API类似于MySQL的C API。应用程序使用时,需要包含TDengine头文件 _taos.h_(安装后,位于_/usr/local/taos/include_):
```C
#include <taos.h>
```
在编译时需要链接TDengine动态库_libtaos.so_(安装后,位于/usr/local/taos/driver,gcc编译时,请加上 -ltaos)。 所有API都以返回_-1_或_NULL_均表示失败。
### C/C++同步API
传统的数据库操作API,都属于同步操作。应用调用API后,一直处于阻塞状态,直到服务器返回结果。TDengine支持如下API:
- `TAOS *taos_connect(char *ip, char *user, char *pass, char *db, int port)`
创建数据库连接,初始化连接上下文。其中需要用户提供的参数包含:TDengine管理主节点的IP地址、用户名、密码、数据库名字和端口号。如果用户没有提供数据库名字,也可以正常连接,用户可以通过该连接创建新的数据库,如果用户提供了数据库名字,则说明该数据库用户已经创建好,缺省使用该数据库。返回值为空表示失败。应用程序需要保存返回的参数,以便后续API调用。
- `void taos_close(TAOS *taos)`
关闭连接, 其中taos是taos_connect函数返回的指针。
- `int taos_query(TAOS *taos, char *sqlstr)`
该API用来执行SQL语句,可以是DQL语句也可以是DML语句,或者DDL语句。其中的taos参数是通过taos_connect()获得的指针。返回值-1表示失败。
- `TAOS_RES *taos_use_result(TAOS *taos)`
选择相应的查询结果集。
- `TAOS_ROW taos_fetch_row(TAOS_RES *res)`
按行获取查询结果集中的数据。
- `int taos_num_fields(TAOS_RES *res)`
获取查询结果集中的列数。
- `TAOS_FIELD *taos_fetch_fields(TAOS_RES *res)`
获取查询结果集每列数据的属性(数据类型、名字、字节数),与taos_num_fileds配合使用,可用来解析taos_fetch_row返回的一个元组(一行)的数据。
- `void taos_free_result(TAOS_RES *res)`
释放查询结果集以及相关的资源。查询完成后,务必调用该API释放资源,否则可能导致应用内存泄露。
- `void taos_init()`
初始化环境变量。如果应用没有主动调用该API,那么应用在调用taos_connect时将自动调用。因此一般情况下应用程序无需手动调用该API。
- `char *taos_errstr(TAOS *taos)`
获取最近一次API调用失败的原因,返回值为字符串。
- `char *taos_errno(TAOS *taos)`
获取最近一次API调用失败的原因,返回值为错误代码。
- `int taos_options(TSDB_OPTION option, const void * arg, ...)`
设置客户端选项,目前只支持时区设置(_TSDB_OPTION_TIMEZONE_)和编码设置(_TSDB_OPTION_LOCALE_)。时区和编码默认为操作系统当前设置。
上述12个API是C/C++接口中最重要的API,剩余的辅助API请参看_taos.h_文件。
**注意**:对于单个数据库连接,在同一时刻只能有一个线程使用该链接调用API,否则会有未定义的行为出现并可能导致客户端crash。客户端应用可以通过建立多个连接进行多线程的数据写入或查询处理。
### C/C++异步API
同步API之外,TDengine还提供性能更高的异步调用API处理数据插入、查询操作。在软硬件环境相同的情况下,异步API处理数据插入的速度比同步API快2~4倍。异步API采用非阻塞式的调用方式,在系统真正完成某个具体数据库操作前,立即返回。调用的线程可以去处理其他工作,从而可以提升整个应用的性能。异步API在网络延迟严重的情况下,优点尤为突出。
异步API都需要应用提供相应的回调函数,回调函数参数设置如下:前两个参数都是一致的,第三个参数依不同的API而定。第一个参数param是应用调用异步API时提供给系统的,用于回调时,应用能够找回具体操作的上下文,依具体实现而定。第二个参数是SQL操作的结果集,如果为空,比如insert操作,表示没有记录返回,如果不为空,比如select操作,表示有记录返回。
异步API对于使用者的要求相对较高,用户可根据具体应用场景选择性使用。下面是三个重要的异步API:
- `void taos_query_a(TAOS *taos, char *sqlstr, void (*fp)(void *param, TAOS_RES *, int code), void *param);`
异步执行SQL语句。taos是调用taos_connect返回的数据库连接结构体。sqlstr是需要执行的SQL语句。fp是用户定义的回调函数。param是应用提供一个用于回调的参数。回调函数fp的第三个参数code用于指示操作是否成功,0表示成功,负数表示失败(调用taos_errstr获取失败原因)。应用在定义回调函数的时候,主要处理第二个参数TAOS_RES *,该参数是查询返回的结果集。
- `void taos_fetch_rows_a(TAOS_RES *res, void (*fp)(void *param, TAOS_RES *, int numOfRows), void *param);`
批量获取异步查询的结果集,只能与taos_query_a配合使用。其中_res_是_taos_query_a回调时返回的结果集结构体指针,fp为回调函数。回调函数中的param是用户可定义的传递给回调函数的参数结构体。numOfRows表明有fetch数据返回的行数(numOfRows并不是本次查询满足查询条件的全部元组数量)。在回调函数中,应用可以通过调用taos_fetch_row前向迭代获取批量记录中每一行记录。读完一块内的所有记录后,应用需要在回调函数中继续调用taos_fetch_rows_a获取下一批记录进行处理,直到返回的记录数(numOfRows)为零(结果返回完成)或记录数为负值(查询出错)。
- `void taos_fetch_row_a(TAOS_RES *res, void (*fp)(void *param, TAOS_RES *, TAOS_ROW row), void *param);`
异步获取一条记录。其中res是taos_query_a回调时返回的结果集结构体指针。fp为回调函数。param是应用提供的一个用于回调的参数。回调时,第三个参数TAOS_ROW指向一行记录。不同于taos_fetch_rows_a,应用无需调用同步API taos_fetch_row来获取一个元组,更加简单。数据提取性能不及批量获取的API。
TDengine的异步API均采用非阻塞调用模式。应用程序可以用多线程同时打开多张表,并可以同时对每张打开的表进行查询或者插入操作。需要指出的是,**客户端应用必须确保对同一张表的操作完全串行化**,即对同一个表的插入或查询操作未完成时(未返回时),不能够执行第二个插入或查询操作。
### C/C++ 连续查询接口
TDengine提供时间驱动的实时流式计算API。可以每隔一指定的时间段,对一张或多张数据库的表(数据流)进行各种实时聚合计算操作。操作简单,仅有打开、关闭流的API。具体如下:
- `TAOS_STREAM *taos_open_stream(TAOS *taos, char *sqlstr, void (*fp)(void *param, TAOS_RES *, TAOS_ROW row), int64_t stime, void *param)`
该API用来创建数据流,其中taos是调用taos_connect返回的结构体指针;sqlstr是SQL查询语句(仅能使用查询语句);fp是用户定义的回调函数指针,每次流式计算完成后,均回调该函数,用户可在该函数内定义其内部业务逻辑;param是应用提供的用于回调的一个参数,回调时,提供给应用;stime是流式计算开始的时间,如果是0,表示从现在开始,如果不为零,表示从指定的时间开始计算(UTC时间从1970/1/1算起的毫秒数)。返回值为NULL,表示创建成功,返回值不为空,表示成功。TDengine将查询的结果(TAOS_ROW)、查询状态(TAOS_RES)、用户定义参数(PARAM)传递给回调函数,在回调函数内,用户可以使用taos_num_fields获取结果集列数,taos_fetch_fields获取结果集每列数据的类型。
- `void taos_close_stream (TAOS_STREAM *tstr)`
关闭数据流,其中提供的参数是taos_open_stream的返回值。用户停止流式计算的时候,务必关闭该数据流。
### C/C++ 数据订阅接口
订阅API目前支持订阅一张表,并通过定期轮询的方式不断获取写入表中的最新数据。
- `TAOS_SUB *taos_subscribe(char *host, char *user, char *pass, char *db, char *table, long time, int mseconds)`
该API用来启动订阅,需要提供的参数包含:TDengine管理主节点的IP地址、用户名、密码、数据库、数据库表的名字;time是开始订阅消息的时间,是从1970年1月1日起计算的毫秒数,为长整型, 如果设为0,表示从当前时间开始订阅;mseconds为查询数据库更新的时间间隔,单位为毫秒,建议设为1000毫秒。返回值为一指向TDengine_SUB结构的指针,如果返回为空,表示失败。
- `TAOS_ROW taos_consume(TAOS_SUB *tsub)`
该API用来获取最新消息,应用程序一般会将其置于一个无限循环语句中。其中参数tsub是taos_subscribe的返回值。如果数据库有新的记录,该API将返回,返回参数是一行记录。如果没有新的记录,该API将阻塞。如果返回值为空,说明系统出错,需要检查系统是否还在正常运行。
- `void taos_unsubscribe(TAOS_SUB *tsub)`
该API用于取消订阅,参数tsub是taos_subscribe的返回值。应用程序退出时,需要调用该API,否则有资源泄露。
- `int taos_num_subfields(TAOS_SUB *tsub)`
该API用来获取返回的一排数据中数据的列数
- `TAOS_FIELD *taos_fetch_subfields(TAOS_RES *res)`
该API用来获取每列数据的属性(数据类型、名字、字节数),与taos_num_subfileds配合使用,可用来解析返回的一排数据。
## Java Connector
### JDBC接口
如果用户使用Java开发企业级应用,可选用TDengine提供的JDBC Driver来调用服务。TDengine提供的JDBC Driver是标准JDBC规范的子集,遵循JDBC 标准(3.0)API规范,支持现有的各种Java开发框架。目前TDengine的JDBC driver并未发布到在线依赖仓库比如maven的中心仓库。因此用户开发时,需要手动把驱动包`taos-jdbcdriver-x.x.x-dist.jar`安装到开发环境的依赖仓库中。
TDengine 的驱动程序包的在不同操作系统上依赖不同的本地函数库(均由C语言编写)。Linux系统上,依赖一个名为`libtaos.so` 的本地库,.so即"Shared Object"缩写。成功安装TDengine后,`libtaos.so` 文件会被自动拷贝至`/usr/local/lib/taos`目录下,该目录也包含在Linux上自动扫描路径上。Windows系统上,JDBC驱动程序依赖于一个名为`taos.dll` 的本地库,.dll是动态链接库"Dynamic Link Library"的缩写。Windows上成功安装客户端后,JDBC驱动程序包默认位于`C:/TDengine/driver/JDBC/`目录下;其依赖的动态链接库`taos.dll`文件位于`C:/TDengine/driver/C`目录下,`taos.dll` 会被自动拷贝至系统默认搜索路径`C:/Windows/System32`下。
TDengine的JDBC Driver遵循标准JDBC规范,开发人员可以参考Oracle官方的JDBC相关文档来找到具体的接口和方法的定义与用法。TDengine的JDBC驱动在连接配置和支持的方法上与传统数据库驱动稍有不同。
TDengine的JDBC URL规范格式为:
`jdbc:TSDB://{host_ip}:{port}/{database_name}?[user={user}|&password={password}|&charset={charset}|&cfgdir={config_dir}|&locale={locale}|&timezone={timezone}]`
其中,`{}`中的内容必须,`[]`中为可选。配置参数说明如下:
- user:登陆TDengine所用用户名;默认值root
- password:用户登陆密码;默认值taosdata
- charset:客户端使用的字符集;默认值为系统字符集
- cfgdir:客户端配置文件目录路径;Linux OS上默认值`/etc/taos` ,Windows OS上默认值 `C:/TDengine/cfg`
- locale:客户端语言环境;默认值系统当前locale
- timezone:客户端使用的时区;默认值为系统当前时区
以上所有参数均可在调用java.sql.DriverManager类创建连接时指定,示例如下:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import com.taosdata.jdbc.TSDBDriver;
public Connection getConn() throws Exception{
Class.forName("com.taosdata.jdbc.TSDBDriver");
String jdbcUrl = "jdbc:TAOS://127.0.0.1:0/db?user=root&password=taosdata";
Properties connProps = new Properties();
connProps.setProperty(TSDBDriver.PROPERTY_KEY_USER, "root");
connProps.setProperty(TSDBDriver.PROPERTY_KEY_PASSWORD, "taosdata");
connProps.setProperty(TSDBDriver.PROPERTY_KEY_CONFIG_DIR, "/etc/taos");
connProps.setProperty(TSDBDriver.PROPERTY_KEY_CHARSET, "UTF-8");
connProps.setProperty(TSDBDriver.PROPERTY_KEY_LOCALE, "en_US.UTF-8");
connProps.setProperty(TSDBDriver.PROPERTY_KEY_TIMEZONE, "UTC-8");
Connection conn = DriverManager.getConnection(jdbcUrl, connProps);
return conn;
}
```
这些配置参数中除了cfgdir外,均可在客户端配置文件taos.cfg中进行配置。调用java.sql.DriverManager时声明的配置参数优先级最高,JDBC URL的优先级次之,配置文件的优先级最低。例如charset同时在配置文件taos.cfg中配置,也在JDBC URL中配置,则使用JDBC URL中的配置值。
此外,尽管TDengine的JDBC驱动实现尽可能的与关系型数据库驱动保持一致,但时序空间数据库与关系对象型数据库服务的对象和技术特征的差异导致TDengine的Java API并不能与标准完全相同。对于有大量关系型数据库开发经验而初次接触TDengine的开发者来说,有以下一些值的注意的地方:
* TDengine不提供针对单条数据记录的删除和修改的操作,驱动中也没有支持相关方法
* 目前TDengine不支持表间的join或union操作,因此也缺乏对该部分API的支持
* TDengine支持批量写入,但是支持停留在SQL语句级别,而不是API级别,也就是说用户需要通过写特殊的SQL语句来实现批量
* 目前TDengine不支持嵌套查询(nested query),对每个Connection的实例,至多只能有一个打开的ResultSet实例;如果在ResultSet还没关闭的情况下执行了新的查询,TSDBJDBCDriver则会自动关闭上一个ResultSet
对于TDengine操作的报错信息,用户可使用JDBCDriver包里提供的枚举类TSDBError.java来获取error message和error code的列表。对于更多的具体操作的相关代码,请参考TDengine提供的使用示范项目`JDBCDemo`
## Python Connector
### Python客户端安装
用户可以在源代码的src/connector/python文件夹下找到python2和python3的安装包。用户可以通过pip命令安装:
`pip install src/connector/python/python2/`
`pip install src/connector/python/python3/`
如果机器上没有pip命令,用户可将src/connector/python/python3或src/connector/python/python2下的taos文件夹拷贝到应用程序的目录使用。
### Python客户端接口
在使用TDengine的python接口时,需导入TDengine客户端模块:
```
import taos
```
用户可通过python的帮助信息直接查看模块的使用信息,或者参考code/examples/python中的示例程序。以下为部分常用类和方法:
- _TDengineConnection_类
参考python中help(taos.TDengineConnection)。
- _TDengineCursor_类
参考python中help(taos.TDengineCursor)。
- _connect_方法
用于生成taos.TDengineConnection的实例。
## RESTful Connector
为支持各种不同类型平台的开发,TDengine提供符合REST设计标准的API,即RESTful API。为最大程度降低学习成本,不同于其他数据库RESTful API的设计方法,TDengine直接通过HTTP POST 请求BODY中包含的SQL语句来操作数据库,仅需要一个URL。
### HTTP请求格式
`http://<ip>:<PORT>/rest/sql`
​ 参数说明:
​ IP: 集群中的任一台主机
​ PORT: 配置文件中httpPort配置项,缺省为6020
如:http://192.168.0.1:6020/rest/sql 是指向IP地址为192.168.0.1的URL.
HTTP请求的Header里需带有身份认证信息,TDengine单机版仅支持Basic认证机制。
HTTP请求的BODY里就是一个完整的SQL语句,SQL语句中的数据表应提供数据库前缀,例如\<db-name>.\<tb-name>。如果表名不带数据库前缀,系统会返回错误。因为HTTP模块只是一个简单的转发,没有当前DB的概念。
使用curl来发起一个HTTP Request, 语法如下:
```
curl -H 'Authorization: Basic <TOKEN>' -d '<SQL>' <ip>:<PORT>/rest/sql
```
或者
```
curl -u username:password -d '<SQL>' <ip>:<PORT>/rest/sql
```
其中,`TOKEN``{username}:{password}`经过Base64编码之后的字符串,例如`root:taosdata`编码后为`cm9vdDp0YW9zZGF0YQ==`
### HTTP返回格式
返回值为JSON格式,如下:
```
{
"status": "succ",
"head": ["column1","column2", …],
"data": [
["2017-12-12 23:44:25.730", 1],
["2017-12-12 22:44:25.728", 4]
],
"rows": 2
}
```
说明:
- 第一行”status”告知操作结果是成功还是失败;
- 第二行”head”是表的定义,如果不返回结果集,仅有一列“affected_rows”;
- 第三行是具体返回的数据,一排一排的呈现。如果不返回结果集,仅[[affected_rows]]
- 第四行”rows”表明总共多少行数据
### 使用示例
- 在demo库里查询表t1的所有记录, curl如下:
`curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'select * from demo.t1' 192.168.0.1:6020/rest/sql`
返回值:
```
{
"status": "succ",
"head": ["column1","column2","column3"],
"data": [
["2017-12-12 23:44:25.730", 1, 2.3],
["2017-12-12 22:44:25.728", 4, 5.6]
],
"rows": 2
}
```
- 创建库demo:
`curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'create database demo' 192.168.0.1:6020/rest/sql`
返回值:
```
{
"status": "succ",
"head": ["affected_rows"],
"data": [[1]],
"rows": 1,
}
```
## Go Connector
TDengine提供了GO驱动程序“taosSql”包。taosSql驱动包是基于GO的“database/sql/driver”接口的实现。用户可在安装后的/usr/local/taos/connector/go目录获得GO的客户端驱动程序。用户需将驱动包/usr/local/taos/connector/go/src/taosSql目录拷贝到应用程序工程的src目录下。然后在应用程序中导入驱动包,就可以使用“database/sql”中定义的接口访问TDengine:
```Go
import (
"database/sql"
_ "taosSql"
)
```
taosSql驱动包内采用cgo模式,调用了TDengine的C/C++同步接口,与TDengine进行交互,因此,在数据库操作执行完成之前,客户端应用将处于阻塞状态。单个数据库连接,在同一时刻只能有一个线程调用API。客户应用可以建立多个连接,进行多线程的数据写入或查询处理。
更多使用的细节,请参考下载目录中的示例源码。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册