diff --git a/documentation20/Connections with other Tools-ch.md b/documentation20/Connections with other Tools-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..34be7d1e76363f535af9b03b91aa6f618405c602 --- /dev/null +++ b/documentation20/Connections with other Tools-ch.md @@ -0,0 +1,149 @@ +# 与其他工具的连接 + + +## Grafana + +TDengine能够与开源数据可视化系统[Grafana](https://www.grafana.com/)快速集成搭建数据监测报警系统,整个过程无需任何代码开发,TDengine中数据表中内容可以在仪表盘(DashBoard)上进行可视化展现。 + +### 安装Grafana + +目前TDengine支持Grafana 5.2.4以上的版本。用户可以根据当前的操作系统,到Grafana官网下载安装包,并执行安装。下载地址如下:https://grafana.com/grafana/download。 + +### 配置Grafana + +TDengine的Grafana插件在安装包的/usr/local/taos/connector/grafana目录下。 + +以CentOS 7.2操作系统为例,将tdengine目录拷贝到/var/lib/grafana/plugins目录下,重新启动grafana即可。 + +### 使用 Grafana + +#### 配置数据源 + +用户可以直接通过 localhost:3000 的网址,登录 Grafana 服务器(用户名/密码:admin/admin),通过左侧 `Configuration -> Data Sources` 可以添加数据源,如下图所示: + +![img](../assets/add_datasource1.jpg) + +点击 `Add data source` 可进入新增数据源页面,在查询框中输入 TDengine 可选择添加,如下图所示: + +![img](../assets/add_datasource2.jpg) + +进入数据源配置页面,按照默认提示修改相应配置即可: + +![img](../assets/add_datasource3.jpg) + +* Host: TDengine 集群的中任意一台服务器的 IP 地址与 TDengine RESTful 接口的端口号(6020),默认 http://localhost:6020。 +* User:TDengine 用户名。 +* Password:TDengine 用户密码。 + +点击 `Save & Test` 进行测试,成功会有如下提示: + +![img](../assets/add_datasource4.jpg) + +#### 创建 Dashboard + +回到主界面创建 Dashboard,点击 Add Query 进入面板查询页面: + +![img](../assets/create_dashboard1.jpg) + +如上图所示,在 Query 中选中 `TDengine` 数据源,在下方查询框可输入相应 sql 进行查询,具体说明如下: + +* INPUT SQL:输入要查询的语句(该 SQL 语句的结果集应为两列多行),例如:`select avg(mem_system) from log.dn where ts >= $from and ts < $to interval($interval)` ,其中,from、to 和 interval 为 TDengine插件的内置变量,表示从Grafana插件面板获取的查询范围和时间间隔。除了内置变量外,`也支持可以使用自定义模板变量`。 +* ALIAS BY:可设置当前查询别名。 +* GENERATE SQL: 点击该按钮会自动替换相应变量,并生成最终执行的语句。 + + +按照默认提示查询当前 TDengine 部署所在服务器指定间隔系统内存平均使用量如下: + +![img](../assets/create_dashboard2.jpg) + +> 
关于如何使用Grafana创建相应的监测界面以及更多有关使用Grafana的信息,请参考Grafana官方的[文档](https://grafana.com/docs/)。 + +#### 导入 Dashboard + +在 Grafana 插件目录 /usr/local/taos/connector/grafana/tdengine/dashboard/ 下提供了一个 `tdengine-grafana.json` 可导入的 dashboard。 + +点击左侧 `Import` 按钮,并上传 `tdengine-grafana.json` 文件: + +![img](../assets/import_dashboard1.jpg) + +导入完成之后可看到如下效果: + +![img](../assets/import_dashboard2.jpg) + + +## Matlab + +MatLab可以通过安装包内提供的JDBC Driver直接连接到TDengine获取数据到本地工作空间。 + +### MatLab的JDBC接口适配 + +MatLab的适配有下面几个步骤,下面以Windows10上适配MatLab2017a为例: + +- 将TDengine安装包内的驱动程序JDBCDriver-1.0.0-dist.jar拷贝到${matlab_root}\MATLAB\R2017a\java\jar\toolbox +- 将TDengine安装包内的taos.lib文件拷贝至${matlab_ root _dir}\MATLAB\R2017a\lib\win64 +- 将新添加的驱动jar包加入MatLab的classpath。在${matlab_ root _dir}\MATLAB\R2017a\toolbox\local\classpath.txt文件中添加下面一行 + +​ `$matlabroot/java/jar/toolbox/JDBCDriver-1.0.0-dist.jar` + +- 在${user_home}\AppData\Roaming\MathWorks\MATLAB\R2017a\下添加一个文件javalibrarypath.txt, 并在该文件中添加taos.dll的路径,比如您的taos.dll是在安装时拷贝到了C:\Windows\System32下,那么就应该在javalibrarypath.txt中添加如下一行: + +​ `C:\Windows\System32` + +### 在MatLab中连接TDengine获取数据 + +在成功进行了上述配置后,打开MatLab。 + +- 创建一个连接: + + `conn = database(‘db’, ‘root’, ‘taosdata’, ‘com.taosdata.jdbc.TSDBDriver’, ‘jdbc:TSDB://127.0.0.1:0/’)` + +- 执行一次查询: + + `sql0 = [‘select * from tb’]` + + `data = select(conn, sql0);` + +- 插入一条记录: + + `sql1 = [‘insert into tb values (now, 1)’]` + + `exec(conn, sql1)` + +更多例子细节请参考安装包内examples\Matlab\TDengineDemo.m文件。 + +## R + +R语言支持通过JDBC接口来连接TDengine数据库。首先需要安装R语言的JDBC包。启动R语言环境,然后执行以下命令安装R语言的JDBC支持库: + +```R +install.packages('RJDBC', repos='http://cran.us.r-project.org') +``` + +安装完成以后,通过执行`library('RJDBC')`命令加载 _RJDBC_ 包: + +然后加载TDengine的JDBC驱动: + +```R +drv<-JDBC("com.taosdata.jdbc.TSDBDriver","JDBCDriver-2.0.0-dist.jar", identifier.quote="\"") +``` +如果执行成功,不会出现任何错误信息。之后通过以下命令尝试连接数据库: + +```R +conn<-dbConnect(drv,"jdbc:TSDB://192.168.0.1:0/?user=root&password=taosdata","root","taosdata") +``` + 
+注意将上述命令中的IP地址替换成正确的IP地址。如果没有任何错误的信息,则连接数据库成功,否则需要根据错误提示调整连接的命令。TDengine支持以下的 _RJDBC_ 包中函数: + + +- dbWriteTable(conn, "test", iris, overwrite=FALSE, append=TRUE):将数据框iris写入表test中,overwrite必须设置为FALSE,append必须设为TRUE,且数据框iris要与表test的结构一致。 +- dbGetQuery(conn, "select count(*) from test"):查询语句 +- dbSendUpdate(conn, "use db"):执行任何非查询sql语句。例如dbSendUpdate(conn, "use db"), 写入数据dbSendUpdate(conn, "insert into t1 values(now, 99)")等。 +- dbReadTable(conn, "test"):读取表test中数据 +- dbDisconnect(conn):关闭连接 +- dbRemoveTable(conn, "test"):删除表test + +TDengine客户端暂不支持如下函数: +- dbExistsTable(conn, "test"):是否存在表test +- dbListTables(conn):显示连接中的所有表 + + diff --git a/documentation20/Connections with other Tools.md b/documentation20/Connections with other Tools.md new file mode 100644 index 0000000000000000000000000000000000000000..8be05698497184aee2c41a60e32f39b636e2070e --- /dev/null +++ b/documentation20/Connections with other Tools.md @@ -0,0 +1,167 @@ +# Connect with other tools + +## Telegraf + +TDengine is easy to integrate with [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/), an open-source server agent for collecting and sending metrics and events, without any additional development. + +### Install Telegraf + +At present, TDengine supports Telegraf newer than version 1.7.4. Users can go to the [download link] and choose the proper package to install on their system. + +### Configure Telegraf + +Telegraf is configured by changing items in the configuration file */etc/telegraf/telegraf.conf*. + + +In the **output plugins** section, add an _[[outputs.http]]_ item: + +- _url_: http://ip:6020/telegraf/udb, in which _ip_ is the IP address of any node in the TDengine cluster. Port 6020 is the RESTful API port used by TDengine. _udb_ is the name of the database that stores the data; it must be created beforehand. 
+- _method_: "POST" +- _username_: username used to log in to TDengine +- _password_: password used to log in to TDengine +- _data_format_: "json" +- _json_timestamp_units_: "1ms" + +In the **agent** section: + +- hostname: used to distinguish different machines. It must be unique. +- metric_batch_size: 30, the maximum number of records Telegraf writes in one batch. The larger the value, the less frequently requests are sent. For TDengine, the value should be less than 50. + +Please refer to the [Telegraf docs](https://docs.influxdata.com/telegraf/v1.11/) for more information. + +## Grafana + +[Grafana] is an open-source system for displaying time-series data. It is easy to integrate TDengine with Grafana to build a monitoring system. Data saved in TDengine can be fetched and shown on a Grafana dashboard. + +### Install Grafana + +For now, TDengine only supports Grafana newer than version 5.2.4. Users can go to the [Grafana download page] to download the proper package. + +### Configure Grafana + +The TDengine Grafana plugin is in the _/usr/local/taos/connector/grafana_ directory. +Taking CentOS 7.2 as an example, copy the _tdengine_ directory to the _/var/lib/grafana/plugins_ directory and restart Grafana. + +### Use Grafana + +Users can log in to the Grafana server (username/password: admin/admin) through localhost:3000 to configure TDengine as the data source. As shown in the picture below, TDengine appears as a data source option in the box: + + +![img](../assets/clip_image001.png) + +When choosing TDengine as the data source, the Host in the HTTP configuration should be set to the IP address of any node of a TDengine cluster, and the port should be set to 6020. For example, when TDengine and Grafana are on the same machine, the Host should be _http://localhost:6020_. + + +Users should also set the username and password used to log in to TDengine. Then click the _Save&Test_ button to save. 
+ +![img](../assets/clip_image001-2474914.png) + +TDengine should then appear in the Grafana data source list. + +![img](../assets/clip_image001-2474939.png) + + +Users can then create Dashboards in Grafana using TDengine as the data source: + + +![img](../assets/clip_image001-2474961.png) + + + +Click the _Add Query_ button to add a query and enter the SQL command you want to run in the _INPUT SQL_ text box. The SQL command should return a two-column, multi-row result, such as _SELECT count(*) FROM sys.cpu WHERE ts>=$from and ts<$to interval($interval)_, in which _from_, _to_ and _interval_ are built-in variables of the TDengine plugin that represent the query time range and time interval. + + +The _ALIAS BY_ field sets the query alias. Click _GENERATE SQL_ to substitute the variables and generate the final statement to be executed: + +![img](../assets/clip_image001-2474987.png) + +Please refer to the [Grafana official document] for more information about Grafana. + + +## Matlab + +Matlab can connect to TDengine and retrieve data from it through the TDengine JDBC driver. + +### MatLab and TDengine JDBC adaptation + +Several steps are required to adapt Matlab to TDengine. Taking Matlab 2017a on Windows 10 as an example: + +1. Copy the file _JDBCDriver-1.0.0-dist.jar_ in the TDengine package to the directory _${matlab_root}\MATLAB\R2017a\java\jar\toolbox_ +2. Copy the file _taos.lib_ in the TDengine package to _${matlab_root_dir}\MATLAB\R2017a\lib\win64_ +3. Add the .jar package just copied to the Matlab classpath. Append the line below to the end of the file _${matlab_root_dir}\MATLAB\R2017a\toolbox\local\classpath.txt_ + + `$matlabroot/java/jar/toolbox/JDBCDriver-1.0.0-dist.jar` + +4. Create a file called _javalibrarypath.txt_ in the directory _${user_home}\AppData\Roaming\MathWorks\MATLAB\R2017a\_, and add the path of _taos.dll_ to the file. 
For example, if the file _taos.dll_ is in the directory _C:\Windows\System32_, then add the following line to the file *javalibrarypath.txt*: + + `C:\Windows\System32` + +### TDengine operations in Matlab + +After the configuration above is complete, open Matlab: + +- Build a connection: + + `conn = database('db', 'root', 'taosdata', 'com.taosdata.jdbc.TSDBDriver', 'jdbc:TSDB://127.0.0.1:0/')` + +- Query: + + `sql0 = ['select * from tb']` + + `data = select(conn, sql0);` + +- Insert a record: + + `sql1 = ['insert into tb values (now, 1)']` + + `exec(conn, sql1)` + +Please refer to the file _examples\Matlab\TDengineDemo.m_ for more information. + +## R + +Users can access the TDengine server from the R language through the JDBC interface. First, install the JDBC package in R: + +```R +install.packages('RJDBC', repos='http://cran.us.r-project.org') +``` + +Then use the _library_ function to load the package: + +```R +library('RJDBC') +``` + +Then load the TDengine JDBC driver: + +```R +drv<-JDBC("com.taosdata.jdbc.TSDBDriver","JDBCDriver-1.0.0-dist.jar", identifier.quote="\"") +``` +If this succeeds, no error message is displayed. Then use the following command to try a database connection: + +```R +conn<-dbConnect(drv,"jdbc:TSDB://192.168.0.1:0/?user=root&password=taosdata","root","taosdata") +``` + +Please replace the IP address in the command above with the correct one. If no error message is shown, the connection has been established successfully. TDengine supports the following functions in the _RJDBC_ package: + + +- _dbWriteTable(conn, "test", iris, overwrite=FALSE, append=TRUE)_: write the data in the data frame _iris_ to the table _test_ in the TDengine server. The parameter _overwrite_ must be _FALSE_, _append_ must be _TRUE_, and the schema of the data frame _iris_ must be the same as that of the table _test_. +- _dbGetQuery(conn, "select count(*) from test")_: run a query command +- _dbSendUpdate(conn, "use db")_: run any non-query command. 
+- _dbReadTable(conn, "test")_: read all the data in table _test_ +- _dbDisconnect(conn)_: close a connection +- _dbRemoveTable(conn, "test")_: remove table _test_ + +The following functions are **not supported** currently: +- _dbExistsTable(conn, "test")_: check whether table _test_ exists +- _dbListTables(conn)_: list all tables in the connection + + +[Telegraf]: www.taosdata.com +[download link]: https://portal.influxdata.com/downloads +[Telegraf document]: www.taosdata.com +[Grafana]: https://grafana.com +[Grafana download page]: https://grafana.com/grafana/download +[Grafana official document]: https://grafana.com/docs/ + diff --git a/documentation20/Connector.md b/documentation20/Connector.md new file mode 100644 index 0000000000000000000000000000000000000000..6d981478dfbad6220961514515e3c1bb51e13c06 --- /dev/null +++ b/documentation20/Connector.md @@ -0,0 +1,823 @@ +# TDengine connectors + +TDengine provides many connectors for development, including C/C++, JAVA, Python, RESTful, Go, Node.JS, etc. + +NOTE: All APIs that take a SQL string as a parameter, including but not limited to `taos_query`, `taos_query_a`, `taos_subscribe` in the C/C++ Connector and their counterparts in other connectors, can ONLY process one SQL statement at a time. If more than one SQL statement is provided, the behavior is undefined. + +## C/C++ API + +C/C++ APIs are similar to the MySQL APIs. To use them, applications should include the TDengine header file _taos.h_ by adding the following line to the code: +```C +#include <taos.h> +``` +Make sure the TDengine library _libtaos.so_ is installed, and use the _-ltaos_ option to link the library when compiling. In most cases, if the return value of an API is an integer, it returns _0_ for success and another value as an error code for failure; if the return value is a pointer, _NULL_ indicates failure. + + +### Fundamental API + +Fundamental APIs prepare the runtime environment for other APIs, for example, by creating a database connection. 
+ +- `void taos_init()` + + Initialize the runtime environment for the TDengine client. Calling this API is not required, since it is called in _taos_connect_ by default. + + +- `void taos_cleanup()` + + Clean up the runtime environment. Clients should call this API before exiting. + + +- `int taos_options(TSDB_OPTION option, const void * arg, ...)` + + Set client options. The parameter _option_ supports the values _TSDB_OPTION_CONFIGDIR_ (configuration directory), _TSDB_OPTION_SHELL_ACTIVITY_TIMER_, _TSDB_OPTION_LOCALE_ (client locale) and _TSDB_OPTION_TIMEZONE_ (client timezone). + + +- `char* taos_get_client_info()` + + Retrieve the version information of the client. + + +- `TAOS *taos_connect(const char *ip, const char *user, const char *pass, const char *db, int port)` + + Open a connection to a TDengine server. The parameters are: + + * ip: IP address of the server + * user: username + * pass: password + * db: database to use, or **NULL** to select no database after connecting. If a database is given, it must already exist, otherwise a connection error is reported. + * port: port number to connect to + + The handle returned by this API should be kept for future use. + + +- `char *taos_get_server_info(TAOS *taos)` + + Retrieve the version information of the server. + + +- `int taos_select_db(TAOS *taos, const char *db)` + + Set the default database to `db`. + + +- `void taos_close(TAOS *taos)` + + Close a connection to a TDengine server, identified by the handle returned by _taos_connect_. + + +### C/C++ sync API + +Sync APIs send a request and then wait for the response from the server. TDengine has the following sync APIs: + +- `TAOS_RES* taos_query(TAOS *taos, const char *sql)` + + The API used to run a SQL command. The command can be DQL, DML or DDL. The parameter _taos_ is the handle returned by _taos_connect_. Return value _NULL_ means failure. 
+ + +- `int taos_result_precision(TAOS_RES *res)` + + Get the timestamp precision of the result set. Return value _0_ means millisecond, _1_ means microsecond and _2_ means nanosecond. + + +- `TAOS_ROW taos_fetch_row(TAOS_RES *res)` + + Fetch one row of the result set through _res_. + + +- `int taos_fetch_block(TAOS_RES *res, TAOS_ROW *rows)` + + Fetch multiple rows from the result set. The return value is the row count. + + +- `int taos_num_fields(TAOS_RES *res)` and `int taos_field_count(TAOS_RES* res)` + + These two APIs are identical; both return the number of fields in the result set. + + +- `int* taos_fetch_lengths(TAOS_RES *res)` + + Get the field lengths of the result set. The return value is an array whose length is the field count. + + +- `int taos_affected_rows(TAOS_RES *res)` + + Get the number of rows affected by the executed statement. + + +- `TAOS_FIELD *taos_fetch_fields(TAOS_RES *res)` + + Fetch the description of each field. The description includes the data type, the field name, and the length in bytes. The API should be used together with _taos_num_fields_ to interpret a row of data. The structure of `TAOS_FIELD` is: + + ```c + typedef struct taosField { + char name[65]; // field name + uint8_t type; // data type + int16_t bytes; // length of the field in bytes + } TAOS_FIELD; + ``` + + +- `void taos_stop_query(TAOS_RES *res)` + + Stop the execution of a query. + + +- `void taos_free_result(TAOS_RES *res)` + + Free the resources used by a result set. Make sure to call this API after fetching results, otherwise memory leaks. + + +- `char *taos_errstr(TAOS_RES *res)` + + Return the reason for the last API call failure. The return value is a string. + + +- `int taos_errno(TAOS_RES *res)` + + Return the error code of the last API call failure. The return value is an integer. + + +**Note**: A connection to a TDengine server is not thread-safe, so a connection can only be used by one thread. 
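The one-connection-per-thread rule in the note above can be sketched as follows. This is a minimal illustration in Python rather than C, assuming the same restriction applies to connections opened through the other connectors; `PerThreadConnections` and its `connect` argument are hypothetical names introduced here, not part of any TDengine API.

```python
import threading


class PerThreadConnections:
    """Hand each thread its own connection object, created on first use.

    Hypothetical helper for illustration only. `connect` is any
    zero-argument callable that opens a new connection, for example a
    small wrapper around the connector's own connect call.
    """

    def __init__(self, connect):
        self._connect = connect
        self._local = threading.local()  # separate attribute storage per thread

    def get(self):
        # Reuse this thread's connection if it already has one.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn


if __name__ == "__main__":
    # `object` stands in for a real connection factory so the sketch
    # runs without a TDengine server.
    pool = PerThreadConnections(object)
    main_conn = pool.get()
    seen = []
    worker = threading.Thread(target=lambda: seen.append(pool.get()))
    worker.start()
    worker.join()
    print(pool.get() is main_conn)   # same thread, same connection
    print(seen[0] is not main_conn)  # other thread, different connection
```

With this shape a connection handle never crosses a thread boundary, which is what the note requires; whether to also close the per-thread connections at thread exit is left out of the sketch.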
+ + +### C/C++ async API + +In addition to sync APIs, TDengine also provides async APIs. Async APIs return right away without waiting for a response from the server, allowing the application to continue with other tasks without blocking. This makes them more efficient than sync APIs, especially on a poor network. + +All async APIs require callback functions. The callback functions have the format: +```C +void fp(void *param, TAOS_RES * res, TYPE param3) +``` +The first two parameters of the callback function are the same for all async APIs; the third parameter differs from API to API. Generally, the first parameter is the value the application passed to the API as _param_, and the second parameter is a result handle. + +- `void taos_query_a(TAOS *taos, const char *sql, void (*fp)(void *param, TAOS_RES *, int code), void *param);` + + The async version of _taos_query_. + + * taos: the handle returned by _taos_connect_. + * sql: the SQL command to run. + * fp: user-defined callback function. The third parameter of the callback function, _code_, is _0_ for success or a negative number for failure (call taos_errstr to get the error as a string). Applications mainly handle the second parameter, the returned result set. + * param: user-provided parameter that is passed to the callback function. + + +- `void taos_fetch_rows_a(TAOS_RES *res, void (*fp)(void *param, TAOS_RES *, int numOfRows), void *param);` + + The async API to fetch a batch of rows, which should only be used with a _taos_query_a_ call. + + * res: result handle returned by _taos_query_a_. + * fp: the callback function. _param_ is a user-defined structure to pass to _fp_. The parameter _numOfRows_ is the number of result rows in the current fetch cycle. In the callback function, applications should call _taos_fetch_row_ to get records from the result handle. 
After getting a batch of results, applications should call _taos_fetch_rows_a_ again to handle the next batch, until _numOfRows_ is _0_ (no more data to fetch) or _-1_ (failure). + + +- `void taos_fetch_row_a(TAOS_RES *res, void (*fp)(void *param, TAOS_RES *, TAOS_ROW row), void *param);` + + The async API to fetch a result row. + + * res: result handle. + * fp: the callback function. _param_ is a user-defined structure to pass to _fp_. The third parameter of the callback function is a single result row, which differs from that of the _taos_fetch_rows_a_ API. With this API, it is not necessary to call _taos_fetch_row_ to retrieve each result row, which makes it handier than _taos_fetch_rows_a_ but less efficient. + + +Applications may operate on multiple tables. However, **it is important to make sure the operations on the same table are serialized**. That means that after an insert request for a table is sent to the server, no further operations on that table are allowed until the response is received. + + +### C/C++ parameter binding API + +TDengine also provides parameter binding APIs. As in MySQL, only the question mark `?` can be used to represent a parameter in these APIs. + +- `TAOS_STMT* taos_stmt_init(TAOS *taos)` + + Create a TAOS_STMT to represent the prepared statement for other APIs. + +- `int taos_stmt_prepare(TAOS_STMT *stmt, const char *sql, unsigned long length)` + + Parse the SQL statement _sql_ and bind the result to _stmt_. If _length_ is larger than 0, its value is used as the length of _sql_; otherwise the API detects the actual length of _sql_ automatically. + +- `int taos_stmt_bind_param(TAOS_STMT *stmt, TAOS_BIND *bind)` + + Bind values to parameters. _bind_ points to an array whose element count and order must match the parameters of the SQL statement. 
_TAOS_BIND_ is used in the same way as _MYSQL_BIND_ in MySQL. Its definition is as follows: + + ```c + typedef struct TAOS_BIND { + int buffer_type; + void * buffer; + unsigned long buffer_length; // not used in TDengine + unsigned long *length; + int * is_null; + int is_unsigned; // not used in TDengine + int * error; // not used in TDengine + } TAOS_BIND; + ``` + +- `int taos_stmt_add_batch(TAOS_STMT *stmt)` + + Add the bound parameters to the batch. The client can call `taos_stmt_bind_param` again after calling this API. Note that this API only supports _insert_ / _import_ statements; it returns an error in other cases. + +- `int taos_stmt_execute(TAOS_STMT *stmt)` + + Execute the prepared statement. At present, this API can only be called once per statement. + +- `TAOS_RES* taos_stmt_use_result(TAOS_STMT *stmt)` + + Acquire the result set of an executed statement. The result is used in the same way as that of `taos_use_result`; `taos_free_result` must be called once you are done with the result set to release resources. + +- `int taos_stmt_close(TAOS_STMT *stmt)` + + Close the statement and release all resources. + + +### C/C++ continuous query interface + +TDengine provides APIs for time-driven continuous queries, which run queries periodically in the background. There are only two APIs: + + +- `TAOS_STREAM *taos_open_stream(TAOS *taos, const char *sqlstr, void (*fp)(void *param, TAOS_RES * res, TAOS_ROW row), int64_t stime, void *param, void (*callback)(void *));` + + The API is used to create a continuous query. + * _taos_: the connection handle returned by _taos_connect_. + * _sqlstr_: the SQL string to run. Only query commands are allowed. + * _fp_: the callback function to run after a query. TDengine passes the query result `row`, the query state `res` and the user-provided parameter `param` to this function. In this callback, `taos_num_fields` and `taos_fetch_fields` can be used to fetch field information. 
+ * _param_: a parameter passed to _fp_ + * _stime_: the start time of the stream, in epoch milliseconds. If _0_ is given, the start time is set to the current time. + * _callback_: a callback function to run when the continuous query stops automatically. + + The API returns a handle on success. Otherwise, a NULL pointer is returned. + + +- `void taos_close_stream (TAOS_STREAM *tstr)` + + Close the continuous query identified by the handle returned by _taos_open_stream_. Make sure to call this API when the continuous query is no longer needed. + + +### C/C++ subscription API + +For the time being, TDengine supports subscription on one or multiple tables. It is implemented through periodic pulling from a TDengine server. + +* `TAOS_SUB *taos_subscribe(TAOS* taos, int restart, const char* topic, const char *sql, TAOS_SUBSCRIBE_CALLBACK fp, void *param, int interval)` + + The API is used to start a subscription session. It returns the subscription object on success and `NULL` on failure. The parameters are: + * **taos**: The database connection, which must already be established. + * **restart**: `Zero` to continue a subscription if it already exists, any other value to start from the beginning. + * **topic**: The unique identifier of a subscription. + * **sql**: A SQL statement for the data query. It can only be a `select` statement, can only query raw data, and can only query data in ascending order of the timestamp field. + * **fp**: A callback function to receive the query result. It is only used in asynchronous mode and should be `NULL` in synchronous mode; see below for its prototype. + * **param**: User-provided additional parameter for the callback function. + * **interval**: Pulling interval in milliseconds. In asynchronous mode, the API calls the callback function `fp` at this interval; system performance is impacted if the interval is too short. 
In synchronous mode, if the duration between two calls to `taos_consume` is less than this interval, the second call blocks until the duration exceeds this interval. + +* `typedef void (*TAOS_SUBSCRIBE_CALLBACK)(TAOS_SUB* tsub, TAOS_RES *res, void* param, int code)` + + Prototype of the callback function. The parameters are: + * tsub: The subscription object. + * res: The query result. + * param: User-provided additional parameter passed when calling `taos_subscribe`. + * code: Error code in case of failure. + +* `TAOS_RES *taos_consume(TAOS_SUB *tsub)` + + The API used to get new data from a TDengine server. It should be called in a loop. The parameter `tsub` is the handle returned by `taos_subscribe`. This API should only be called in synchronous mode. If the duration between two calls to `taos_consume` is less than the pulling interval, the second call blocks until the duration exceeds the interval. The API returns the new rows if new data has arrived, or an empty row set otherwise; if there is an error, it returns `NULL`. + +* `void taos_unsubscribe(TAOS_SUB *tsub, int keepProgress)` + + Stop a subscription session identified by the handle returned by `taos_subscribe`. If `keepProgress` is **not** zero, the subscription progress information is kept and can be reused in a later call to `taos_subscribe`; otherwise the information is removed. 
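The timing rule for synchronous mode (a call to `taos_consume` made sooner than the pulling interval blocks for the remainder) can be modeled without the client library. The Python sketch below is only a model of that rule as described above, not TDengine code; `PullGate`, `wait_turn`, and the injectable `clock` and `sleep` arguments are hypothetical names chosen for the illustration.

```python
import time


class PullGate:
    """Model of the synchronous-mode rule: consecutive pulls are spaced at
    least `interval_ms` apart, and an early call waits for the remainder."""

    def __init__(self, interval_ms, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval_ms / 1000.0  # seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous pull, None before the first

    def wait_turn(self):
        """Block if the previous pull was too recent; return seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited


if __name__ == "__main__":
    gate = PullGate(200)      # 200 ms pulling interval
    for _ in range(3):
        waited = gate.wait_turn()
        print(f"pulled after waiting {waited:.3f}s")
```

An application that does its own work between pulls therefore pays no extra delay as long as that work takes longer than the interval, which is one reason to choose the interval with the consumer's processing time in mind.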
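+ + +## Java Connector + +To make TDengine easy to use from Java applications, TDengine provides `taos-jdbcdriver`, an implementation that follows the JDBC standard (3.0) API specification. It can currently be searched for and downloaded from the [Sonatype Repository][1]. + +Because TDengine is developed in C, the taos-jdbcdriver package depends on the native library for the host system. + +* libtaos.so + After TDengine is installed successfully on Linux, the native library libtaos.so is copied automatically to /usr/lib/libtaos.so. This directory is on the Linux library search path, so it does not need to be specified separately. + +* taos.dll + After the client is installed on Windows, the taos.dll file that the driver depends on is copied automatically to the default system search path C:/Windows/System32, so it does not need to be specified separately either. + +> Note: Developing on Windows requires installing the matching TDengine [windows client][14]. On a Linux server the client is installed by default together with TDengine; a standalone [Linux client][15] can also be installed to connect to a remote TDengine server. + +The TDengine JDBC driver stays as consistent as possible with relational database drivers. However, time-series databases and relational databases differ in the objects they serve and in their technical characteristics, so taos-jdbcdriver does not fully implement the JDBC standard. Note the following points when using it: + +* TDengine does not support deleting or modifying a single data record, and the driver provides no methods for these operations. +* Because deletion and modification are not supported, transactions are not supported either. +* Union operations between tables are not supported at present. +* Nested queries are not supported at present. Each Connection instance can have at most one open ResultSet; if a new query is executed before the ResultSet is closed, TSDBJDBCDriver closes the previous ResultSet automatically. + + +## TAOS-JDBCDriver versions and supported TDengine and JDK versions + +| taos-jdbcdriver version | TDengine version | JDK version | +| --- | --- | --- | +| 1.0.3 | 1.6.1.x and above | 1.8.x | +| 1.0.2 | 1.6.1.x and above | 1.8.x | +| 1.0.1 | 1.6.1.x and above | 1.8.x | +| 2.0.0 | 2.0.0.x and above | 1.8.x | + +## TDengine DataType and Java DataType + +TDengine currently supports timestamp, numeric, character, and boolean types. They map to Java types as follows: + +| TDengine DataType | Java DataType | +| --- | --- | +| TIMESTAMP | java.sql.Timestamp | +| INT | java.lang.Integer | +| BIGINT | java.lang.Long | +| FLOAT | java.lang.Float | +| DOUBLE | java.lang.Double | +| SMALLINT, TINYINT |java.lang.Short | +| BOOL | java.lang.Boolean | +| BINARY, NCHAR | java.lang.String | + +## How to get TAOS-JDBCDriver + +### Maven repository + +taos-jdbcdriver has been published to the [Sonatype Repository][1] and synchronized to the major repositories. +* [sonatype][8] +* [mvnrepository][9] +* [maven.aliyun][10] + +In a Maven project, add the following configuration to pom.xml: + +```xml +<dependencies> + <dependency> + <groupId>com.taosdata.jdbc</groupId> + <artifactId>taos-jdbcdriver</artifactId> + <version>2.0.0</version> + </dependency> +</dependencies> +``` + +### Build from source + +After downloading the [TDengine][3] source code, go to the taos-jdbcdriver source directory `src/connector/jdbc` and run `mvn clean package` to build the jar package. + + +## Usage + +### 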
Get a connection + +A TDengine Connection can be obtained with the following configuration: +```java +Class.forName("com.taosdata.jdbc.TSDBDriver"); +String jdbcUrl = "jdbc:TAOS://127.0.0.1:6030/log?user=root&password=taosdata"; +Connection conn = DriverManager.getConnection(jdbcUrl); +``` +> Port 6030 is the default connection port, and `log` in the JDBC URL is the system's own monitoring database. + +The format of a TDengine JDBC URL is: +`jdbc:TAOS://{host_ip}:{port}/[database_name]?[user={user}|&password={password}|&charset={charset}|&cfgdir={config_dir}|&locale={locale}|&timezone={timezone}]` + +Items in `{}` are required and items in `[]` are optional. The configuration parameters are: + +* user: TDengine user name, default value root. +* password: user password, default value taosdata. +* charset: character set used by the client, default value is the system character set. +* cfgdir: directory of the client configuration file, default value /etc/taos on Linux and C:/TDengine/cfg on Windows. +* locale: client locale, default value is the current system locale. +* timezone: time zone used by the client, default value is the current system time zone. + +The parameters above can be set in three places, listed here `from highest to lowest priority`: +1. JDBC URL parameters + As described above, they can be specified as parameters in the JDBC URL. +2. java.sql.DriverManager.getConnection(String jdbcUrl, Properties connProps) +```java +public Connection getConn() throws Exception{ + Class.forName("com.taosdata.jdbc.TSDBDriver"); + String jdbcUrl = "jdbc:TAOS://127.0.0.1:0/log?user=root&password=taosdata"; + Properties connProps = new Properties(); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_USER, "root"); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_PASSWORD, "taosdata"); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_CONFIG_DIR, "/etc/taos"); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_CHARSET, "UTF-8"); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_LOCALE, "en_US.UTF-8"); + connProps.setProperty(TSDBDriver.PROPERTY_KEY_TIME_ZONE, "UTC-8"); + Connection conn = DriverManager.getConnection(jdbcUrl, connProps); + return conn; +} +``` + +3. 
Client configuration file taos.cfg + + The default configuration file is /etc/taos/taos.cfg on Linux and C:\TDengine\cfg\taos.cfg on Windows. +```properties +# client default username +# defaultUser root + +# client default password +# defaultPass taosdata + +# default system charset +# charset UTF-8 + +# system locale +# locale en_US.UTF-8 +``` +> For more configuration details, see [client configuration][13] + +### Create a database and a table + +```java +Statement stmt = conn.createStatement(); + +// create database +stmt.executeUpdate("create database if not exists db"); + +// use database +stmt.executeUpdate("use db"); + +// create table +stmt.executeUpdate("create table if not exists tb (ts timestamp, temperature int, humidity float)"); +``` +> Note: If the database is not selected with `use db`, all later operations on the table must prefix the table name with the database name, for example db.tb. + +### Insert data + +```java +// insert data +int affectedRows = stmt.executeUpdate("insert into tb values(now, 23, 10.3) (now + 1s, 20, 9.3)"); + +System.out.println("insert " + affectedRows + " rows."); +``` +> `now` is a built-in function that defaults to the current time of the server. +> `now + 1s` means the current server time plus 1 second. The letter after the number is the time unit: a (millisecond), s (second), m (minute), h (hour), d (day), w (week), n (month), y (year). + +### Query data + +```java +// query data +ResultSet resultSet = stmt.executeQuery("select * from tb"); + +Timestamp ts = null; +int temperature = 0; +float humidity = 0; +while(resultSet.next()){ + + ts = resultSet.getTimestamp(1); + temperature = resultSet.getInt(2); + humidity = resultSet.getFloat("humidity"); + + System.out.printf("%s, %d, %s\n", ts, temperature, humidity); +} +``` +> Queries work the same way as with a relational database. When fields are read by index, the index starts from 1; reading fields by name is recommended. + + +### Close resources + +```java +resultSet.close(); +stmt.close(); +conn.close(); +``` +> `Make sure to close the connection`, otherwise connections leak. +## Use with a connection pool + +**HikariCP** + +* Add the HikariCP Maven dependency: +```xml +<dependency> + <groupId>com.zaxxer</groupId> + <artifactId>HikariCP</artifactId> + <version>3.4.1</version> +</dependency> +``` + +* Example: +```java + public static void main(String[] args) throws SQLException { + HikariConfig config = new HikariConfig(); + config.setJdbcUrl("jdbc:TAOS://127.0.0.1:6030/log"); + config.setUsername("root"); + config.setPassword("taosdata"); + + config.setMinimumIdle(3); //minimum number of 
idle connections + config.setMaximumPoolSize(10); //maximum number of connections in the pool + config.setConnectionTimeout(10000); //maximum wait in milliseconds to get a connection from the pool + config.setIdleTimeout(60000); // maximum idle time before an idle connection is recycled + config.setConnectionTestQuery("describe log.dn"); //validation query + config.setValidationTimeout(3000); //validation query timeout + + HikariDataSource ds = new HikariDataSource(config); //create datasource + + Connection connection = ds.getConnection(); // get connection + Statement statement = connection.createStatement(); // get statement + + //query or insert + // ... + + connection.close(); // put back to connection pool +} +``` +> After obtaining a connection through HikariDataSource.getConnection(), call close() when you are done with it. This does not actually close the connection; it only returns it to the pool. +> For more about using HikariCP, see the [official documentation][5] + +## Python Connector + +### Install TDengine Python client + +Users can find the Python client packages in our source code directory _src/connector/python_. There are two directories, corresponding to the two Python versions. Please choose the correct package to install. Users can install it with the _pip_ command: + +```cmd +pip install src/connector/python/python2/ +``` + +or + +``` +pip install src/connector/python/python3/ +``` + +If the _pip_ command is not installed on the system, users can either install pip or simply copy the _taos_ directory from the Python client directory into the application directory. + +### Python client interfaces + +To use the TDengine Python client, first import the TDengine module: + +```python +import taos +``` + +Users can get module information from the Python help interface or refer to our [python code example](). We list the main classes and methods below: + +- _TDengineConnection_ class + + Run `help(taos.TDengineConnection)` in a Python terminal for details. + +- _TDengineCursor_ class + + Run `help(taos.TDengineCursor)` in a Python terminal for details. + +- connect method + + Open a connection. 
Run `help(taos.connect)` in python terminal for details.
+
+## RESTful Connector
+
+TDengine also provides a RESTful API to support development on different platforms. Unlike other databases, the TDengine RESTful API operates on the database through an SQL command carried in the body of an HTTP POST request. All the user needs to provide is a URL.
+
+
+For the time being, the TDengine RESTful API uses a _\<TOKEN\>_ generated from the username and password for identification. Safer identification methods will be provided in the future.
+
+
+### HTTP URL encoding
+
+To use the TDengine RESTful API, the URL should have the following format:
+```
+http://<ip>:<PORT>/rest/sql
+```
+- _ip_: IP address of any node in a TDengine cluster
+- _PORT_: TDengine HTTP service port. It is 6020 by default.
+
+For example, the URL _http://192.168.0.1:6020/rest/sql_ is used to send an HTTP request to a TDengine server whose IP address is 192.168.0.1.
+
+A token must be added to the HTTP request header for identification.
+
+```
+Authorization: Basic <TOKEN>
+```
+
+The HTTP request body contains the SQL command to run. If the SQL command references a table, the table name must be qualified with its database name in the form `<db_name>.<tb_name>`; otherwise an error code is returned.
+
+For example, use the _curl_ command to send an HTTP request:
+
+```
+curl -H 'Authorization: Basic <TOKEN>' -d '<SQL>' <ip>:<PORT>/rest/sql
+```
+
+or use
+
+```
+curl -u username:password -d '<SQL>' <ip>:<PORT>/rest/sql
+```
+
+where `TOKEN` is the string `{username}:{password}` encoded with the Base64 algorithm, e.g.
`root:taosdata` will be encoded as `cm9vdDp0YW9zZGF0YQ==`
+
+### HTTP response
+
+The HTTP response is in JSON format as below:
+
+```
+{
+    "status": "succ",
+    "head": ["column1","column2", …],
+    "data": [
+        ["2017-12-12 23:44:25.730", 1],
+        ["2017-12-12 22:44:25.728", 4]
+    ],
+    "rows": 2
+}
+```
+Specifically,
+- _status_: the result of the operation, success or failure
+- _head_: description of returned result columns
+- _data_: the returned data array. If no data is returned, only an _affected_rows_ field is listed
+- _rows_: the number of rows returned
+
+### Example
+
+- Use the _curl_ command to query all the data in table _t1_ of database _demo_:
+
+  `curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'select * from demo.t1' 192.168.0.1:6020/rest/sql`
+
+The return value is like:
+
+```
+{
+    "status": "succ",
+    "head": ["column1","column2","column3"],
+    "data": [
+        ["2017-12-12 23:44:25.730", 1, 2.3],
+        ["2017-12-12 22:44:25.728", 4, 5.6]
+    ],
+    "rows": 2
+}
+```
+
+- Use HTTP to create a database:
+
+  `curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'create database demo' 192.168.0.1:6020/rest/sql`
+
+  The return value should be:
+
+```
+{
+    "status": "succ",
+    "head": ["affected_rows"],
+    "data": [[1]],
+    "rows": 1
+}
+```
+
+## Go Connector
+
+TDengine provides a Go client package, `taosSql`, which implements the Go `database/sql/driver` interface. To access TDengine, import the package in your program as shown below. For detailed usage, see `https://github.com/taosdata/driver-go/blob/develop/taosSql/driver_test.go`
+
+```Go
+import (
+    "database/sql"
+    _ "github.com/taosdata/driver-go/taosSql"
+)
+```
+### API
+
+* `sql.Open(DRIVER_NAME string, dataSourceName string) (*DB, error)`
+
+  Opens a DB. DRIVER_NAME is generally the constant `taosSql`, and dataSourceName is a string of the form `user:password@/tcp(host:port)/dbname`.
If you want to access TDengine from multiple goroutines concurrently, it is better to create a separate `DB` object with `sql.Open` in each goroutine.
+
+  **Note**: This call does only a small amount of initialization. The validity check happens later, when `Query` or `Exec` runs: at that point the connection is created and the system checks whether `user`, `password`, `host`, and `port` are valid. In addition, most features are implemented in `libtaos`, the library that `taosSql` depends on, so `sql.Open` itself is lightweight.
+
+* `func (db *DB) Exec(query string, args ...interface{}) (Result, error)`
+
+  Executes non-query SQL statements. The execution result is returned as a `Result`.
+
+
+* `func (db *DB) Query(query string, args ...interface{}) (*Rows, error)`
+
+  Executes query SQL statements. The execution result is returned as `*Rows`. For detailed usage, refer to the Go interface `database/sql/driver`.
+
+## Node.js Connector
+
+TDengine also provides a Node.js connector package that can be installed through [npm](https://www.npmjs.com/). The package is also in our source code at *src/connector/nodejs/*. The following instructions are also available [here](https://github.com/taosdata/tdengine/tree/master/src/connector/nodejs).
+
+To get started, type the following to install the connector through [npm](https://www.npmjs.com/).
+
+```cmd
+npm install td-connector
+```
+
+Using npm is strongly recommended. If you don't have it installed, you can also copy the nodejs folder from *src/connector/nodejs/* into your node project folder.
+
+To interact with TDengine, we make use of the [node-gyp](https://github.com/nodejs/node-gyp) library.
To install, you will need to install the following depending on platform (the following instructions are quoted from node-gyp) + +### On Unix + +- `python` (`v2.7` recommended, `v3.x.x` is **not** supported) +- `make` +- A proper C/C++ compiler toolchain, like [GCC](https://gcc.gnu.org) + +### On macOS + +- `python` (`v2.7` recommended, `v3.x.x` is **not** supported) (already installed on macOS) + +- Xcode + + - You also need to install the + + ``` + Command Line Tools + ``` + + via Xcode. You can find this under the menu + + ``` + Xcode -> Preferences -> Locations + ``` + + (or by running + + ``` + xcode-select --install + ``` + + in your Terminal) + + - This step will install `gcc` and the related toolchain containing `make` + +### On Windows + +#### Option 1 + +Install all the required tools and configurations using Microsoft's [windows-build-tools](https://github.com/felixrieseberg/windows-build-tools) using `npm install --global --production windows-build-tools` from an elevated PowerShell or CMD.exe (run as Administrator). + +#### Option 2 + +Install tools and configuration manually: + +- Install Visual C++ Build Environment: [Visual Studio Build Tools](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools) (using "Visual C++ build tools" workload) or [Visual Studio 2017 Community](https://visualstudio.microsoft.com/pl/thank-you-downloading-visual-studio/?sku=Community) (using the "Desktop development with C++" workload) +- Install [Python 2.7](https://www.python.org/downloads/) (`v3.x.x` is not supported), and run `npm config set python python2.7` (or see below for further instructions on specifying the proper Python version and path.) +- Launch cmd, `npm config set msvs_version 2017` + +If the above steps didn't work for you, please visit [Microsoft's Node.js Guidelines for Windows](https://github.com/Microsoft/nodejs-guidelines/blob/master/windows-environment.md#compiling-native-addon-modules) for additional tips. 
+
+To target native ARM64 Node.js on Windows 10 on ARM, add the components "Visual C++ compilers and libraries for ARM64" and "Visual C++ ATL for ARM64".
+
+### Usage
+
+The following is a short summary of the basic usage of the connector. The full API and documentation can be found [here](http://docs.taosdata.com/node)
+
+#### Connection
+
+To use the connector, first require the library ```td-connector```. Running the function ```taos.connect``` with the connection options passed in as an object will return a TDengine connection object. The only required connection option is ```host```; any option that is not set takes the default value shown below.
+
+A cursor also needs to be initialized in order to interact with TDengine from Node.js.
+
+```javascript
+const taos = require('td-connector');
+var conn = taos.connect({host:"127.0.0.1", user:"root", password:"taosdata", config:"/etc/taos",port:0})
+var cursor = conn.cursor(); // Initializing a new cursor
+```
+
+To close a connection, run
+
+```javascript
+conn.close();
+```
+
+#### Queries
+
+We can now start executing simple queries through the ```cursor.query``` function, which returns a TaosQuery object.
+
+```javascript
+var query = cursor.query('show databases;')
+```
+
+We can get the results of the queries through the ```query.execute()``` function, which returns a promise that resolves with a TaosResult object. The TaosResult contains the raw data and additional functionality such as pretty printing the results.
+
+```javascript
+var promise = query.execute();
+promise.then(function(result) {
+    result.pretty(); //logs the results to the console as if you were in the taos shell
+});
+```
+
+You can also bind parameters to a query: each question mark in the query string is filled in with the corresponding bound value. The query automatically parses what was bound and converts it to the proper format for use with TDengine.
+
+```javascript
+var query = cursor.query('select * from meterinfo.meters where ts <= ?
and areaid = ?;').bind(new Date(), 5);
+query.execute().then(function(result) {
+    result.pretty();
+})
+```
+
+The TaosQuery object can also be executed immediately upon creation by passing true as the second argument, which returns a promise instead of a TaosQuery.
+
+```javascript
+var promise = cursor.query('select * from meterinfo.meters where v1 = 30;', true)
+promise.then(function(result) {
+    result.pretty();
+})
+```
+#### Async functionality
+
+Async queries can be performed using the same functions, such as `cursor.execute` and `cursor.query`, but with `_a` appended to their names.
+
+Say you want to execute two async queries on two separate tables. Calling `cursor.query_a` gives you a TaosQuery object, which, when executed with the `execute_a` function, returns a promise that resolves with a TaosResult object.
+
+```javascript
+var promise1 = cursor.query_a('select count(*), avg(v1), avg(v2) from meter1;').execute_a()
+var promise2 = cursor.query_a('select count(*), avg(v1), avg(v2) from meter2;').execute_a();
+promise1.then(function(result) {
+    result.pretty();
+})
+promise2.then(function(result) {
+    result.pretty();
+})
+```
+
+
+### Example
+
+An example of using the NodeJS connector to create a table with weather data and to create and execute queries can be found [here](https://github.com/taosdata/TDengine/tree/master/tests/examples/nodejs/node-example.js) (the preferred method for using the connector).
+
+An example of using the NodeJS connector to achieve the same things, but without the object wrappers that add higher-level functionality around the returned data, can be found [here](https://github.com/taosdata/TDengine/tree/master/tests/examples/nodejs/node-example-raw.js)
+
diff --git a/documentation20/Contributor_License_Agreement.md b/documentation20/Contributor_License_Agreement.md
new file mode 100644
index 0000000000000000000000000000000000000000..8c158da4c5958384064b9993de6643be86b94fee
--- /dev/null
+++
b/documentation20/Contributor_License_Agreement.md
@@ -0,0 +1,35 @@
+# TaosData Contributor License Agreement
+
+This TaosData Contributor License Agreement (CLA) applies to any contribution you make to any TaosData projects. If you are representing your employing organization to sign this agreement, please warrant that you have the authority to grant the agreement.
+
+## Terms
+
+**"TaosData"**, **"we"**, **"our"** and **"us"** means TaosData, inc.
+
+**"You"** and **"your"** means you or the organization you are on behalf of to sign this agreement.
+
+**"Contribution"** means any original work you, or the organization you represent submit to TaosData for any project in any manner.
+
+## Copyright License
+
+All rights of your Contribution submitted to TaosData in any manner are granted to TaosData and recipients of software distributed by TaosData. You waive any rights that may affect our ownership of the copyright and grant to us a perpetual, worldwide, transferable, non-exclusive, no-charge, royalty-free, irrevocable, and sublicensable license to use, reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Contributions and any derivative work created based on a Contribution.
+
+## Patent License
+
+With respect to any patents you own or that you can license without payment to any third party, you grant to us and to any recipient of software distributed by us, a perpetual, worldwide, transferable, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, have made, use, sell, offer to sell, import, and otherwise transfer the Contribution in whole or in part, alone or included in any product under any patent you own, or license from a third party, that is necessarily infringed by the Contribution or by combination of the Contribution with any Work.
+ +## Your Representations and Warranties + +You represent and warrant that: + +- the Contribution you submit is an original work that you can legally grant the rights set out in this agreement. + +- the Contribution you submit and licenses you granted does not and will not, infringe the rights of any third party. + +- you are not aware of any pending or threatened claims, suits, actions, or charges pertaining to the contributions. You also warrant to notify TaosData immediately if you become aware of any such actual or potential claims, suits, actions, allegations or charges. + +## Support + +You are not obligated to support your Contribution except you volunteer to provide support. If you want, you can provide for a fee. + +**I agree and accept on behalf of myself and behalf of my organization:** \ No newline at end of file diff --git a/documentation20/Documentation-ch.md b/documentation20/Documentation-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..b6dbad07dc15a93f0cef3c47d48ef0a763eac531 --- /dev/null +++ b/documentation20/Documentation-ch.md @@ -0,0 +1,117 @@ +#TDengine文档 + +TDengine是一个高效的存储、查询、分析时序大数据的平台,专为物联网、车联网、工业互联网、运维监测等优化而设计。您可以像使用关系型数据库MySQL一样来使用它,但建议您在使用前仔细阅读一遍下面的文档,特别是[数据模型](data-model-and-architecture)与数据建模一节。除本文档之外,欢迎[下载产品白皮书](https://www.taosdata.com/downloads/TDengine%20White%20Paper.pdf)。 + +##TDengine 介绍 +- TDengine 简介及特色 +- TDengine 适用场景 +- TDengine 性能指标介绍和验证方法 + +##立即开始 +- 快捷安装:可通过源码、安装包或docker安装,三秒钟搞定 +- 轻松启动:使用systemctl 启停TDengine +- 命令行程序TAOS:访问TDengine的简便方式 +- [极速体验](https://www.taosdata.com/cn/getting-started/#TDengine-极速体验):运行示例程序,快速体验高效的数据插入、查询 + +##数据模型和整体架构 +- 数据模型:关系型数据库模型,但要求每个采集点单独建表 +- 集群与基本逻辑单元:吸取NoSQL优点,支持水平扩展,支持高可靠 +- 存储模型与数据分区:标签数据与时序数据完全分离,按vnode和时间两个维度对数据切分 +- 数据写入与复制流程:先写入WAL、之后写入缓存,再给应用确认,支持多副本 +- 缓存与持久化:最新数据缓存在内存中,但落盘时采用列式存储、超高压缩比 +- 高效查询:支持各种函数、时间轴聚合、插值、多表聚合 + +##数据建模 +- 创建库:为具有相似数据特征的数据采集点创建一个库 +- 创建超级表:为同一类型的数据采集点创建一个超级表 +- 创建表:使用超级表做模板,为每一个具体的数据采集点单独建表 + +##高效写入数据 +- SQL写入:使用SQL 
insert命令向一张或多张表写入单条或多条记录 +- Telegraf 写入:配置Telegraf, 不用任何代码,将采集数据直接写入 +- Prometheus写入:配置Prometheus, 不用任何代码,将数据直接写入 +- EMQ X Broker:配置EMQ X,不用任何代码,就可将MQTT数据直接写入 + +##高效查询数据 +- 主要查询功能:支持各种标准函数,设置过滤条件,时间段查询 +- 多表聚合查询:使用超级表,设置标签过滤条件,进行高效聚合查询 +- 降采样查询:按时间段分段聚合,支持插值 + +##高级功能 +- 连续查询(Continuous Query):基于滑动窗口,定时自动的对数据流进行查询计算 +- 数据订阅(Publisher/Subscriber):象典型的消息队列,应用可订阅接收到的最新数据 +- [缓存 (Cache)](https://www.taosdata.com/cn/documentation/advanced-features/#缓存-(Cache)):每个设备最新的数据都会缓存在内存中,可快速获取 +- [报警监测(Alarm monitoring)](https://www.taosdata.com/blog/2020/04/14/1438.html/):根据配置规则,自动监测超限行为数据,并主动推送 + +##连接器 +- C/C++ Connector:通过libtaos客户端的库,连接TDengine服务器的主要方法 +- Java Connector(JDBC):通过标准的JDBC API,给Java应用提供到TDengine的连接 +- Python Connector:给Python应用提供一个连接TDengine服务器的驱动 +- RESTful Connector:提供一最简单的连接TDengine服务器的方式 +- Go Connector:给Go应用提供一个连接TDengine服务器的驱动 +- Node.js Connector:给node应用提供一个链接TDengine服务器的驱动 + +##与其他工具的连接 +- Grafana:获取并可视化保存在TDengine的数据 +- Matlab:通过配置Matlab的JDBC数据源访问保存在TDengine的数据 +- R:通过配置R的JDBC数据源访问保存在TDengine的数据 + +## TDengine集群的安装、管理 + +- 安装:与单节点的安装一样,但要设好配置文件里的参数first +- 节点管理:增加、删除、查看集群的节点 +- mnode的管理:系统自动创建、无需任何人工干预 +- 负载均衡:一旦节点个数或负载有变化,自动进行 +- 节点离线处理:节点离线超过一定时长,将从集群中剔除 +- Arbitrator:对于偶数个副本的情形,使用它可以防止split brain。 + +##TDengine的运营和维护 + +- 容量规划:根据场景,估算硬件资源 +- 容错和灾备:设置正确的WAL和数据副本数 +- 系统配置:端口,缓存大小,文件块大小和其他系统配置 +- 用户管理:添加、删除TDengine用户,修改用户密码 +- 数据导入:可按脚本文件导入,也可按数据文件导入 +- 数据导出:从shell按表导出,也可用taosdump工具做各种导出 +- 系统监控:检查系统现有的连接、查询、流式计算,日志和事件等 +- 文件目录结构:TDengine数据文件、配置文件等所在目录 Hui Li + +##TAOS SQL +- 支持的数据类型:支持时间戳、整型、浮点型、布尔型、字符型等多种数据类型 +- 数据库管理:添加、删除、查看数据库 +- 表管理:添加、删除、查看、修改表 +- 超级表管理:添加、删除、查看、修改超级表 +- 标签管理:增加、删除、修改标签 +- 数据写入:支持单表单条、多条、多表多条写入,支持历史数据写入 +- 数据查询:支持时间段、值过滤、排序、查询结果手动分页等 +- SQL函数:支持各种聚合函数、选择函数、计算函数,如avg, min, diff等 +- 时间维度聚合:将表中数据按照时间段进行切割后聚合,降维处理 + +##TDengine的技术设计 +- 系统模块:taosd的功能和模块划分 +- 技术博客:更多的技术分析和架构设计文章 + +## 常用工具 + +- [TDengine样例数据导入工具](https://www.taosdata.com/cn/documentation/blog/2020/01/18/如何快速验证性能和主要功能?tdengine样例数据导入工/) +- 
[TDengine性能对比测试工具](https://www.taosdata.com/cn/documentation/blog/2020/01/13/用influxdb开源的性能测试工具对比influxdb和tdengine/) + +##TDengine与其他数据库的对比测试 + +- [用InfluxDB开源的性能测试工具对比InfluxDB和TDengine](https://www.taosdata.com/cn/documentation/blog/2020/01/13/用influxdb开源的性能测试工具对比influxdb和tdengine/) +- [TDengine与OpenTSDB对比测试](https://www.taosdata.com/cn/documentation/blog/2019/08/21/tdengine与opentsdb对比测试/) +- [TDengine与Cassandra对比测试](https://www.taosdata.com/cn/documentation/blog/2019/08/14/tdengine与cassandra对比测试/) +- [TDengine与InfluxDB对比测试](https://www.taosdata.com/cn/documentation/blog/2019/07/19/tdengine与influxdb对比测试/) +- [TDengine与InfluxDB、OpenTSDB、Cassandra、MySQL、ClickHouse等数据库的对比测试报告](https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf) + +##物联网大数据 +- [物联网、工业互联网大数据的特点](https://www.taosdata.com/blog/2019/07/09/物联网、工业互联网大数据的特点/) +- [物联网大数据平台应具备的功能和特点](https://www.taosdata.com/blog/2019/07/29/物联网大数据平台应具备的功能和特点/) +- [通用大数据架构为什么不适合处理物联网数据?](https://www.taosdata.com/blog/2019/07/09/通用互联网大数据处理架构为什么不适合处理物联/) +- [物联网、车联网、工业互联网大数据平台,为什么推荐使用TDengine?](https://www.taosdata.com/blog/2019/07/09/物联网、车联网、工业互联网大数据平台,为什么/) + +##培训和FAQ +- FAQ:常见问题与答案 +- 应用案列:一些使用实例来解释如何使用TDengine + + \ No newline at end of file diff --git a/documentation20/Documentation.md b/documentation20/Documentation.md new file mode 100644 index 0000000000000000000000000000000000000000..bdafd40f7c76425a4f9734a2561b2b9a945c757f --- /dev/null +++ b/documentation20/Documentation.md @@ -0,0 +1,87 @@ +#Documentation + +TDengine is a highly efficient platform to store, query, and analyze time-series data. It works like a relational database, but you are strongly suggested to read through the following documentation before you experience it. 
+ +##Getting Started + +- Quick Start: download, install and experience TDengine in a few seconds +- TDengine Shell: command-line interface to access TDengine server +- Major Features: insert/query, aggregation, cache, pub/sub, continuous query + +## Data Model and Architecture + +- Data Model: relational database model, but one table for one device with static tags +- Architecture: Management Module, Data Module, Client Module +- Writing Process: records recieved are written to WAL, cache, then ack is sent back to client +- Data Storage: records are sharded in the time range, and stored column by column + +##TAOS SQL + +- Data Types: support timestamp, int, float, double, binary, nchar, bool, and other types +- Database Management: add, drop, check databases +- Table Management: add, drop, check, alter tables +- Inserting Records: insert one or more records into tables, historical records can be imported +- Data Query: query data with time range and filter conditions, support limit/offset +- SQL Functions: support aggregation, selector, transformation functions +- Downsampling: aggregate data in successive time windows, support interpolation + +##STable: Super Table + +- What is a Super Table: an innovated way to aggregate tables +- Create a STable: it is like creating a standard table, but with tags defined +- Create a Table via STable: use STable as the template, with tags specified +- Aggregate Tables via STable: group tables together by specifying the tags filter condition +- Create Table Automatically: create tables automatically with a STable as a template +- Management of STables: create/delete/alter super table just like standard tables +- Management of Tags: add/delete/alter tags on super tables or tables + +##Advanced Features + +- Continuous Query: query executed by TDengine periodically with a sliding window +- Publisher/Subscriber: subscribe to the newly arrived data like a typical messaging system +- Caching: the newly arrived data of each 
device/table will always be cached + +##Connector + +- C/C++ Connector: primary method to connect to the server through libtaos client library +- Java Connector: driver for connecting to the server from Java applications using the JDBC API +- Python Connector: driver for connecting to the server from Python applications +- RESTful Connector: a simple way to interact with TDengine via HTTP +- Go Connector: driver for connecting to the server from Go applications +- Node.js Connector: driver for connecting to the server from node applications + +##Connections with Other Tools + +- Telegraf: pass the collected DevOps metrics to TDengine +- Grafana: query the data saved in TDengine and visualize them +- Matlab: access TDengine server from Matlab via JDBC +- R: access TDengine server from R via JDBC + +##Administrator + +- Directory and Files: files and directories related with TDengine +- Configuration on Server: customize IP port, cache size, file block size and other settings +- Configuration on Client: customize locale, default user and others +- User Management: add/delete users, change passwords +- Import Data: import data into TDengine from either script or CSV file +- Export Data: export data either from TDengine shell or from tool taosdump +- Management of Connections, Streams, Queries: check or kill the connections, queries +- System Monitor: collect the system metric, and log important operations + +##More on System Architecture + +- Storage Design: column-based storage with optimization on time-series data +- Query Design: an efficient way to query time-series data +- Technical blogs to delve into the inside of TDengine + +## More on IoT Big Data + +- [Characteristics of IoT Big Data](https://www.taosdata.com/blog/2019/07/09/characteristics-of-iot-big-data/) +- [Why don’t General Big Data Platforms Fit IoT Scenarios?](https://www.taosdata.com/blog/2019/07/09/why-does-the-general-big-data-platform-not-fit-iot-data-processing/) +- [Why TDengine is the Best 
Choice for IoT Big Data Processing?](https://www.taosdata.com/blog/2019/07/09/why-tdengine-is-the-best-choice-for-iot-big-data-processing/) + +##Tutorials & FAQ + +- FAQ: a list of frequently asked questions and answers +- Use cases: a few typical cases to explain how to use TDengine in IoT platform + diff --git a/documentation20/Evaluation-ch.md b/documentation20/Evaluation-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..aa4ab14af8b4f03cb3af0187ad35ee3716d87f58 --- /dev/null +++ b/documentation20/Evaluation-ch.md @@ -0,0 +1,64 @@ +# TDengine 适用场景介绍(草案) + +## TDengine 简介 + + + +TDengine是涛思数据面对高速增长的物联网大数据市场和技术挑战推出的创新性的大数据处理产品,它不依赖任何第三方软件,也不是优化或包装了一个开源的数据库或流式计算产品,而是在吸取众多传统关系型数据库、NoSQL数据库、流式计算引擎、消息队列等软件的优点之后自主开发的产品,在时序空间大数据处理上,有着自己独到的优势。 + +* __10倍以上的性能提升__:定义了创新的数据存储结构,单核每秒就能处理至少2万次请求,插入数百万个数据点,读出一千万以上数据点,比现有通用数据库快了十倍以上。 +* __硬件或云服务成本降至1/5__:由于超强性能,计算资源不到通用大数据方案的1/5;通过列式存储和先进的压缩算法,存储空间不到通用数据库的1/10 +* __全栈时序数据处理引擎__:将数据库、消息队列、缓存、流式计算等功能融合一起,应用无需再集成Kafka/Redis/HBase/Spark/HDFS等软件,大幅降低应用开发和维护的复杂度成本。 +* __强大的分析功能__:无论是十年前还是一秒钟前的数据,指定时间范围即可查询。数据可在时间轴上或多个设备上进行聚合。临时查询可通过Shell, Python, R, Matlab随时进行。 +* __与第三方工具无缝连接__:不用一行代码,即可与Telegraf, Grafana, EMQ, Prometheus, Matlab, R等集成。后续将支持OPC, Hadoop, Spark等, BI工具也将无缝连接。 +* __零运维成本、零学习成本__:安装、集群一秒搞定,无需分库分表,实时备份。标准SQL,支持JDBC, RESTful, 支持Python/Java/C/C++/Go, 与MySQL相似,零学习成本。 + + + + + +## TDengine 总体适用场景 + +作为一个IOT大数据平台,TDengine的典型适用场景是在IOT范畴,而且用户有一定的数据量。本文后续的介绍主要针对这个范畴里面的系统。范畴之外的系统,比如CRM,ERP等,不在本文讨论范围内。 + + +## 数据源特点和需求 +从数据源角度,设计人员可以从已经角度分析TDengine在目标应用系统里面的适用性。 + +|数据源特点和需求|不适用|可能适用|非常适用|简单说明| +|---|---|---|---|---| +|总体数据量巨大| | | ✅ |TDengine在容量方面提供出色的水平扩展功能,并且具备匹配高压缩的存储结构,达到业界最优的存储效率。| +|数据输入速度偶尔或者持续巨大| | | ✅ | TDengine的性能大大超过同类产品,可以在同样的硬件环境下持续处理大量的输入数据,并且提供很容易在用户环境里面运行的性能评估工具。| +|数据源数目巨大| | | ✅ |TDengine设计中包含专门针对大量数据源的优化,包括数据的写入和查询,尤其适合高效处理海量(千万或者更多量级)的数据源。| + + + +## 系统架构要求 +|系统架构要求|不适用|可能适用|非常适用|简单说明| +|---|---|---|---|---| +|要求简单可靠的系统架构| | | ✅ |TDengine的系统架构非常简单可靠,自带消息队列,缓存,流式计算,监控等功能,无需集成额外的第三方产品。| 
+|要求容错和高可靠| | | ✅ |TDengine的集群功能,自动提供容错灾备等高可靠功能| +|标准化规范| | | ✅ |TDengine使用标准的SQL语言提供主要功能,遵守标准化规范| + +## 系统功能需求 +|系统功能需求|不适用|可能适用|非常适用|简单说明| +|---|---|---|---|---| +|要求完整的内置数据处理算法| | ✅ | |TDengine的实现了通用的数据处理算法,但是还没有做到妥善处理各行各业的所有要求,因此特殊类型的处理还需要应用层面处理。| +|需要大量的交叉查询处理| | ✅ | |这种类型的处理更多应该用关系型数据系统处理,或者应该考虑TDengine和关系型数据系统配合实现系统功能| + + +## 系统性能需求 +|系统性能需求|不适用|可能适用|非常适用|简单说明| +|---|---|---|---|---| +|要求较大的总体处理能力| | | ✅ |TDengine的集群功能可以轻松地让多服务器配合达成处理能力的提升。| +|要求高速处理数据 | | | ✅ |TDengine的专门为IOT优化的存储和数据处理的设计,一般可以让系统得到超出同类产品多倍数的处理速度提升。| +|要求快速处理小粒度数据| | | ✅ |这方面TDengine性能可以完全对标关系型和NoSQL型数据处理系统。| + + +## 系统维护需求 +|系统维护需求|不适用|可能适用|非常适用|简单说明| +|---|---|---|---|---| +|要求系统可靠运行| | | ✅ |TDengine的系统架构非常稳定可靠,日常维护也简单便捷,对维护人员的要求简洁明了,最大程度上杜绝人为错误和事故。| +|要求运维学习成本可控| | | ✅ |同上| +|要求市场有大量人才储备| ✅ | | |TDengine作为新一代产品,目前人才市场里面有经验的人员还有限。但是学习成本低,我们作为厂家也提供运维的培训和辅助服务| + diff --git a/documentation20/Getting Started-ch.md b/documentation20/Getting Started-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..4b109aa8e6d03804ee95d0ab29b7628778187a56 --- /dev/null +++ b/documentation20/Getting Started-ch.md @@ -0,0 +1,169 @@ +# 立即开始 + +## 快捷安装 + +TDengine软件分为服务器、客户端和报警模块三部分,目前2.0版仅能在Linux系统上安装和运行,后续会支持Windows、MAC OS等系统。如果应用需要在Windows或Mac上运行,目前只能使用TDengine的RESTful接口连接服务器。硬件支持X64,后续会支持ARM、龙芯等CPU系统。用户可根据需求选择通过[源码](https://www.taosdata.com/cn/getting-started/#通过源码安装)或者[安装包](https://www.taosdata.com/cn/getting-started/#通过安装包安装)来安装。 + +### 通过源码安装 + +请参考我们的[TDengine github主页](https://github.com/taosdata/TDengine)下载源码并安装. 
+ +### 通过Docker容器运行 + +请参考[TDengine官方Docker镜像的发布、下载和使用](https://www.taosdata.com/blog/2020/05/13/1509.html) + +### 通过安装包安装 + +服务器部分,我们提供三种安装包,您可以根据需要选择。TDengine的安装非常简单,从下载到安装成功仅仅只要几秒钟。 + + + +客户端部分,Linux安装包如下: + +- TDengine-client-2.0.0.0-Linux-x64.tar.gz (3.4M) + +报警模块的Linux安装包如下(请参考[报警模块的使用方法](https://github.com/taosdata/TDengine/blob/master/alert/README_cn.md)): + +- TDengine-alert-2.0.0-Linux-x64.tar.gz (8.1M) + +目前,TDengine只支持在使用[`systemd`](https://en.wikipedia.org/wiki/Systemd)做进程服务管理的linux系统上安装。其他linux系统的支持正在开发中。用`which`命令来检测系统中是否存在`systemd`: + +```cmd +which systemd +``` + +如果系统中不存在`systemd`命令,请考虑[通过源码安装](#通过源码安装)TDengine。 + +具体的安装过程,请参见[`TDengine多种安装包的安装和卸载`](https://www.taosdata.com/blog/2019/08/09/566.html) + +## 轻松启动 + +安装成功后,用户可使用`systemctl`命令来启动TDengine的服务进程。 + +```cmd +systemctl start taosd +``` + +检查服务是否正常工作。 +```cmd +systemctl status taosd +``` + +如果TDengine服务正常工作,那么您可以通过TDengine的命令行程序`taos`来访问并体验TDengine。 + +**注:_systemctl_ 命令需要 _root_ 权限来运行,如果您非 _root_ 用户,请在命令前添加 _sudo_** + +## TDengine命令行程序 + +执行TDengine命令行程序,您只要在Linux终端执行`taos`即可 + +```cmd +taos +``` + +如果TDengine终端链接服务成功,将会打印出欢迎消息和版本信息。如果失败,则会打印错误消息出来(请参考[FAQ](https://www.taosdata.com/cn/faq/)来解决终端链接服务端失败的问题)。TDengine终端的提示符号如下: + +```cmd +taos> +``` + +在TDengine终端中,用户可以通过SQL命令来创建/删除数据库、表等,并进行插入查询操作。在终端中运行的SQL语句需要以分号结束来运行。示例: + +```mysql +create database db; +use db; +create table t (ts timestamp, cdata int); +insert into t values ('2019-07-15 00:00:00', 10); +insert into t values ('2019-07-15 01:00:00', 20); +select * from t; + ts | speed | +=================================== + 19-07-15 00:00:00.000| 10| + 19-07-15 01:00:00.000| 20| +Query OK, 2 row(s) in set (0.001700s) +``` + +除执行SQL语句外,系统管理员还可以从TDengine终端检查系统运行状态,添加删除用户账号等。 + +### 命令行参数 + +您可通过配置命令行参数来改变TDengine终端的行为。以下为常用的几个命令行参数: + +- -c, --config-dir: 指定配置文件目录,默认为_/etc/taos_ +- -h, --host: 指定服务的IP地址,默认为本地服务 +- -s, --commands: 在不进入终端的情况下运行TDengine命令 +- -u, -- user: 链接TDengine服务器的用户名,缺省为root +- -p, --password: 链接TDengine服务器的密码,缺省为taosdata 
+- -?, --help: 打印出所有命令行参数 + +示例: + +```cmd +taos -h 192.168.0.1 -s "use db; show tables;" +``` + +### 运行SQL命令脚本 + +TDengine终端可以通过`source`命令来运行SQL命令脚本. + +``` +taos> source ; +``` + +### Shell小技巧 + +- 可以使用上下光标键查看已经历史输入的命令 +- 修改用户密码。在shell中使用alter user命令 +- ctrl+c 中止正在进行中的查询 +- 执行`RESET QUERY CACHE`清空本地缓存的表的schema + +## TDengine 极速体验 + +启动TDengine的服务,在Linux终端执行taosdemo + +``` +> taosdemo +``` + +该命令将在数据库test下面自动创建一张超级表meters,该超级表下有1万张表,表名为"t0" 到"t9999",每张表有10万条记录,每条记录有 (f1, f2, f3)三个字段,时间戳从"2017-07-14 10:40:00 000" 到"2017-07-14 10:41:39 999",每张表带有标签areaid和loc, areaid被设置为1到10, loc被设置为"beijing"或者“shanghai"。 + +执行这条命令大概需要10分钟,最后共插入10亿条记录。 + +在TDengine客户端输入查询命令,体验查询速度。 + +- 查询超级表下记录总条数: + +``` +taos>select count(*) from test.meters; +``` + +- 查询10亿条记录的平均值、最大值、最小值等: + +``` +taos>select avg(f1), max(f2), min(f3) from test.meters; +``` + +- 查询loc="beijing"的记录总条数: + +``` +taos>select count(*) from test.meters where loc="beijing"; +``` + +- 查询areaid=10的所有记录的平均值、最大值、最小值等: + +``` +taos>select avg(f1), max(f2), min(f3) from test.meters where areaid=10; +``` + +- 对表t10按10s进行平均值、最大值和最小值聚合统计: + +``` +taos>select avg(f1), max(f2), min(f3) from test.t10 interval(10s); +``` + +**Note:** taosdemo命令本身带有很多选项,配置表的数目、记录条数等等,请执行 `taosdemo --help`详细列出。您可以设置不同参数进行体验。 + diff --git a/documentation20/Getting Started.md b/documentation20/Getting Started.md new file mode 100644 index 0000000000000000000000000000000000000000..00d97d3d9cd3d2ce8317938eb9e46ae74b48dab1 --- /dev/null +++ b/documentation20/Getting Started.md @@ -0,0 +1,151 @@ +#Getting Started + +## Quick Start + +At the moment, TDengine only runs on Linux. You can set up and install it either from the source code or the packages. It takes only a few seconds from download to run it successfully. + +### Install from Source + +Please visit our [github page](https://github.com/taosdata/TDengine) for instructions on installation from the source code. 
+
+### Install from Package
+
+Three different packages are provided; pick the one that suits you.
+
+For the time being, TDengine only supports installation on Linux systems using [`systemd`](https://en.wikipedia.org/wiki/Systemd) as the service manager. To check if your system has *systemd*, use the _which_ command.
+
+```cmd
+which systemd
+```
+
+If the `systemd` command is not found, please [install from source code](#Install-from-Source).
+
+### Running TDengine
+
+After installation, start the TDengine service with the `systemctl` command.
+
+```cmd
+systemctl start taosd
+```
+
+Then check whether the server is running.
+```cmd
+systemctl status taosd
+```
+
+If the service is running successfully, you can try it out through the TDengine shell `taos`, the command line interface tool located at /usr/local/bin/taos
+
+**Note: The _systemctl_ command needs the root privilege. Use _sudo_ if you are not the _root_ user.**
+
+## TDengine Shell
+To launch the TDengine shell, the command line interface, type the following in a Linux terminal:
+
+```cmd
+taos
+```
+
+The welcome message is printed if the shell connects to the TDengine server successfully; otherwise, an error message is printed (refer to our [FAQ](../faq) page for troubleshooting the connection error). The TDengine shell prompt is:
+
+```cmd
+taos>
+```
+
+In the TDengine shell, you can create databases, create tables and insert/query data with SQL. Each command must end with a semicolon. It works like MySQL, for example:
+
+```mysql
+create database db;
+use db;
+create table t (ts timestamp, speed int);
+insert into t values ('2019-07-15 10:00:00', 10);
+insert into t values ('2019-07-15 10:01:05', 20);
+select * from t;
+          ts          |   speed   |
+===================================
+ 19-07-15 10:00:00.000|         10|
+ 19-07-15 10:01:05.000|         20|
+Query OK, 2 row(s) in set (0.001700s)
+```
+
+Besides running SQL commands, the system administrator can use the shell to check system status, add or delete accounts, and manage the servers.
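
The statements shown in the shell can also be sent from a script. The following is only a minimal sketch, assuming the RESTful interface described in the connector documentation (an HTTP POST to `/rest/sql` on port 6020, with a Basic token that is the Base64 encoding of `user:password`) and a server reachable on 127.0.0.1. The helper names `basic_token`, `build_request` and `run_sql` are hypothetical and not part of any TDengine package.

```python
import base64
import json
import urllib.request


def basic_token(user: str, password: str) -> str:
    """Base64-encode "user:password" for the Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_request(sql: str, host: str = "127.0.0.1", port: int = 6020,
                  user: str = "root", password: str = "taosdata") -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that carries one SQL statement."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/rest/sql",
        data=sql.encode("utf-8"),  # the request body is the SQL text itself
        headers={"Authorization": "Basic " + basic_token(user, password)},
        method="POST",
    )


def run_sql(sql: str, **kwargs) -> dict:
    """Send the statement and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(sql, **kwargs), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed usage against a live server; note the db. prefix on the table name:
#   run_sql("insert into db.t values ('2019-07-15 10:00:00', 10)")
#   run_sql("select * from db.t")
```

Building the request separately from sending it keeps the header and URL logic checkable without a server. Because the RESTful interface expects table names qualified with the database name, the statements use `db.t` rather than relying on `use db`.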
+ +### Shell Command Line Parameters + +You can run the `taos` command with command line options to fit your needs. Some frequently used options are listed below: + +- -c, --config-dir: set the configuration directory. It is _/etc/taos_ by default +- -h, --host: set the IP address of the server to connect to. Default is localhost +- -s, --commands: set the command to run without entering the shell +- -u, --user: user name to connect to the server. Default is root +- -p, --password: password. Default is 'taosdata' +- -?, --help: get a full list of supported options + +Examples: + +```cmd +taos -h 192.168.0.1 -s "use db; show tables;" +``` + +### Run Batch Commands + +Inside the TDengine shell, you can run batch commands from a file with the *source* command. + +``` +taos> source <filename>; +``` + +### Tips + +- Use the up/down arrow keys to browse the command history +- To change the default password, use the "`alter user`" command +- Press ctrl+c to interrupt a running query +- To clear the cached schema of tables or STables, execute the command `RESET QUERY CACHE` + +## Major Features + +The core functionality of TDengine is the time-series database. To reduce development and management complexity, and to further improve system efficiency, TDengine also provides caching, a pub/sub messaging system, and stream computing functionalities. It provides a full stack for an IoT big data platform. 
The detailed features are listed below: + +- SQL-like query language used to insert or explore data + +- C/C++, Java(JDBC), Python, Go, RESTful, and Node.JS interfaces for development + +- Ad hoc queries/analysis via Python/R/Matlab or TDengine shell + +- Continuous queries to support sliding-window based stream computing + +- Super table to aggregate multiple time-streams efficiently with flexibility + +- Aggregation over a time window on one or multiple time-streams + +- Built-in messaging system to support publisher/subscriber model + +- Built-in cache for each time stream to make the latest data available with minimal latency + +- Transparent handling of historical data and real-time data + +- Integrating with Telegraf, Grafana and other tools seamlessly + +- A set of tools or configuration to manage TDengine + + +For the enterprise edition, TDengine provides the more advanced features below: + +- Linear scalability to deliver higher capacity/throughput + +- High availability to guarantee carrier-grade service + +- Built-in replication between nodes which may span multiple geographical sites + +- Multi-tier storage to make historical data management simpler and cost-effective + +- Web-based management tools and other tools to make maintenance simpler + +TDengine is specially designed and optimized for time-series data processing in IoT, connected cars, Industrial IoT, IT infrastructure and application monitoring, and other scenarios. Compared with other solutions, it is 10x faster on insert/query speed. On a single-core machine, over 20K requests can be processed, millions of data points can be ingested, and over 10 million data points can be retrieved in a second. Via column-based storage and compression algorithms tuned for different data types, less than 1/10 of the storage space is required. + +## Explore More on TDengine + +Please read through the whole documentation to learn more about TDengine. 
+ diff --git a/documentation20/Model-ch.md b/documentation20/Model-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..dfb079d13d8ee7babd88b3a3e6905a4db0a6b447 --- /dev/null +++ b/documentation20/Model-ch.md @@ -0,0 +1,43 @@ + + +# 数据建模 + +TDengine采用关系型数据模型,需要建库、建表。因此对于一个具体的应用场景,需要考虑库的设计,超级表和普通表的设计。本节不讨论细致的语法规则,只介绍概念。 + +## 创建库 + +不同类型的数据采集点往往具有不同的数据特征,包括数据采集频率的高低,数据保留时间的长短,副本的数目,数据块的大小等等。为让各种场景下TDengine都能最大效率的工作,TDengine建议将不同数据特征的表创建在不同的库里,因为每个库可以配置不同的存储策略。创建一个库时,除SQL标准的选项外,应用还可以指定保留时长、副本数、内存块个数、时间精度、文件块里最大最小记录条数、是否压缩、一个数据文件覆盖的天数等多种参数。比如: + +```cmd +CREATE DATABASE power KEEP 365 DAYS 10 REPLICA 3 BLOCKS 4; +``` +上述语句将创建一个名为power的库,这个库的数据将保留365天(超过365天将被自动删除),每10天一个数据文件,副本数为3, 内存块数为4。详细的语法及参数请见TAOS SQL。 + +注意:任何一张表或超级表是属于一个库的,在创建表之前,必须先创建库。 + +## 创建超级表 +一个物联网系统,往往存在多种类型的设备,比如对于电网,存在智能电表、变压器、母线、开关等等。为便于多表之间的聚合,使用TDengine, 需要对每个类型的设备创建一超级表。以表一中的智能电表为例,可以使用如下的SQL命令创建超级表: +```cmd +CREATE TABLE meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int); +``` +与创建普通表一样,创建表时,需要提供表名(示例中为meters),表结构Schema,即数据列的定义,为采集的物理量(示例中为ts, current, voltage, phase),数据类型可以为整型、浮点型、字符串等。除此之外,还需要提供标签的schema (示例中为location, groupId),标签的数据类型可以为整型、浮点型、字符串等。采集点的静态属性往往可以作为标签,比如采集点的地理位置、设备型号、设备组ID、管理员ID等等。标签的schema可以事后增加、删除、修改。具体定义以及细节请见 TAOS SQL一节。 + +每一种类型的数据采集点需要建立一个超级表,因此一个物联网系统,往往会有多个超级表。一个系统可以有多个DB,一个DB里可以有一到多个超级表。 + +## 创建表 +TDengine对每个数据采集点需要独立建表。与标准的关系型数据一样,一张表有表名,Schema,但除此之外,还可以带有一到多个标签。创建时,需要使用超级表做模板,同时指定标签的具体值。以表一中的智能电表为例,可以使用如下的SQL命令建表: +```cmd +CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2); +``` +其中d1001是表名,meters是超级表的表名,后面紧跟标签location的具体标签值"Beijing.Chaoyang",标签groupId的具体标签值2。虽然在创建表时,需要指定标签值,但可以事后修改。详细细则请见 TAOS SQL。 + +TDengine建议将数据采集点的全局唯一ID作为表名。但对于有的场景,并没有唯一的ID,可以将多个ID组合成一个唯一的ID。不建议将具有唯一性的ID作为标签值。 + +**自动建表**:在某些特殊场景中,用户在写数据时并不确定某个数据采集点的表是否存在,此时可在写入数据时使用自动建表语法来创建不存在的表,若该表已存在则不会建立新表。比如: + +```cmd +INSERT INTO d1001 USING METERS TAGS ("Beijing.Chaoyang", 2) VALUES (now, 10.2, 219, 0.32); +``` 
+上述SQL语句将记录(now, 10.2, 219, 0.32) 插入进表d1001。如果表d1001还未创建,则使用超级表meters做模板自动创建,同时打上标签值“Beijing.Chaoyang", 2。 + +**多列模型**:TDengine支持多列模型,只要这些物理量是同时采集的,这些量就可以作为不同列放在同一张表里。有的数据采集点有多组采集量,每一组的数据采集时间是不一样的,这时需要对同一个采集点建多张表。但还有一种极限的设计,单列模型,无论是否同时采集,每个采集的物理量单独建表。TDengine建议,只要采集时间一致,就采用多列模型,因为插入效率以及存储效率更高。 \ No newline at end of file diff --git a/documentation20/More on System Architecture-ch.md b/documentation20/More on System Architecture-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..8e5eeee1c5a6c96ddda1281110766a12a56b8d12 --- /dev/null +++ b/documentation20/More on System Architecture-ch.md @@ -0,0 +1,248 @@ +# TDengine的技术设计 + +## 存储设计 + +TDengine的数据存储主要包含**元数据的存储**和**写入数据的存储**。以下章节详细介绍了TDengine各种数据的存储结构。 + +### 元数据的存储 + +TDengine中的元数据信息包括TDengine中的数据库,表,超级表等信息。元数据信息默认存放在 _/var/lib/taos/mgmt/_ 文件夹下。该文件夹的目录结构如下所示: +``` +/var/lib/taos/ + +--mgmt/ + +--db.db + +--meters.db + +--user.db + +--vgroups.db +``` +元数据在文件中按顺序排列。文件中的每条记录代表TDengine中的一个元数据机构(数据库、表等)。元数据文件只进行追加操作,即便是元数据的删除,也只是在数据文件中追加一条删除的记录。 + +### 写入数据的存储 + +TDengine中写入的数据在硬盘上是按时间维度进行分片的。同一个vnode中的表在同一时间范围内的数据都存放在同一文件组中,如下图中的v0f1804*文件。这一数据分片方式可以大大简化数据在时间维度的查询,提高查询速度。在默认配置下,硬盘上的每个文件存放10天数据。用户可根据需要调整数据库的 _daysPerFile_ 配置项进行配置。 数据在文件中是按块存储的。每个数据块只包含一张表的数据,且数据是按照时间主键递增排列的。数据在数据块中按列存储,这样使得同类型的数据存放在一起,可以大大提高压缩的比例,节省存储空间。TDengine对不同类型的数据采用了不同的压缩算法进行压缩,以达到最优的压缩结果。TDengine使用的压缩算法包括simple8B、delta-of-delta、RLE以及LZ4等。 + +TDengine的数据文件默认存放在 */var/lib/taos/data/* 下。而 */var/lib/taos/tsdb/* 文件夹下存放了vnode的信息、vnode中表的信息以及数据文件的链接等。其完整目录结构如下所示: +``` +/var/lib/taos/ + +--tsdb/ + | +--vnode0 + | +--meterObj.v0 + | +--db/ + | +--v0f1804.head->/var/lib/taos/data/vnode0/v0f1804.head1 + | +--v0f1804.data->/var/lib/taos/data/vnode0/v0f1804.data + | +--v0f1804.last->/var/lib/taos/data/vnode0/v0f1804.last1 + | +--v0f1805.head->/var/lib/taos/data/vnode0/v0f1805.head1 + | +--v0f1805.data->/var/lib/taos/data/vnode0/v0f1805.data + | +--v0f1805.last->/var/lib/taos/data/vnode0/v0f1805.last1 + | : + +--data/ + +--vnode0/ + 
+--v0f1804.head1 + +--v0f1804.data + +--v0f1804.last1 + +--v0f1805.head1 + +--v0f1805.data + +--v0f1805.last1 + : +``` + +#### meterObj文件 +每个vnode中只存在一个 _meterObj_ 文件。该文件中存储了vnode的基本信息(创建时间,配置信息,vnode的统计信息等)以及该vnode中表的信息。其结构如下所示: +``` +<文件开始> +[文件头] +[表记录1偏移量和长度] +[表记录2偏移量和长度] +... +[表记录N偏移量和长度] +[表记录1] +[表记录2] +... +[表记录N] +[表记录] +<文件结尾> +``` +其中,文件头大小为512字节,主要存放vnode的基本信息。每条表记录代表属于该vnode中的一张表在硬盘上的表示。 + +#### head文件 +head文件中存放了其对应的data文件中数据块的索引信息。该文件组织形式如下: +``` +<文件开始> +[文件头] +[表1偏移量] +[表2偏移量] +... +[表N偏移量] +[表1数据索引] +[表2数据索引] +... +[表N数据索引] +<文件结尾> +``` +文件开头的偏移量列表表示对应表的数据索引块的开始位置在文件中的偏移量。每张表的数据索引信息在head文件中都是连续存放的。这也使得TDengine在读取单表数据时,可以将该表所有的数据块索引一次性读入内存,大大提高读取速度。表的数据索引块组织如下: +``` +[索引块信息] +[数据块1索引] +[数据块2索引] +... +[数据块N索引] +``` +其中,索引块信息中记录了数据块的个数等描述信息。每个数据块索引对应一个在data文件或last文件中的一个单独的数据块。索引信息中记录了数据块存放的文件、数据块起始位置的偏移量、数据块中数据时间主键的范围等。索引块中的数据块索引是按照时间范围顺序排放的,这也就是说,索引块M对应的数据块中的数据时间范围都大于索引块M-1的。这种预先排序的存储方式使得在TDengine在进行按照时间戳进行查询时可以使用折半查找算法,大大提高查询速度。 + +#### data文件 +data文件中存放了真实的数据块。该文件只进行追加操作。其文件组织形式如下: +``` +<文件开始> +[文件头] +[数据块1] +[数据块2] +... +[数据块N] +<文件结尾> +``` +每个数据块只属于vnode中的一张表,且数据块中的数据按照时间主键排列。数据块中的数据按列组织排放,使得同一类型的数据排放在一起,方便压缩和读取。每个数据块的组织形式如下所示: +``` +[列1信息] +[列2信息] +... +[列N信息] +[列1数据] +[列2数据] +... 
+[列N数据] +``` +列信息中包含该列的类型,列的压缩算法,列数据在文件中的偏移量以及长度等。除此之外,列信息中也包含该内存块中该列数据的预计算结果,从而在过滤查询时根据预计算结果判定是否读取数据块,大大提高读取速度。 + +#### last文件 +为了防止数据块的碎片化,提高查询速度和压缩率,TDengine引入了last文件。当要落盘的数据块中的数据条数低于某个阈值时,TDengine会先将该数据块写入到last文件中进行暂时存储。当有新的数据需要落盘时,last文件中的数据会被读取出来与新数据组成新的数据块写入到data文件中。last文件的组织形式与data文件类似。 + +### TDengine数据存储小结 +TDengine通过其创新的架构和存储结构设计,有效提高了计算机资源的使用率。一方面,TDengine的虚拟化使得TDengine的水平扩展及备份非常容易。另一方面,TDengine将表中数据按时间主键排序存储且其列式存储的组织形式都使TDengine在写入、查询以及压缩方面拥有非常大的优势。 + + +## 查询处理 + +### 概述 + +TDengine提供了多种多样针对表和超级表的查询处理功能,除了常规的聚合查询之外,还提供针对时序数据的窗口查询、统计聚合等功能。TDengine的查询处理需要客户端、管理节点、数据节点协同完成。 各组件包含的与查询处理相关的功能和模块如下: + +客户端(Client App)。客户端包含TAOS SQL的解析(SQL Parser)和查询请求执行器(Query Executor),第二阶段聚合器(Result Merger),连续查询管理器(Continuous Query Manager)等主要功能模块构成。SQL解析器负责对SQL语句进行解析校验,并转化为抽象语法树,查询执行器负责将抽象语法树转化查询执行逻辑,并根据SQL语句查询条件,将其转换为针对管理节点元数据查询和针对数据节点的数据查询两级查询处理。由于TAOS SQL当前不提供复杂的嵌套查询和pipeline查询处理机制,所以不再需要查询计划优化、逻辑查询计划到物理查询计划转换等过程。第二阶段聚合器负责将各数据节点查询返回的独立结果进行二阶段聚合生成最后的结果。连续查询管理器则负责针对用户建立的连续查询进行管理,负责定时拉起查询请求并按需将结果写回TDengine或返回给客户应用。此外,客户端还负责查询失败后重试、取消查询请求、以及维持连接心跳、向管理节点上报查询状态等工作。 + +管理节点(Management Node)。管理节点保存了整个集群系统的全部数据的元数据信息,向客户端节点提供查询所需的数据的元数据,并根据集群的负载情况切分查询请求。通过超级表包含了通过该超级表创建的所有表的信息,因此查询处理器(Query Executor)负责针对标签(TAG)的查询处理,并将满足标签查询请求的表信息返回给客户端。此外,管理节点还维护集群的查询状态(Query Status Manager)维护,查询状态管理中在内存中临时保存有当前正在执行的全部查询,当客户端使用 *show queries* 命令的时候,将当前系统正在运行的查询信息返回客户端。 + +数据节点(Data Node)。数据节点保存了数据库中全部数据内容,并通过查询执行器、查询处理调度器、查询任务队列(Query Task Queue)进行查询处理的调度执行,从客户端接收到的查询处理请求都统一放置到处理队列中,查询执行器从队列中获得查询请求,并负责执行。通过查询优化器(Query Optimizer)对于查询进行基本的优化处理,以及通过数据节点的查询执行器(Query Executor)扫描符合条件的数据单元并返回计算结果。等接收客户端发出的查询请求,执行查询处理,并将结果返回。同时数据节点还需要响应来自管理节点的管理信息和命令,例如 *kill query* 命令以后,需要即刻停止执行的查询任务。 + +
+
图 1. 系统查询处理架构图(只包含查询相关组件)
+ +### 普通查询处理 + +客户端、管理节点、数据节点协同完成TDengine的查询处理全流程。我们以一个具体的SQL查询为例,说明TDengine的查询处理流程。SQL语句向超级表*FOO_SUPER_TABLE*查询获取时间范围在2019年1月12日整天,标签TAG_LOC是'beijing'的表所包含的所有记录总数,SQL语句如下: + +```sql +SELECT COUNT(*) +FROM FOO_SUPER_TABLE +WHERE TAG_LOC = 'beijing' AND TS >= '2019-01-12 00:00:00' AND TS < '2019-01-13 00:00:00' +``` + +首先,客户端调用TAOS SQL解析器对SQL语句进行解析及合法性检查,然后生成语法树,并从中提取查询的对象 — 超级表 *FOO_SUPER_TABLE* ,然后解析器向管理节点(Management Node)请求其相应的元数据信息,并将过滤信息(TAG_LOC='beijing')同时发送到管理节点。 + +管理节点接收元数据获取的请求,首先找到超级表 *FOO_SUPER_TABLE* 基础信息,然后应用查询条件来过滤通过该超级表创建的全部表,最后满足查询条件(TAG_LOC='beijing'),即 *TAG_LOC* 标签列是 'beijing' 的的通过其查询执行器将满足查询要求的对象(表或超级表)的元数据信息返回给客户端。 + +客户端获得了 *FOO_SUPER_TABLE* 的元数据信息后,查询执行器根据元数据中的数据分布,分别向保存有相应数据的节点发起查询请求,此时时间戳范围过滤条件(TS >= '2019-01-12 00:00:00' AND TS < '2019-01-13 00:00:00')需要同时发送给全部的数据节点。 + +数据节点接收到发自客户端的查询,转化为内部结构并进行优化以后将其放入任务执行队列,等待查询执行器执行。当查询结果获得以后,将查询结果返回客户端。数据节点执行查询的过程均相互独立,完全只依赖于自身的数据和内容进行计算。 + +当所有查询涉及的数据节点返回结果后,客户端将每个数据节点查询的结果集再次进行聚合(针对本案例,即将所有结果再次进行累加),累加的结果即为最后的查询结果。第二阶段聚合并不是所有的查询都需要。例如,针对数据的列选取操作,实际上是不需要第二阶段聚合。 + +### REST查询处理 + +在 C/C++ 、Python接口、 JDBC 接口之外,TDengine 还提供基于 HTTP 协议的 REST 接口。不同于使用应用客户端开发程序进行的开发。当用户使用 REST 接口的时候,所有的查询处理过程都是在服务器端来完成,用户的应用服务不会参与数据库的计算过程,查询处理完成后结果通过 HTTP的 JSON 格式返回给用户。 + +
+
图 2. REST查询架构
+ +当用户使用基于HTTP的REST查询接口,HTTP的请求首先与位于数据节点的HTTP连接器( Connector),建立连接,然后通过REST的签名机制,使用Token来确保请求的可靠性。对于数据节点,HTTP连接器接收到请求后,调用内嵌的客户端程序发起查询请求,内嵌客户端将解析通过HTTP连接器传递过来的SQL语句,解析该SQL语句并按需向管理节点请求元数据信息,然后向本机或集群中其他节点发送查询请求,最后按需聚合计算结果。HTTP连接器接收到请求SQL以后,后续的流程处理与采用应用客户端方式的查询处理完全一致。最后,还需要将查询的结果转换为JSON格式字符串,并通过HTTP 响应返回给客户端。 + +可以看到,在处理HTTP流程的整个过程中,用户应用不再参与到查询处理的过程中,只负责通过HTTP协议发送SQL请求并接收JSON格式的结果。同时还需要注意的是,每个数据节点均内嵌了一个HTTP连接器和客户端程序,因此请求集群中任何一个数据节点,该数据节点均能够通过HTTP协议返回用户的查询结果。 + +### 技术特征 + +由于TDengine采用数据和标签分离存储的模式,能够极大地降低标签数据存储的冗余度。标签数据直接关联到每个表,并采用全内存的结构进行管理和维护标签数据,全内存的结构提供快速的查询处理,千万级别规模的标签数据查询可以在毫秒级别返回。首先针对标签数据的过滤可以有效地降低第二阶段的查询涉及的数据规模。为有效地提升查询处理的性能,针对物联网数据的不可更改的特点,TDengine采用在每个保存的数据块上,都记录下该数据块中数据的最大值、最小值、和等统计数据。如果查询处理涉及整个数据块的全部数据,则直接使用预计算结果,不再读取数据块的内容。由于预计算模块的大小远小于磁盘上存储的具体数据的大小,对于磁盘IO为瓶颈的查询处理,使用预计算结果可以极大地减小读取IO,并加速查询处理的流程。 + +由于TDengine采用按列存储数据。当从磁盘中读取数据块进行计算的时候,按照查询列信息读取该列数据,并不需要读取其他不相关的数据,可以最小化读取数据。此外,由于采用列存储结构,数据节点针对数据的扫描采用该列数据块进行,可以充分利用CPU L2高速缓存,极大地加速数据扫描的速度。此外,对于某些查询,并不会等全部查询结果生成后再返回结果。例如,列选取查询,当第一批查询结果获得以后,数据节点直接将其返回客户端。同时,在查询处理过程中,系统在数据节点接收到查询请求以后马上返回客户端查询确认信息,并同时拉起查询处理过程,并等待查询执行完成后才返回给用户查询有响应。 + +## TDengine集群设计 + +### 1:集群与主要逻辑单元 + +TDengine是基于硬件、软件系统不可靠、一定会有故障的假设进行设计的,是基于任何单台计算机都无足够能力处理海量数据的假设进行设计的。因此TDengine从研发的第一天起,就按照分布式高可靠架构进行设计,是完全去中心化的,是水平扩展的,这样任何单台或多台服务器宕机或软件错误都不影响系统的服务。通过节点虚拟化并辅以自动化负载均衡技术,TDengine能最大限度地利用异构集群中的计算和存储资源。而且只要数据副本数大于一,无论是硬软件的升级、还是IDC的迁移等都无需停止集群的服务,极大地保证系统的正常运行,并且降低了系统管理员和运维人员的工作量。 + +下面的示例图上有八个物理节点,每个物理节点被逻辑的划分为多个虚拟节点。下面对系统的基本概念进行介绍。 + + + +![assets/nodes.png](../assets/nodes.png) + +**物理节点(dnode)**:集群中的一物理服务器或云平台上的一虚拟机。为安全以及通讯效率,一个物理节点可配置两张网卡,或两个IP地址。其中一张网卡用于集群内部通讯,其IP地址为**privateIp**, 另外一张网卡用于与集群外部应用的通讯,其IP地址为**publicIp**。在一些云平台(如阿里云),对外的IP地址是映射过来的,因此publicIp还有一个对应的内部IP地址**internalIp**(与privateIp不同)。对于只有一个IP地址的物理节点,publicIp, privateIp以及internalIp都是同一个地址,没有任何区别。一个dnode上有而且只有一个taosd实例运行。 + 
+**虚拟数据节点(vnode)**:在物理节点之上的可独立运行的基础逻辑单元,时序数据写入、存储、查询等操作逻辑都在虚拟节点中进行(图中V),采集的时序数据就存储在vnode上。一个vnode包含固定数量的表。当创建一张新表时,系统会检查是否需要创建新的vnode。一个物理节点上能创建的vnode的数量取决于物理节点的硬件资源。一个vnode只属于一个DB,但一个DB可以有多个vnode。 + +**虚拟数据节点组(vgroup)**: 位于不同物理节点的vnode可以组成一个虚拟数据节点组vnode group(如上图dnode0中的V0, dnode1中的V1, dnode6中的V2属于同一个虚拟节点组)。归属于同一个vgroup的虚拟节点采取master/slave的方式进行管理。写只能在master上进行,但采用asynchronous的方式将数据同步到slave,这样确保了一份数据在多个物理节点上有拷贝。如果master节点宕机,其他节点监测到后,将重新选举vgroup里的master, 新的master能继续处理数据请求,从而保证系统运行的可靠性。一个vgroup里虚拟节点个数就是数据的副本数。如果一个DB的副本数为N,系统必须有至少N个物理节点。副本数在创建DB时通过参数replica可以指定,缺省为1。使用TDengine, 数据的安全依靠多副本解决,因此不再需要昂贵的磁盘阵列等存储设备。 + +**虚拟管理节点(mnode)**:负责所有节点运行状态的监控和维护,以及节点之间的负载均衡(图中M)。同时,虚拟管理节点也负责元数据(包括用户、数据库、表、静态标签等)的存储和管理,因此也称为Meta Node。TDengine集群中可配置多个(最多不超过5个) mnode,它们自动构建成为一个管理节点集群(图中M0, M1, M2)。mnode间采用master/slave的机制进行管理,而且采取强一致方式进行数据同步。mnode集群的创建由系统自动完成,无需人工干预。每个dnode上至多有一个mnode,而且每个dnode都知道整个集群中所有mnode的IP地址。 + +**taosc**:一个软件模块,是TDengine给应用提供的驱动程序(driver),内嵌于JDBC、ODBC driver中,或者C语言连接库里。应用都是通过taosc而不是直接来与整个集群进行交互的。这个模块负责获取并缓存元数据;将插入、查询等请求转发到正确的虚拟节点;在把结果返回给应用时,还需要负责最后一级的聚合、排序、过滤等操作。对于JDBC, ODBC, C/C++接口而言,这个模块是在应用所处的计算机上运行,但消耗的资源很小。为支持全分布式的REST接口,taosc在TDengine集群的每个dnode上都有一运行实例。 + +**对外服务地址**:TDengine集群可以容纳单台、多台甚至几千台物理节点。应用只需要向集群中任何一个物理节点的publicIp发起连接即可。启动CLI应用taos时,选项-h需要提供的就是publicIp。 + +**master/secondIp**:每一个dnode都需要配置一个masterIp。dnode启动后,将对配置的masterIp发起加入集群的连接请求。masterIp是已经创建的集群中的任何一个节点的privateIp,对于集群中的第一个节点,就是它自己的privateIp。为保证连接成功,每个dnode还可配置secondIp, 该IP地址也是已创建的集群中的任何一个节点的privateIp。如果一个节点连接masterIp失败,它将试图链接secondIp。 + +dnode启动后,会获知集群的mnode IP列表,并且定时向mnode发送状态信息。 + +vnode与mnode只是逻辑上的划分,都是执行程序taosd里的不同线程而已,无需安装不同的软件,做任何特殊的配置。最小的系统配置就是一个物理节点,vnode,mnode和taosc都存在而且都正常运行,但单一节点无法保证系统的高可靠。 + +### 2:一典型的操作流程 + +为解释vnode, mnode, taosc和应用之间的关系以及各自扮演的角色,下面对写入数据这个典型操作的流程进行剖析。 + + + +![Picture1](../assets/Picture2.png) + + + +1. 应用通过JDBC、ODBC或其他API接口发起插入数据的请求。 +2. taosc会检查缓存,看是有保存有该表的meta data。如果有,直接到第4步。如果没有,taosc将向mnode发出get meta-data请求。 +3. 
mnode将该表的meta-data返回给taosc。Meta-data包含有该表的schema, 而且还有该表所属的vgroup信息(vnode ID以及所在的dnode的IP地址,如果副本数为N,就有N组vnodeID/IP)。如果taosc迟迟得不到mnode回应,而且存在多个mnode,taosc将向下一个mnode发出请求。 +4. taosc向master vnode发起插入请求。 +5. vnode插入数据后,给taosc一个应答,表示插入成功。如果taosc迟迟得不到vnode的回应,taosc会认为该节点已经离线。这种情况下,如果被插入的数据库有多个副本,taosc将向vgroup里下一个vnode发出插入请求。 +6. taosc通知APP,写入成功。 + +对于第二和第三步,taosc启动时,并不知道mnode的IP地址,因此会直接向配置的集群对外服务的IP地址发起请求。如果接收到该请求的dnode并没有配置mnode,该dnode会在回复的消息中告知mnode的IP地址列表(如果有多个dnodes,mnode的IP地址可以有多个),这样taosc会重新向新的mnode的IP地址发出获取meta-data的请求。 + +对于第四和第五步,没有缓存的情况下,taosc无法知道虚拟节点组里谁是master,就假设第一个vnodeID/IP就是master,向它发出请求。如果接收到请求的vnode并不是master,它会在回复中告知谁是master,这样taosc就向建议的master vnode发出请求。一旦得到插入成功的回复,taosc会缓存住master节点的信息。 + +上述是插入数据的流程,查询、计算的流程也完全一致。taosc把这些复杂的流程全部封装屏蔽了,因此应用无需处理重定向、获取meta data等细节,完全是透明的。 + +通过taosc缓存机制,只有在第一次对一张表操作时,才需要访问mnode, 因此mnode不会成为系统瓶颈。但因为schema有可能变化,而且vgroup有可能发生改变(比如负载均衡发生),因此taosc需要定时自动刷新缓存。 + +### 3:数据分区 + +vnode(虚拟数据节点)保存采集的时序数据,而且查询、计算都在这些节点上进行。为便于负载均衡、数据恢复、支持异构环境,TDengine将一个物理节点根据其计算和存储资源切分为多个vnode。这些vnode的管理是TDengine自动完成的,对应用完全透明。 + +对于单独一个数据采集点,无论其数据量多大,一个vnode(或vnode group, 如果副本数大于1)有足够的计算资源和存储资源来处理(如果每秒生成一条16字节的记录,一年产生的原始数据不到0.5G),因此TDengine将一张表的所有数据都存放在一个vnode里,而不会让同一个采集点的数据分布到两个或多个dnode上。而且一个vnode可存储多张表的数据,一个vnode可容纳的表的数目由配置参数tables指定,缺省为2000。设计上,一个vnode里所有的表都属于同一个DB。因此一个数据库DB需要的vnode或vgroup的个数等于:数据库表的数目/tables。 + +创建DB时,系统并不会马上分配资源。但当创建一张表时,系统将看是否有已经分配的vnode, 而且是否有空位,如果有,立即在该有空位的vnode创建表。如果没有,系统将从集群中,根据当前的负载情况,在一个dnode上创建一新的vnode, 然后创建表。如果DB有多个副本,系统不是只创建一个vnode,而是一个vgroup(虚拟数据节点组)。系统对vnode的数目没有任何限制,仅仅受限于物理节点本身的计算和存储资源。 + +参数tables的设置需要考虑具体场景,创建DB时,可以个性化指定该参数。该参数不宜过大,也不宜过小。过小,极端情况,就是每个数据采集点一个vnode, 这样导致系统数据文件过多。过大,虚拟化带来的优势就会丧失。给定集群计算资源的情况下,整个系统vnode的个数应该是CPU核的数目的两倍以上。 + +### 4:负载均衡 + +每个dnode(物理节点)都定时向 mnode(虚拟管理节点)报告其状态(包括硬盘空间、内存大小、CPU、网络、虚拟节点个数等),因此mnode了解整个集群的状态。基于整体状态,当mnode发现某个dnode负载过重,它会将dnode上的一个或多个vnode挪到其他dnode。在挪动过程中,对外服务继续进行,数据插入、查询和计算操作都不受影响。负载均衡操作结束后,应用也无需重启,将自动连接新的vnode。 + 
+如果mnode一段时间没有收到dnode的状态报告,mnode会认为这个dnode已经离线。如果离线时间超过一定时长(时长由配置参数offlineThreshold决定),该dnode将被mnode强制剔除出集群。该dnode上的vnodes如果副本数大于一,系统将自动在其他dnode上创建新的副本,以保证数据的副本数。 + + + +**Note:**目前集群功能仅仅限于企业版 \ No newline at end of file diff --git a/documentation20/More on System Architecture.md b/documentation20/More on System Architecture.md new file mode 100644 index 0000000000000000000000000000000000000000..d7a38b99a3ae5a630509f3ef0f0ffdc97d3aaaf1 --- /dev/null +++ b/documentation20/More on System Architecture.md @@ -0,0 +1,176 @@ +# TDengine System Architecture + +## Storage Design + +TDengine storage mainly covers **metadata** and **data**, which are introduced in the following sections. + +### Metadata Storage + +Metadata include the information of databases, tables, etc. Metadata files are saved in the _/var/lib/taos/mgmt/_ directory by default. The directory tree is as below: +``` +/var/lib/taos/ + +--mgmt/ + +--db.db + +--meters.db + +--user.db + +--vgroups.db +``` + +A metadata structure (database, table, etc.) is saved as a record in a metadata file. All metadata files are append-only: even a drop operation only appends a deletion record at the end of the file. + +### Data storage + +Data in TDengine are sharded by time range. Data of tables in the same vnode within a certain time range are saved in the same file group, such as files v0f1804*. This sharding strategy simplifies and speeds up queries along the time dimension. By default, a group of files covers 10 days of data, which can be configured by *daysPerFile* in the configuration file or by the *DAYS* keyword in the *CREATE DATABASE* clause. Data in files are organized in blocks. A data block only contains one table's data. Records in the same data block are sorted according to the primary timestamp, which helps to improve the compression rate and save storage. The compression algorithms used in TDengine include simple8B, delta-of-delta, RLE, LZ4, etc. + +By default, TDengine data are saved in the */var/lib/taos/data/* directory. 
_/var/lib/taos/tsdb/_ directory contains vnode information and links to the data files. + +``` +/var/lib/taos/ + +--tsdb/ + | +--vnode0 + | +--meterObj.v0 + | +--db/ + | +--v0f1804.head->/var/lib/taos/data/vnode0/v0f1804.head1 + | +--v0f1804.data->/var/lib/taos/data/vnode0/v0f1804.data + | +--v0f1804.last->/var/lib/taos/data/vnode0/v0f1804.last1 + | +--v0f1805.head->/var/lib/taos/data/vnode0/v0f1805.head1 + | +--v0f1805.data->/var/lib/taos/data/vnode0/v0f1805.data + | +--v0f1805.last->/var/lib/taos/data/vnode0/v0f1805.last1 + | : + +--data/ + +--vnode0/ + +--v0f1804.head1 + +--v0f1804.data + +--v0f1804.last1 + +--v0f1805.head1 + +--v0f1805.data + +--v0f1805.last1 + : +``` + +#### meterObj file +There is only one meterObj file in a vnode. It stores information about the vnode, such as its creation time, configuration, and statistics, together with information about the tables in the vnode. Its structure is as follows: + +``` +<start_of_file> +[file_header] +[table_record1_offset&length] +[table_record2_offset&length] +... +[table_recordN_offset&length] +[table_record1] +[table_record2] +... +[table_recordN] +<end_of_file> +``` +The file header takes 512 bytes and mainly contains information about the vnode. Each table record is the on-disk representation of a table in the vnode. + +#### head file +The _head_ files contain the index of data blocks in the _data_ file. The inner organization is as below: +``` +<start_of_file> +[file_header] +[table1_offset] +[table2_offset] +... +[tableN_offset] +[table1_index_block] +[table2_index_block] +... +[tableN_index_block] +<end_of_file> +``` +The table offset array in the _head_ file stores the offset of each table's index block within the file. Indices on data blocks of the same table are stored contiguously, so TDengine can load all block indices of a table into memory in a single read. The data index block has a structure like: + +``` +[index_block_info] +[block1_index] +[block2_index] +... 
+[blockN_index] +``` +The index block info part contains descriptive information about the index block, such as the number of data blocks. Each block index corresponds to a real data block in the _data_ file or _last_ file. The file that holds the data block, the offset of the block within that file, and the primary timestamp range of the block are all saved in the block index. The block indices are sorted in ascending order according to the primary timestamp, so a binary search over the indices can efficiently locate the blocks that cover a given time. + +#### data file +The _data_ files store the real data blocks. They are append-only. The organization is as follows: +``` +<start_of_file> +[file_header] +[block1] +[block2] +... +[blockN] +<end_of_file> +``` +A data block in a _data_ file belongs to exactly one table in the vnode, and the records in a data block are sorted in ascending order according to the primary timestamp key. Data blocks are column-oriented. Data in the same column are stored contiguously, which improves reading speed and compression rate because values of the same type are similar. A data block has the following organization: + +``` +[column1_info] +[column2_info] +... +[columnN_info] +[column1_data] +[column2_data] +... +[columnN_data] +``` +The column info part includes the column type, the column compression algorithm, and the offset and length of the column data in the _data_ file. Besides, pre-calculated results of the column data in the block are also kept in the column info part, which improves reading speed by avoiding loading data blocks unnecessarily. + +#### last file +To avoid storage fragmentation and to improve query speed and compression rate, TDengine introduces an extra file, the _last_ file. When the number of records in a data block is lower than a threshold, TDengine flushes the block to the _last_ file for temporary storage. When new data comes, the data in the _last_ file is merged with the new data to form a larger data block, which is written to the _data_ file. 
The organization of the _last_ file is similar to that of the _data_ file. + +### Summary +The innovation in architecture and storage design of TDengine improves resource usage. On the one hand, virtualization makes it easy to distribute resources between different vnodes and to scale out in the future. On the other hand, sorted and column-oriented storage gives TDengine a great advantage in writing, querying and compression. + +## Query Design + +#### Introduction + +TDengine provides a variety of query functions for both tables and super tables. In addition to regular aggregate queries, it also provides time window based query and statistical aggregation for time series data. TDengine's query processing requires the client app, management node, and data node to work together. The query-related functions and modules of each component are as follows: + +Client (Client App). The client development kit, embedded in a client application, consists of the TAOS SQL parser and query executor, the second-stage aggregator (Result Merger), the continuous query manager and other major functional modules. The SQL parser is responsible for parsing and verifying the SQL statement and converting it into an abstract syntax tree. The query executor is responsible for transforming the abstract syntax tree into the query execution logic and creates the metadata query according to the query condition of the SQL statement. Since TAOS SQL does not currently include complex nested queries and a pipeline query processing mechanism, there is no need for query plan optimization and physical query plan conversions. The second-stage aggregator aggregates, on the client side, the independent results returned by the data nodes involved in a query to generate the final results. 
The continuous query manager is dedicated to managing the continuous queries created by users, including issuing fixed-interval query requests and writing the results back to TDengine or returning them to the client application as needed. The client is also responsible for retrying after a query fails, canceling query requests, maintaining the connection heartbeat, and reporting the query status to the management node. + +Management Node. The management node keeps the metadata of all the data in the cluster, provides the client with the metadata required for a query, and divides the query request according to the load condition of the cluster. A super table contains information about all the tables created from it, so the query processor (Query Executor) of the management node is responsible for processing queries on table tags and returns the information of the tables that satisfy the tag query. Besides, the management node maintains the query status of the cluster in the Query Status Manager component, which temporarily stores the metadata of all currently executing queries in an in-memory buffer. When the client issues the *show queries* command to the management node, information about the currently running queries is returned to the client. + +Data Node. The data node, responsible for storing all data of the database, consists of the query executor, query processing scheduler, query task queue, and other related components. Once query requests from the client are received, they are put into the query task queue to wait for processing by the query executor. The query executor extracts a query request from the query task queue and invokes the query optimizer to perform basic optimization of the query execution plan. The query executor then scans the qualified data blocks in both cache and disk to obtain the qualified data and return the calculated results. 
Besides, the data node also needs to respond to management information and commands from the management node. For example, after a *kill query* command is received from the management node, the running query task needs to be stopped immediately. + +
+
Fig 1. System query processing architecture diagram (only query related components)
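
The division of labor described above, where each data node computes an independent partial result and the client-side Result Merger combines them, can be illustrated with a small sketch. This is not TDengine code: `PartialResult`, `node_aggregate`, and `merge_results` are hypothetical names chosen for illustration, and the rows are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class PartialResult:
    """Partial aggregate computed by one (hypothetical) data node."""
    count: int
    total: float


def node_aggregate(values):
    """Stage 1: runs independently on each data node over its local rows."""
    return PartialResult(count=len(values), total=float(sum(values)))


def merge_results(partials):
    """Stage 2: the client-side merger combines the per-node partials."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    avg = total / count if count else None
    return {"count": count, "sum": total, "avg": avg}


# Rows held by three hypothetical data nodes (illustrative values only).
node_rows = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
final = merge_results([node_aggregate(rows) for rows in node_rows])
print(final)
```

Note that the average is derived from the merged count and sum rather than by averaging per-node averages, which would be wrong when nodes hold different numbers of rows.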
+ +#### Query Process Design + +The client, the management node, and the data node cooperate to complete the entire query processing of TDengine. Let's take a concrete SQL query as an example to illustrate the whole query processing flow. The statement queries the super table *FOO_SUPER_TABLE* for the total number of records generated on January 12, 2019 in the tables whose TAG_LOC equals 'beijing'. The SQL statement is as follows: + +```sql +SELECT COUNT(*) +FROM FOO_SUPER_TABLE +WHERE TAG_LOC = 'beijing' AND TS >= '2019-01-12 00:00:00' AND TS < '2019-01-13 00:00:00' +``` + +First, the client invokes the TAOS SQL parser to parse and validate the SQL statement, then generates a syntax tree and extracts the object of the query, the super table *FOO_SUPER_TABLE*. The parser then sends a request with the filtering information (TAG_LOC='beijing') to the management node to get the corresponding metadata about *FOO_SUPER_TABLE*. + +Once the management node receives the request for metadata acquisition, it first finds the basic information of the super table *FOO_SUPER_TABLE*, and then applies the query condition (TAG_LOC='beijing') to filter all the tables created from it. Finally, the query executor returns the metadata of the tables that satisfy the query request to the client. + +After the client obtains the metadata information of *FOO_SUPER_TABLE*, the query executor uses the data distribution recorded in the metadata to send a query request, carrying the timestamp range filtering condition (TS >= '2019-01-12 00:00:00' AND TS < '2019-01-13 00:00:00'), to all nodes that hold the corresponding data. + +The data node receives the query sent from the client, converts it into an internal structure, optimizes the execution plan, and puts it into the query task queue to be executed by the query executor. When the query result is obtained, it is returned to the client. 
It should be noted that the data nodes perform the query process independently of each other, relying solely on their own data for processing. + +When all data nodes involved in the query return results, the client aggregates the result sets from each data node. In this case, all results are accumulated to generate the final query result. The second stage of aggregation is not required for all queries. For example, a column selection query does not require a second-stage aggregation at all. + +#### REST Query Process + +In addition to the C/C++, Python, and JDBC interfaces, TDengine also provides a REST interface based on the HTTP protocol, which works differently from the client application programming interfaces. When the user uses the REST interface, all the query processing is completed on the server side, and the user's application is no longer involved in query processing. After the query processing is completed, the result is returned to the client as a JSON string over HTTP. + +
+
Fig 2. REST query architecture
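
From the application's point of view, the REST interface introduced above reduces a query to one HTTP request carrying the SQL text. The sketch below only builds such a request with Python's standard library and does not send it. The `/rest/sql` path, port 6020, and HTTP Basic authentication are assumptions based on TDengine's REST connector documentation and should be checked against the deployed version; `build_rest_query` is a hypothetical helper name.

```python
import base64
import urllib.request


def build_rest_query(host, sql, user="root", password="taosdata", port=6020):
    """Build (but do not send) an HTTP request that carries one SQL statement.

    Assumed, not confirmed by this document: the endpoint path /rest/sql,
    the port, and the use of HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{host}:{port}/rest/sql",
        data=sql.encode("utf-8"),          # the SQL text is the request body
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_rest_query("127.0.0.1", "select count(*) from test.meters")
print(req.get_method(), req.full_url)

# Sending the request requires a running server, for example:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())   # JSON-formatted result
```

The application never parses SQL or merges results itself; it only sends text and reads back JSON, which is the point of the server-side design.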
+ +When a client uses the HTTP-based REST query interface, it first establishes a connection with the HTTP connector at the data node and then uses a token, through the REST signature mechanism, to ensure the reliability of the request. On the data node, after receiving the request, the HTTP connector invokes the embedded client program to initiate query processing. The embedded client parses the SQL statement passed in by the HTTP connector and requests metadata from the management node as needed. After that, the embedded client sends query requests to the same data node or other nodes in the cluster and aggregates the calculation results on demand. Once the HTTP connector has received the SQL request, the rest of the processing is identical to query processing through the client application development kit. Finally, the query result is converted into a JSON string and returned to the client in the HTTP response. + +It should be noted that during the entire processing, the client application is no longer involved; it is only responsible for sending SQL requests through the HTTP protocol and receiving the results in JSON format. Besides, each data node embeds an HTTP connector and a client, so any data node in the cluster that receives a request from a client can initiate the query and return the result to the client over HTTP, sending query requests to other data nodes where needed. + +#### Technology + +Because TDengine stores data and tag values separately, tag values are kept in the management node and associated directly with each table instead of with each record, which greatly reduces the storage needed for tags. Therefore, tag values can be managed by a fully in-memory structure. Filtering on tag data first can drastically reduce the amount of data involved in the second phase of the query. The query processing for the data is performed at the data node. 
TDengine takes advantage of the immutability of IoT data: for each saved data block it pre-computes the maximum, minimum, and other statistics of the data in that block, which effectively improves query performance. If a query covers all the data in a data block, the pre-computed result is used directly and the block's content does not need to be read. Because the pre-computed result takes far less disk space than the data itself, using it greatly reduces disk IO and speeds up query processing. + +TDengine uses column-oriented data storage. When a data block has to be loaded from disk for computation, only the columns required by the query are read, which minimizes read overhead. The data of one column is stored in a contiguous memory block and can therefore make full use of the CPU L2 cache, which greatly speeds up data scanning. In addition, TDengine uses an eager response mechanism that returns partial results before the complete result is available. For example, in a column select query, the data node returns the first batch of results to the client as soon as it is obtained. 
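
The pre-computation idea described above can be illustrated with a minimal Python sketch. It is not TDengine's implementation: the `Block` class, the `query_max` function, and the in-memory row layout are hypothetical names chosen for illustration, and rows are assumed to be sorted by timestamp within each block. The sketch shows only the decision the text describes: when a block lies entirely inside the query time range, its stored statistics answer the query; otherwise the rows are scanned.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical immutable data block of (timestamp, value) rows, sorted by timestamp."""
    rows: List[Tuple[int, float]]
    ts_min: int = field(init=False)
    ts_max: int = field(init=False)
    v_max: float = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once, when the block is saved.
        self.ts_min = self.rows[0][0]
        self.ts_max = self.rows[-1][0]
        self.v_max = max(v for _, v in self.rows)


def query_max(blocks: List[Block], start: int, end: int) -> Tuple[Optional[float], int]:
    """Return (max value with start <= ts <= end, number of blocks whose rows were scanned)."""
    best: Optional[float] = None
    scanned = 0
    for b in blocks:
        if b.ts_max < start or b.ts_min > end:
            continue  # block does not overlap the query range
        if start <= b.ts_min and b.ts_max <= end:
            candidate = b.v_max  # whole block covered: use the pre-computed statistic
        else:
            scanned += 1  # partial overlap: the rows themselves must be read
            in_range = [v for ts, v in b.rows if start <= ts <= end]
            if not in_range:
                continue
            candidate = max(in_range)
        best = candidate if best is None else max(best, candidate)
    return best, scanned


blocks = [
    Block([(1, 10.0), (2, 12.5)]),
    Block([(3, 9.0), (4, 15.0)]),
    Block([(5, 20.0), (6, 11.0)]),
]
print(query_max(blocks, 1, 4))  # both overlapping blocks are fully covered, nothing is scanned
print(query_max(blocks, 2, 5))  # the first and last blocks overlap only partially and are scanned
```

In this sketch the saving comes from the fully covered blocks, whose rows are never touched; a real storage engine would keep the statistics on disk next to the block so that the block body need not be loaded at all.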
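\ No newline at end of file diff --git a/documentation20/Queries-ch.md b/documentation20/Queries-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..0f4124f1fca71a58c21b4c5f753d074a75d6e0c5 --- /dev/null +++ b/documentation20/Queries-ch.md @@ -0,0 +1,86 @@ + + + + +# Querying Data Efficiently + +## Main Query Features + +TDengine uses SQL as its query language. Applications can send SQL query statements through the C/C++, JDBC, GO, and Python connectors, and users can also run ad hoc SQL queries by hand directly in the TAOS Shell. The following query features are supported: + +- Querying a single column or multiple columns +- Value filter conditions: \>, \<, =, \<> (greater than, less than, equal to, not equal to), and so on +- Fuzzy matching on tags +- Group by, Order by, Limit, Offset +- Arithmetic operations between columns +- JOIN operations aligned on timestamps +- A variety of functions: count, max, min, avg, sum, twa, stddev, leastsquares, top, bottom, first, last, percentile, apercentile, last_row, spread, diff + +For example, in the TAOS Shell, query table d1001 for records with voltage > 215, sort them by time in descending order, and output only 2 rows. +```mysql +taos> select * from d1001 where voltage > 215 order by ts desc limit 2; + ts | current | voltage | phase | +====================================================================================== + 2018-10-03 14:38:16.800 | 12.30000 | 221 | 0.31000 | + 2018-10-03 14:38:15.000 | 12.60000 | 218 | 0.33000 | +Query OK, 2 row(s) in set (0.001100s) +``` +To meet the needs of IoT scenarios, TDengine supports several special functions, such as twa (time-weighted average), spread (difference between the maximum and minimum values), and last_row (the last record); more functions related to IoT scenarios will be added. TDengine also supports continuous queries. + +For the detailed query syntax, see TAOS SQL. + +## Multi-Table Aggregation Queries + +TDengine creates a separate table for each data collection point, but applications often need to aggregate across collection points. To make aggregation efficient, TDengine introduces the concept of the super table (STable). A super table represents a specific type of data collection point; it is a collection of tables. Every table in the collection has the same schema, but each table carries its own static tags. A table can have multiple tags, and tags can be added, deleted, and modified at any time. + +By specifying filter conditions on tags, an application can aggregate or compute statistics over all or some of the tables under an STable, which greatly simplifies application development. The process is shown in the figure below: + +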
+ +
Schematic diagram of the multi-table aggregation query
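+ +1: The application sends a query condition to the system; 2: taosc sends the name of the super table to the Meta Node (management node); 3: the management node sends the list of vnodes that hold the super table back to taosc; 4: taosc sends the computation request, together with the tag filter conditions, to the data nodes that host these vnodes; 5: each vnode first finds, in memory, the set of its own tables that satisfy the tag filter conditions, then scans the stored time-series data, performs the aggregation, and returns the result to taosc; 6: taosc performs the final aggregation of the results returned by the data nodes and returns it to the application. + +Because TDengine stores tag data separately from time-series data within a vnode, filtering tag data in memory first greatly reduces the data set that has to be scanned and greatly speeds up aggregation. In addition, because data is distributed across multiple vnodes/dnodes, aggregation runs concurrently in multiple vnodes, which speeds it up further. + +Aggregation functions and the vast majority of operations on ordinary tables also apply to super tables, with exactly the same syntax; see TAOS SQL for details. + +For example, in the TAOS Shell, find the average voltage collected by all smart meters, grouped by location: + +```mysql +taos> select avg(voltage) from meters group by location; + avg(voltage) | location | +============================================================= + 222.000000000 | Beijing.Haidian | + 219.200000000 | Beijing.Chaoyang | +Query OK, 2 row(s) in set (0.002136s) +``` + +## Downsampling Queries and Interpolation + +In IoT scenarios, it is often necessary to downsample, that is, to aggregate the collected data by time period. TDengine provides a convenient keyword, interval, that makes this very simple. For example, sum the current values collected by smart meter d1001 every 10 seconds: +```mysql +taos> SELECT sum(current) FROM d1001 interval(10s) ; + ts | sum(current) | +====================================================== + 2018-10-03 14:38:00.000 | 10.300000191 | + 2018-10-03 14:38:10.000 | 24.900000572 | +Query OK, 2 row(s) in set (0.000883s) +``` +Downsampling also applies to super tables. For example, sum the current values collected by all smart meters every second: +```mysql +taos> SELECT sum(current) FROM meters interval(1s) ; + ts | sum(current) | +====================================================== + 2018-10-03 14:38:04.000 | 10.199999809 | + 2018-10-03 14:38:05.000 | 32.900000572 | + 2018-10-03 14:38:06.000 | 11.500000000 | + 2018-10-03 14:38:15.000 | 12.600000381 | + 2018-10-03 14:38:16.000 | 36.000000000 | +Query OK, 5 row(s) in set (0.001538s) +``` + +In IoT scenarios, the times at which different data collection points collect data are hard to synchronize, but many analysis algorithms (such as FFT) need the collected data aligned strictly at equal time intervals. In many systems the application has to write its own code to handle this, whereas TDengine's downsampling operation solves it easily. If no data was collected in a time interval, TDengine also provides interpolation. + +For details of the syntax rules, see TAOS SQL. + + diff --git a/documentation20/Super Table-ch.md b/documentation20/Super Table-ch.md new file mode 100644 index 0000000000000000000000000000000000000000..14145cbb70aa421b6c1d3340ce8139d8aa4b642c --- /dev/null +++ b/documentation20/Super Table-ch.md @@ -0,0 +1,224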
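@@ +# Super Table (STable): Multi-Table Aggregation + +TDengine requires a separate table for each data collection point. This greatly improves insert and query performance, but it also makes the number of tables in the system grow sharply and makes table maintenance, aggregation, and statistics harder for applications. To reduce the development effort, TDengine introduces the concept of the super table, STable (Super Table). + +## What Is a Super Table + +An STable is an abstraction of data collection points of the same type: a collection of collection instances of the same type, containing multiple subtables with the same data structure. Each STable defines a table schema and a set of tags for its subtables. The schema is the set of data columns recorded in the table and their data types. Tag names and data types are defined by the STable, while tag values record the static information of each subtable and are used to group and filter subtables. A subtable is essentially an ordinary table, made up of a timestamp primary key and a number of data columns, with each row recording actual data; queries on it work exactly as on an ordinary table. The difference is that each subtable belongs to one super table and carries a set of tag values defined by the STable. One STable can be defined for each type of collection device. The data model defines the type of each column in the table, such as temperature, pressure, voltage, current, or real-time GPS position, whereas tag information is metadata, such as the serial number, model, or location of the collection device; it is static and is the metadata of the table. When creating a table (a data collection point), the user specifies the STable (the collection type) and can also specify tag values, which can be added or modified later as well. + +TDengine extends standard SQL syntax to define STables, using the keyword tags to specify tag information. The syntax is: + +```mysql +CREATE TABLE ( TIMESTAMP, field_name1 field_type,…) TAGS(tag_name tag_type, …) +``` + +Here tag_name is the tag name and tag_type is the tag's data type. Tags can use any data type supported by TDengine other than timestamp. There can be at most 6 tags, and a tag name cannot be the same as a system keyword or as another column name. For example: + +```mysql +create table thermometer (ts timestamp, degree float) +tags (location binary(20), type int) +``` + +The SQL above creates an STable named thermometer with the tags location and type. + +When creating a table for a collection point, you can specify the STable it belongs to and its tag values. The syntax is: + +```mysql +CREATE TABLE USING TAGS (tag_value1,...) +``` + +Continuing the thermometer example, the statement that creates a table for a single thermometer from the super table thermometer is: + +```mysql +create table t1 using thermometer tags ('beijing', 10) +``` + +The SQL above uses thermometer as a template to create a table named t1. The schema of this table is the schema of thermometer, with the tag location set to 'beijing' and the tag type set to 10. + +A user can create an unlimited number of tables with different tags from one STable. In this sense, an STable is a collection of tables that share the same data model but have different tags. As with ordinary tables, users can create, drop, and view STables, and most query operations that apply to ordinary tables can also be applied to STables, including the various aggregation and projection/selection functions. In addition, tag filter conditions can be set so that an aggregation query runs over only some of the tables in an STable, which greatly simplifies application development. + +TDengine builds an index on the primary key (timestamp) of a table and currently provides no index on the other collected quantities in the data model (such as temperature or pressure values). Each data collection point collects many data records, but its tags form only a single record, so tag data has no storage redundancy and its total size is limited. TDengine stores tag data completely separately from the collected dynamic data and builds a high-performance in-memory index structure for STable tags, providing fast, comprehensive support for tag operations. Users can create, retrieve, update, and delete (CRUD) tags as needed. + +An STable belongs to a database. An STable belongs to exactly one database, but a database can have one or more STables, and an STable can have multiple subtables. + +## Super Table Management + +- Create a super table + + ```mysql + CREATE TABLE ( TIMESTAMP, field_name1 field_type,…) TAGS(tag_name tag_type, …) + ``` + + The SQL syntax is similar to that for creating a table, but the names and types of the TAGS fields must be specified. + + Notes: + + 1. The total length of the TAGS columns cannot exceed 512 bytes; + 2.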
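The data type of a TAGS column cannot be timestamp or nchar; + 3. A TAGS column name cannot be the same as another column name; + 4. A TAGS column name cannot be a reserved keyword. + +- Show the super tables that have been created + + ```mysql + show stables; + ``` + + Lists all STables in the database with related information, including the STable name, creation time, number of columns, number of tags (TAG), and number of tables created from the STable. + +- Drop a super table + + ```mysql + DROP TABLE + ``` + + Note: Dropping an STable does not cascade to the tables created from it; on the contrary, an STable can be dropped only after all tables created from it have been dropped. + +- View the tables that belong to an STable and satisfy the query conditions + + ```mysql + SELECT TBNAME,[TAG_NAME,…] FROM WHERE <[=|<=|>=|<>] values..> ([AND|OR] …) + ``` + + Lists the tables that belong to an STable and satisfy the query conditions. Note: TBNAME is a keyword that displays the names of the subtables created from the STable; conditions on tags can be used in the query. + + ```mysql + SELECT COUNT(TBNAME) FROM WHERE <[=|<=|>=|<>] values..> ([AND|OR] …) + ``` + + Counts the subtables that belong to an STable and satisfy the query conditions. + +## Creating Subtables Automatically When Writing Data + +In some special scenarios, the user does not know at write time whether the table for a device exists. In that case the auto-create syntax can be used: when data is written, a subtable that does not exist is created automatically using the schema defined by the super table; if the table already exists, no new table is created. Note that the auto-create statement can only create subtables, not super tables, so the super table must be defined in advance. The auto-create syntax is very similar to the insert/import syntax; the only difference is that the statement includes the super table and tag information. The syntax is: + +```mysql +INSERT INTO USING TAGS (, ...) VALUES (field_value, ...) (field_value, ...) ...; +``` + +Inserts one or more records into table tb_name. If tb_name does not exist, a new table named tb_name is created using the schema defined by the super table stb_name and the tag values specified by the user (that is, tag1_value…), and the specified values are written into it. If tb_name already exists, the table creation step is skipped; the system does not check whether the tags of tb_name match the specified tag values, so the tags of an existing table are not updated. + +```mysql +INSERT INTO USING TAGS (, ...) VALUES (, ...) (, ...) ... USING TAGS(, ...) VALUES (, ...) ...; +``` + +Inserts one or more records into multiple tables tb1_name, tb2_name, and so on, specifying for each its own super table for automatic table creation. + +## Tag Management in STables + +Except for updating tag values, which is done on subtables, all other tag operations (adding tags, dropping tags, and so on) can only be applied to the STable and not to individual subtables. After a tag is added to an STable, all tables created from that STable automatically gain the new tag; for numeric tags, the default value of the new tag is 0.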
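+ +- Add a new tag + + ```mysql + ALTER TABLE ADD TAG + ``` + + Adds a new tag to the STable and specifies its type. The total number of tags cannot exceed 6. + +- Drop a tag + + ```mysql + ALTER TABLE DROP TAG + ``` + + Drops a tag from the super table. After a tag is dropped from a super table, it is also removed automatically from all subtables under that super table. + + Note: The first tag column cannot be dropped; at least one tag must be kept for an STable. + +- Rename a tag + + ```mysql + ALTER TABLE CHANGE TAG + ``` + + Changes the name of a tag of the super table. After a tag is renamed on a super table, the tag name is updated automatically in all subtables under that super table. + +- Change the tag value of a subtable + + ```mysql + ALTER TABLE SET TAG = + ``` + +## Multi-Table Aggregation on STables + +Runs a multi-table aggregation query over all subtables created from an STable. Filtering on any TAG value is supported, and results can be aggregated by the values in TAGS; fuzzy-match filtering on binary-type tags is not yet supported. The syntax is: + +```mysql +SELECT function,… + FROM + WHERE <[=|<=|>=|<>] values..> ([AND|OR] …) + INTERVAL (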