Update docs.md

9d0a8cf5 · Ganlin Zhao · GitHub · b31326df · 9d0a8cf5

Showing 127 additions and 0 deletions

documentation20/en/05.insert/docs.md documentation20/en/05.insert/docs.md +127 -0

--- a/documentation20/en/05.insert/docs.md
+++ b/documentation20/en/05.insert/docs.md
@@ -31,6 +31,133 @@ For the SQL INSERT Grammar, please refer to  [Taos SQL insert](https://www.taosd
 - For the same table, if the timestamp of a newly inserted record already exists, the new record is discarded by default (database option update = 0); that is, timestamps must be unique within a table. If an application generates records automatically, the generated timestamps are very likely to collide, so fewer records may be inserted successfully than the application attempts to insert. If the database is created with the UPDATE 1 option, inserting a new record with an existing timestamp overwrites the original record.
 - The timestamp of written data must be greater than the current time minus the value of the configuration parameter keep. If keep is configured as 3650 days, data older than 3650 days cannot be written. The timestamp of written data also cannot be greater than the current time plus the value of the configuration parameter days. If days is configured as 2, data with timestamps more than 2 days later than the current time cannot be written.

+## <a class="anchor" id="sql"></a> Data Writing via Schemaless
+**Introduction**
+<br/> In many IoT applications, the collected data is used for intelligent control, business analysis, device monitoring, and similar purposes. Because applications are upgraded and iterated quickly and hardware may be adjusted, the set of collected metrics can change frequently over time. To support such use cases, TDengine supports writing data via Schemaless starting from version 2.2.0.0. With Schemaless, tables no longer need to be created before data is inserted: tables, data columns, and tags are created automatically. Schemaless also adds data columns to existing tables when necessary, so that the data can be stored in TDengine correctly.
+
+<br/> The TDengine C/C++ connector provides a Schemaless API. See [Schemaless data writing API](https://www.taosdata.com/en/documentation/connector#schemaless) for details of the data writing format.
+<br/> Super tables and their child tables created via Schemaless are identical to those created via SQL, so data can also be inserted into them via SQL. Note that the names of child tables created via Schemaless are generated from the tag values according to special rules, so they are usually not human-readable.
+
+**Schemaless writing protocol**
+<br/>The TDengine Schemaless writing protocol is compatible with InfluxDB's Line Protocol and with OpenTSDB's telnet and JSON format protocols. When writing data through the Schemaless API, users must pass the protocol to use as a parameter.
+
+For the InfluxDB and OpenTSDB protocol formats, please refer to their official documentation. The following sections use examples to describe the extensions TDengine adds on top of InfluxDB's Line Protocol, which give users more precise control over data types when using Schemaless.
+
+Schemaless uses one line of text to represent one data record (users can also pass multiple lines to the Schemaless API for batch insertion). The format is as follows:
+```text
+measurement,tag_set field_set timestamp
+```
+
+* measurement is used as the super table name. A comma separates measurement from tag_set.
+* tag_set represents tag data as key-value pairs in the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`. Commas separate multiple tag key-value pairs, and a space separates tag_set from field_set.
+* field_set represents column data as key-value pairs in the same format as tag_set: `<field_key>=<field_value>,<field_key>=<field_value>`. Commas separate multiple field key-value pairs, and a space separates field_set from timestamp.
+* timestamp is the primary key of the data row.
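The layout above can be illustrated with a minimal Python sketch that splits one line into measurement, tag_set, field_set, and timestamp. This is not TDengine's parser. `split_line` and its helpers are hypothetical names, and the sketch relies only on the rules stated in this section: commas and spaces act as separators unless escaped with a backslash. Field values are returned raw, with quotes and suffixes intact, because the type rules that follow decide their type.

```python
import re

def split_unescaped(text, sep):
    """Split text on sep, skipping separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair for now
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(text[i])
        i += 1
    parts.append("".join(current))
    return parts

def unescape(text):
    """Drop the backslash in front of space, comma, '=', '"' or a backslash."""
    return re.sub(r'\\([ ,="\\])', r"\1", text)

def key_value(pair):
    """Split one key=value pair on its unescaped '='."""
    parts = split_unescaped(pair, "=")
    if len(parts) != 2:
        raise ValueError(f"malformed key-value pair: {pair!r}")
    return parts[0], parts[1]

def split_line(line):
    """Return (measurement, tags, fields, timestamp) for one schemaless line."""
    # Runs of spaces are tolerated, as in the examples later in this section.
    sections = [s for s in split_unescaped(line.strip(), " ") if s]
    if len(sections) != 3:
        raise ValueError("expected 'measurement,tag_set field_set timestamp'")
    head, field_set, timestamp = sections
    measurement, *tag_pairs = split_unescaped(head, ",")
    tags = {}
    for pair in tag_pairs:
        key, value = key_value(pair)
        tags[unescape(key)] = unescape(value)
    fields = {}
    for pair in split_unescaped(field_set, ","):
        key, value = key_value(pair)
        fields[unescape(key)] = value  # raw: type comes from quotes/suffix
    return unescape(measurement), tags, fields, timestamp
```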
+
+All tag values in tag_set are automatically converted to and stored as the NCHAR data type in TDengine, and do not need to be surrounded by double quotes (").
+<br/> In the Schemaless Line Protocol, each value in field_set must be self-descriptive so that it can be converted to the right TDengine data type. For example:
+* A field value surrounded by double quotes is of the BINARY(32) data type, for example `"abc"`.
+* A field value surrounded by double quotes and prefixed with the letter L is of the NCHAR(32) data type, for example `L"报错信息"`.
+* Spaces, equal signs (=), commas (,), and double quotes (") must be escaped with a backslash (\).
+* Numeric values are converted to different data types according to their suffix:
+
+
+| **ID** | **Suffix** | **Data Type** | **Size (bytes)** |
+| ------ | ---------- | ------------- | ---------------- |
+|    1   | none / f64 |  DOUBLE       |   8    |
+|    2   | f32        |  FLOAT        |   4    |
+|    3   | i8         |  TINYINT      |   1    |
+|    4   | i16        |  SMALLINT     |   2    |
+|    5   | i32        |  INT          |   4    |
+|    6   | i64 / i    |  BIGINT       |   8    |
+
+* t, T, true, True, TRUE, f, F, false, False represent the BOOLEAN type.
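The quoting and suffix rules amount to a small lookup. The following Python sketch is built only from the rules listed above and returns the TDengine type and converted value for one raw field value. `infer_type` is a hypothetical helper. It does not check value ranges (for example, whether an `i8` value fits in a TINYINT), and it does not handle escapes inside strings; a real parser would need both.

```python
# Suffixes from the table above; longer suffixes are listed before "i".
NUMERIC_SUFFIXES = [
    ("i64", "BIGINT"), ("i32", "INT"), ("i16", "SMALLINT"), ("i8", "TINYINT"),
    ("f64", "DOUBLE"), ("f32", "FLOAT"), ("i", "BIGINT"),
]
INTEGER_TYPES = {"BIGINT", "INT", "SMALLINT", "TINYINT"}
BOOL_TRUE = {"t", "T", "true", "True", "TRUE"}
BOOL_FALSE = {"f", "F", "false", "False"}

def infer_type(raw):
    """Return (TDengine type, Python value) for one raw field_set value."""
    if raw in BOOL_TRUE:
        return "BOOL", True
    if raw in BOOL_FALSE:
        return "BOOL", False
    if len(raw) >= 3 and raw.startswith('L"') and raw.endswith('"'):
        return "NCHAR", raw[2:-1]
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return "BINARY", raw[1:-1]
    for suffix, td_type in NUMERIC_SUFFIXES:
        if raw.endswith(suffix):
            number = raw[:-len(suffix)]
            break
    else:
        td_type, number = "DOUBLE", raw  # no suffix: DOUBLE
    if td_type in INTEGER_TYPES:
        return td_type, int(number)
    return td_type, float(number)
```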
+
+### Schemaless processing logic
+
+Schemaless protocol parsing follows these rules:
+
+<br/>1. To generate a child table name, first concatenate the measurement and the tag key/value strings into the following string:
+```text
+"measurement,tag_key1=tag_value1,tag_key2=tag_value2"
+```
+tag_key1, tag_key2 are not in the original order of the user input; they are sorted by tag name.
+The MD5 value "md5_val" of this string is then calculated, and the prefix "t_" is added to "md5_val" to form the child table name.
+<br/>2. If the super table does not exist, a new super table is created.
+<br/>3. If the child table does not exist, a new child table is created with the name generated in step 1.
+<br/>4. If columns/tags do not exist, new columns/tags are created. (Columns/tags can only be added; existing columns/tags cannot be deleted with Schemaless.)
+<br/>5. If columns/tags are not specified in a line, their values are set to NULL.
+<br/>6. For BINARY/NCHAR columns, if a value is longer than the maximum length of the column, the maximum length is automatically extended so that the data is stored intact.
+<br/>7. If the child table already exists and a tag value differs from the previously stored value, the old value is overwritten by the new value.
+<br/>8. If any error occurs during processing, an error code is returned.
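Step 1 can be sketched in Python with the standard `hashlib` module. The sketch assumes three details the text does not spell out: the string is encoded as UTF-8, the MD5 digest is written as 32 lowercase hex characters, and no other normalization is applied. `child_table_name` is a hypothetical helper, so the names it produces may not match TDengine's byte for byte.

```python
import hashlib

def child_table_name(measurement, tags):
    """Build 'measurement,k1=v1,k2=v2' with tags sorted by key, then 't_' + MD5."""
    parts = [measurement] + [f"{key}={tags[key]}" for key in sorted(tags)]
    digest = hashlib.md5(",".join(parts).encode("utf-8")).hexdigest()
    return "t_" + digest

# Tag order in the input line does not matter, because keys are sorted first.
a = child_table_name("st", {"t1": "3", "t2": "4", "t3": "t3"})
b = child_table_name("st", {"t3": "t3", "t1": "3", "t2": "4"})
```

Because the name is a hash, it carries no readable information about the measurement or tags. This is why child tables created via Schemaless are hard to identify by name.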
+
+**Note:**
+<br/>Schemaless is subject to TDengine's data structure limitations; for example, a table row cannot exceed 16KB. For details, please refer to [TDengine limitations](https://www.taosdata.com/en/documentation/taos-sql#limitation).
+
+**Protocols and timestamp precisions**
+<br/>Schemaless supports the following protocols:
+
+| **ID** | **Value**         | **Description** |
+| ---- | ------------------- | ------------ |
+| 1    | SML_LINE_PROTOCOL           |    InfluxDB Line Protocol       |
+| 2    | SML_TELNET_PROTOCOL         |  OpenTSDB telnet Protocol       |
+| 3    | SML_JSON_PROTOCOL           |  OpenTSDB JSON format Protocol  |
+
+<br/>When SML_LINE_PROTOCOL is used, users must specify the timestamp precision through the API. The available timestamp precisions are:<br/>
+
+| **ID** | **Precision Definition**      | **Meaning** |
+| ---- | ----------------------------- | --------- |
+| 1    | TSDB_SML_TIMESTAMP_NOT_CONFIGURED     |   undefined    |
+| 2    | TSDB_SML_TIMESTAMP_HOURS              |   hour         |
+| 3    | TSDB_SML_TIMESTAMP_MINUTES            |   minute       |
+| 4    | TSDB_SML_TIMESTAMP_SECONDS            |   second       |
+| 5    | TSDB_SML_TIMESTAMP_MILLI_SECONDS      |   millisecond  |
+| 6    | TSDB_SML_TIMESTAMP_MICRO_SECONDS      |   microsecond  |
+| 7    | TSDB_SML_TIMESTAMP_NANO_SECONDS       |   nanosecond   |
+
+When SML_TELNET_PROTOCOL or SML_JSON_PROTOCOL is used, the timestamp precision is determined by the number of digits in the timestamp (following the OpenTSDB convention), and the precision passed by the user is ignored.
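The two precision rules can be contrasted with a short Python sketch that normalizes timestamps to nanoseconds. Two points are assumptions rather than statements from this document: the OpenTSDB convention is taken to mean 10 digits for seconds and 13 digits for milliseconds, and the behavior for TSDB_SML_TIMESTAMP_NOT_CONFIGURED is left out. The precision names serve here only as dictionary keys, not as the C enum values, and both functions are hypothetical.

```python
# Nanoseconds per unit for each Line Protocol precision in the table above.
NANOS_PER_UNIT = {
    "TSDB_SML_TIMESTAMP_HOURS": 3600 * 10**9,
    "TSDB_SML_TIMESTAMP_MINUTES": 60 * 10**9,
    "TSDB_SML_TIMESTAMP_SECONDS": 10**9,
    "TSDB_SML_TIMESTAMP_MILLI_SECONDS": 10**6,
    "TSDB_SML_TIMESTAMP_MICRO_SECONDS": 10**3,
    "TSDB_SML_TIMESTAMP_NANO_SECONDS": 1,
}

def line_protocol_ts_to_ns(ts, precision):
    """SML_LINE_PROTOCOL: the precision passed by the caller decides the unit."""
    return int(ts) * NANOS_PER_UNIT[precision]

def opentsdb_ts_to_ns(ts, precision=None):
    """SML_TELNET_PROTOCOL / SML_JSON_PROTOCOL: the digit count decides the unit.

    `precision` is accepted only to show that it is ignored.
    """
    if len(ts) == 10:
        return int(ts) * 10**9  # assumed: 10 digits means seconds
    if len(ts) == 13:
        return int(ts) * 10**6  # assumed: 13 digits means milliseconds
    raise ValueError(f"unsupported timestamp length: {ts!r}")
```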
+
+**Schemaless data mapping rules**
+<br/>This section describes how Schemaless data is mapped to TDengine structured data. The measurement is mapped to the super table name, and the keys in tag_set/field_set are mapped to tag/column names. For example:
+
+```text
+st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
+```
+The line above is mapped to a super table named "st" with 3 NCHAR tags ("t1", "t2", "t3") and 5 columns: _ts (timestamp), c1 (bigint), c3 (binary), c2 (bool), and c4 (double). This is equivalent to creating the super table with the following SQL statement:
+```sql
+create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 double) tags(t1 nchar(1), t2 nchar(1), t3 nchar(2))
+```
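The mapping can be reproduced with a short Python sketch that turns one line into an equivalent CREATE STABLE statement. `create_stable_sql` is hypothetical and rests on several assumptions. The line contains no escaped characters, and the timestamp column is named `_ts`, as in the statement above. Columns and tags are emitted in sorted key order. BINARY/NCHAR lengths are taken from the length of the value, as in this example. Following the f64 row of the suffix table, c4 comes out as double.

```python
def infer_column_type(raw):
    """Minimal suffix rules from the type table (no range checks)."""
    if raw in {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False"}:
        return "bool"
    if raw.startswith('L"') and raw.endswith('"'):
        return f"nchar({len(raw) - 3})"
    if raw.startswith('"') and raw.endswith('"'):
        return f"binary({len(raw) - 2})"
    for suffix, td_type in [("i64", "bigint"), ("i32", "int"), ("i16", "smallint"),
                            ("i8", "tinyint"), ("f64", "double"), ("f32", "float"),
                            ("i", "bigint")]:
        if raw.endswith(suffix):
            return td_type
    return "double"  # no suffix

def create_stable_sql(line):
    """Derive an equivalent CREATE STABLE statement for one unescaped line."""
    head, field_set, _timestamp = line.split()
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in field_set.split(","))
    columns = ["_ts timestamp"] + [
        f"{key} {infer_column_type(value)}" for key, value in sorted(fields.items())
    ]
    # Every tag is NCHAR, sized to its value.
    tag_defs = [f"{key} nchar({len(value)})" for key, value in sorted(tags.items())]
    return f"create stable {measurement} ({', '.join(columns)}) tags({', '.join(tag_defs)})"
```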
+**Schemaless data alteration rules**
+<br/>This section describes different data writing scenarios:
+
+If a column has already been written with a certain data type and a later line would change that type, the API reports an error:
+
+```text
+st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4    1626006833639000000
+st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4i   1626006833640000000
+```
+In the first line, column c4 is declared as DOUBLE because its value has no suffix. The second line, however, declares it as BIGINT with the suffix "i", so a Schemaless parsing error occurs.
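A minimal Python sketch of this check follows. It assumes the schema is kept as a dict from column name to base type, with lengths ignored. `check_column_type` and `SchemaConflict` are hypothetical names that stand in for the error the API reports.

```python
class SchemaConflict(Exception):
    """Hypothetical stand-in for the error the Schemaless API reports."""

def check_column_type(schema, column, new_type):
    """Accept a write only if it keeps the column's stored base type."""
    old_type = schema.setdefault(column, new_type)  # first write fixes the type
    if old_type != new_type:
        raise SchemaConflict(f"column {column}: {old_type} -> {new_type}")

schema = {}
check_column_type(schema, "c4", "DOUBLE")      # line 1: c4=4, no suffix
try:
    check_column_type(schema, "c4", "BIGINT")  # line 2: c4=4i, suffix i
except SchemaConflict as err:
    print("rejected:", err)
```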
+
+If a column is declared as BINARY and a later insert requires a longer BINARY length for it, the super table schema is changed accordingly:
+```text
+st,t1=3,t2=4,t3=t3 c1=3i64,c5="pass"     1626006833639000000
+st,t1=3,t2=4,t3=t3 c1=3i64,c5="passit"   1626006833640000000
+```
+In the first line, column c5 stores the 4-character string "pass" as BINARY(4). In the second line, c5 needs 2 more characters to store "passit", so the maximum length of c5 is extended from BINARY(4) to BINARY(6).
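The widening rule can be sketched in Python as follows. `widen_binary` is a hypothetical helper. The sketch assumes the schema is a dict from column name to a (type, length) pair. It also assumes BINARY length is counted in bytes of the UTF-8 encoding, which is only a guess for non-ASCII data.

```python
def widen_binary(schema, column, value):
    """Grow a BINARY column so that `value` fits; never shrink it."""
    needed = len(value.encode("utf-8"))
    base, length = schema.get(column, ("BINARY", 0))
    if base != "BINARY":
        raise ValueError(f"column {column} is {base}, not BINARY")
    schema[column] = ("BINARY", max(length, needed))
    return schema[column]

schema = {}
widen_binary(schema, "c5", "pass")    # line 1: BINARY(4)
widen_binary(schema, "c5", "passit")  # line 2: widened to BINARY(6)
```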
+
+```text
+st,t1=3,t2=4,t3=t3 c1=3i64               1626006833639000000
+st,t1=3,t2=4,t3=t3 c1=3i64,c6="passit"   1626006833640000000
+```
+In the example above, the second line has one more column than the first line: c6 with the value "passit". A new column c6 of type BINARY(6) is added.
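This case can be sketched in Python together with step 5 of the processing logic, which writes columns missing from a line as NULL. `write_line` is hypothetical. It takes the already inferred (type, value) pairs for one line, leaves type-conflict checks out, and uses None for NULL.

```python
def write_line(schema, fields):
    """Add unknown columns to `schema`, then return the row to store.

    `fields` maps column name -> (type, value) inferred from one line.
    Columns are only ever added; none are removed.
    """
    for name, (td_type, _value) in fields.items():
        schema.setdefault(name, td_type)
    # Columns the line does not mention are written as NULL (None here).
    return {name: fields[name][1] if name in fields else None for name in schema}

schema = {}
row1 = write_line(schema, {"c1": ("BIGINT", 3)})
row2 = write_line(schema, {"c1": ("BIGINT", 3), "c6": ("BINARY(6)", "passit")})
row3 = write_line(schema, {"c1": ("BIGINT", 5)})
```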
+
+**Data integrity**
+<br/>TDengine ensures that writing data via Schemaless is idempotent, which means users can call the API again to rewrite data whose earlier write failed. However, atomicity is not guaranteed: when multiple lines are written as a batch, the data may be partially inserted if an error occurs.
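A toy Python model can show why retrying is safe even though a batch is not atomic. `write_batch` and the in-memory store are hypothetical. The sketch assumes a row is identified by its child table and timestamp, so writing the same row again lands in the same slot. Depending on the database's update option, the repeated row is either discarded or overwritten, and the final contents are the same either way.

```python
def write_batch(store, rows, fail_at=None):
    """Write rows one by one; an error stops the batch but keeps earlier rows."""
    for index, (table, ts, value) in enumerate(rows):
        if index == fail_at:
            raise RuntimeError(f"simulated error at row {index}")
        store[(table, ts)] = value  # (table, timestamp) identifies a row

rows = [("t_a", 1, 1.0), ("t_a", 2, 2.0), ("t_b", 1, 3.0)]
store = {}
try:
    write_batch(store, rows, fail_at=2)  # not atomic: the first 2 rows are kept
except RuntimeError:
    pass
partial = dict(store)
write_batch(store, rows)                 # retry the whole batch
```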
+
+**Error code**
+<br/>If data is not written according to the protocol syntax, the application gets the TSDB_CODE_TSC_LINE_SYNTAX_ERROR error code, which indicates an error in the input text. Other generic TDengine error codes may also be returned; use the taos_errstr API to get the detailed error message.
+
+**Future enhancement**
+<br/> Currently TDengine provides a Schemaless API only for C. Future versions will support APIs/connectors in more languages, such as Java, Go, Python, and C#. From TDengine v2.3 onward, users can also use taosAdapter to write data via Schemaless through the REST interface.
+
 ## <a class="anchor" id="prometheus"></a> Data Writing via Prometheus

 As a graduated project of the Cloud Native Computing Foundation, [Prometheus](https://www.prometheus.io/) is widely used for performance monitoring, including K8S performance monitoring. TDengine provides a simple tool, [Bailongma](https://github.com/taosdata/Bailongma). It only needs to be configured in Prometheus, without writing any code, to write the data collected by Prometheus directly into TDengine and to create the related databases and tables automatically according to rules. The blog post [Use Docker Container to Quickly Build a Devops Monitoring Demo](https://www.taosdata.com/blog/2020/02/03/1189.html) gives an example of using Bailongma to write Prometheus and Telegraf data into TDengine.