---
title: Schemaless Writing
description: "The schemaless write method eliminates the need to create supertables and subtables in advance; the storage structures corresponding to the data are created automatically as the data is written through the interface."
---

In IoT applications, data is collected for many purposes such as intelligent control, business analysis, and device monitoring. Because business or functional requirements change, or because device hardware is upgraded, the application logic and even the data being collected may change over time. Schemaless writing automatically creates storage structures for your data as it is written to TDengine, so that you do not need to create supertables in advance. When necessary, schemaless writing automatically adds the required columns to ensure that the data written by the user is stored correctly.

The schemaless writing method creates supertables and their corresponding subtables. These are completely indistinguishable from the supertables and subtables created directly via SQL, and you can write data to them directly with SQL statements. Note that the names of tables created by schemaless writing are generated from fixed mapping rules applied to the tag values, so the names are not meaningful and are difficult to read.

## Schemaless Writing Line Protocol

TDengine's schemaless writing line protocol supports the InfluxDB Line Protocol, the OpenTSDB telnet line protocol, and the OpenTSDB JSON format protocol. However, when using these three protocols, you need to specify in the API which protocol to use to parse the input content.

For the standard InfluxDB and OpenTSDB write protocols, please refer to the documentation of each protocol. The following describes TDengine's extensions, which are based on InfluxDB's line protocol and allow users to control the (super table) schema at a more granular level.

With the following formatting conventions, schemaless writing uses a single string to express a data row (multiple rows can be passed into the writing API at once to enable bulk writing).

```json
measurement,tag_set field_set timestamp
```

where:

- `measurement` will be used as the data table name. It will be separated from `tag_set` by a comma.
- `tag_set` will be used as tags, with format like `<tag_key>=<tag_value>,<tag_key>=<tag_value>`. Enter a space between `tag_set` and `field_set`.
- `field_set` will be used as data columns, with format like `<field_key>=<field_value>,<field_key>=<field_value>`. Enter a space between `field_set` and `timestamp`.
- `timestamp` is the primary key timestamp corresponding to this row of data.

All data in `tag_set` is automatically converted to the NCHAR data type and does not require double quotes (").

In the schemaless writing line protocol, the data type of each value in `field_set` must be indicated by its format:

- If the value is enclosed in double quotes ("), it is treated as BINARY(32). For example, `"abc"`.
- If the value is enclosed in double quotes and prefixed with L, it is treated as NCHAR(32). For example, `L"error message"`.
- Spaces, equal signs (=), commas (,), and double quotes (") must be escaped with a preceding backslash (\\). (All refer to the ASCII characters.)
- Numeric types are distinguished by their suffix, as shown in the following table.

| **No.** | **Suffix** | **Mapped type** | **Size (bytes)** |
| -------- | -------- | ------------ | -------------- |
| 1        | None or f64 | double       | 8              |
| 2        | f32      | float        | 4              |
| 3        | i8/u8       | TinyInt/UTinyInt      | 1              |
| 4        | i16/u16      | SmallInt/USmallInt     | 2              |
| 5        | i32/u32      | Int/UInt          | 4              |
| 6        | i64/i/u64/u  | BigInt/BigInt/UBigInt/UBigInt       | 8              |

- `t`, `T`, `true`, `True`, `TRUE`, `f`, `F`, `false`, and `False` will be handled directly as BOOL types.

For example, the following data row writes one row to the supertable named `st`. The t1 tag is "3" (NCHAR), the t2 tag is "4" (NCHAR), and the t3 tag is "t3" (NCHAR); the c1 column is 3 (BIGINT), the c2 column is false (BOOL), the c3 column is "passit" (BINARY), the c4 column is 4 (DOUBLE), and the primary key timestamp is 1626006833639000000.

```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```

Note that using a suffix with incorrect case, or specifying an incorrect data type for a value, may produce an error message and cause the write to fail.
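
As an illustration of how such a line is passed to the writing interface, the following minimal C sketch uses the C client library's `taos_schemaless_insert` function with the InfluxDB line protocol and nanosecond timestamp precision (both constants are described later in this document). The host, user, password, and database name are placeholders, and the target database is assumed to already exist.

```c
#include <stdio.h>
#include <stdlib.h>
#include "taos.h"

int main(void) {
  // Placeholder connection parameters; adjust them to your environment.
  TAOS *taos = taos_connect("localhost", "root", "taosdata", "test", 0);
  if (taos == NULL) {
    fprintf(stderr, "failed to connect to TDengine\n");
    return EXIT_FAILURE;
  }

  // One InfluxDB line protocol string; multiple lines can be passed at once for bulk writing.
  char *lines[] = {
      "st,t1=3,t2=4,t3=t3 c1=3i64,c3=\"passit\",c2=false,c4=4f64 1626006833639000000"};

  // The protocol and the timestamp precision are specified explicitly for line protocol input.
  TAOS_RES *res = taos_schemaless_insert(taos, lines, 1, TSDB_SML_LINE_PROTOCOL,
                                         TSDB_SML_TIMESTAMP_NANO_SECONDS);
  if (taos_errno(res) != 0) {
    fprintf(stderr, "schemaless insert failed: %s\n", taos_errstr(res));
  }

  taos_free_result(res);
  taos_close(taos);
  taos_cleanup();
  return EXIT_SUCCESS;
}
```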

## Main processing logic for schemaless writing

Schemaless writes process row data according to the following principles.

1. Subtable names are generated according to the following rules: first, the measurement name and the tag keys and values are combined into the following string:

```json
"measurement,tag_key1=tag_value1,tag_key2=tag_value2"
```

Note that tag_key1 and tag_key2 are not in the original order entered by the user; the tag keys are sorted in ascending string order. Therefore, tag_key1 is not necessarily the first tag entered in the line protocol.
The MD5 hash value "md5_val" of the combined string is calculated after the sorting is completed. The hash is then combined with the fixed prefix "t_" to generate the table name "t_md5_val"; every table generated by this mapping relationship has the "t_" prefix. (A sketch of this naming rule, for illustration only, is given after this list.)
You can configure smlChildTableName to specify table names. For example, with `smlChildTableName=tname`, inserting `st,tname=cpu1,t1=4 c1=3 1626006833639000000` automatically creates the cpu1 table. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored.

2. If the super table obtained by parsing the line protocol does not exist, this super table is created.
3. If the subtable obtained by parsing the line protocol does not exist, schemaless writing creates the subtable using the subtable name determined in step 1 or 2.
4. If a tag or regular column specified in the data row does not exist in the supertable, the corresponding tag or regular column is added to the supertable (columns can only be added, never removed).
5. If the supertable has tag columns or regular columns for which a data row does not provide values, the values of those columns are set to NULL for that row.
6. For BINARY or NCHAR columns, if the length of the value provided in a data row exceeds the column type limit, the maximum length of the column is automatically increased (it is only ever increased, never decreased) to ensure complete preservation of the data.
7. Errors encountered throughout the processing will interrupt the writing process and return an error code.
8. It is assumed that the order of field_set is consistent across rows of the same supertable, meaning that the first record contains all fields and subsequent records provide their fields in the same order. If the order is not consistent, set smlDataFormat to false. Otherwise, data will be written out of order and a database error will occur.
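
The following C sketch illustrates only the naming rule in item 1; it is not TDengine's internal implementation. It combines the measurement with tag key=value pairs that are already in ascending key order, hashes the result with OpenSSL's MD5 (link with -lcrypto), and renders the digest as lowercase hex after the fixed "t_" prefix. The exact canonical string and hash encoding used internally are assumptions here; in practice, rely on the server-generated names or on smlChildTableName.

```c
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h> /* link with -lcrypto */

/* Illustrative only: derive a subtable name "t_<md5>" from a measurement name
 * and tag key=value pairs that have already been sorted by tag key. */
static void make_subtable_name(const char *measurement, const char *sorted_tags,
                               char *out, size_t out_len) {
  char combined[512];
  snprintf(combined, sizeof(combined), "%s,%s", measurement, sorted_tags);

  unsigned char digest[MD5_DIGEST_LENGTH];
  MD5((const unsigned char *)combined, strlen(combined), digest);

  /* "t_" is the fixed prefix; the hash is rendered here as lowercase hex. */
  int pos = snprintf(out, out_len, "t_");
  for (int i = 0; i < MD5_DIGEST_LENGTH && pos + 2 < (int)out_len; i++) {
    pos += snprintf(out + pos, out_len - pos, "%02x", digest[i]);
  }
}

int main(void) {
  char name[64];
  /* Tags are given in ascending key order, as described in rule 1. */
  make_subtable_name("st", "t1=3,t2=4,t3=t3", name, sizeof(name));
  printf("%s\n", name); /* prints t_ followed by 32 hex characters */
  return 0;
}
```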

:::tip
All processing logic of schemaless writing still follows TDengine's underlying restrictions on data structures. For example, the total length of each row of data cannot exceed 16KB. See [TAOS SQL Boundary Limits](/taos-sql/limit) for specific constraints in this area.

:::

## Time resolution recognition

The schemaless writing interface supports the following three protocol modes:

| **No.** | **Value** | **Description** |
| -------- | ------------------- | ------------------------------- |
| 1        | SML_LINE_PROTOCOL   | InfluxDB Line Protocol |
| 2        | SML_TELNET_PROTOCOL | OpenTSDB Telnet Line Protocol |
| 3        | SML_JSON_PROTOCOL   | OpenTSDB JSON Format Protocol |

In InfluxDB line protocol mode, you must specify the precision of the input timestamp. Valid precisions are described in the following table.

| **No.** | **Precision**                | **Description**       |
| -------- | --------------------------------- | -------------- |
| 1        | TSDB_SML_TIMESTAMP_NOT_CONFIGURED | Not defined (invalid) |
| 2        | TSDB_SML_TIMESTAMP_HOURS          | Hours           |
| 3        | TSDB_SML_TIMESTAMP_MINUTES        | Minutes           |
| 4        | TSDB_SML_TIMESTAMP_SECONDS        | Seconds             |
| 5        | TSDB_SML_TIMESTAMP_MILLI_SECONDS  | Milliseconds           |
| 6        | TSDB_SML_TIMESTAMP_MICRO_SECONDS  | Microseconds           |
| 7        | TSDB_SML_TIMESTAMP_NANO_SECONDS   | Nanoseconds           |

In OpenTSDB telnet and JSON protocol modes, the precision of the timestamp is determined from its length in the standard OpenTSDB manner, and any user-specified precision is ignored.
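
For example, the following C fragment (reusing the connection `taos` from the earlier sketch; the metric and tag names are made up for illustration) writes one OpenTSDB telnet line. Because the timestamp has 13 digits, it is interpreted as milliseconds, so the precision argument can simply be left unconfigured.

```c
// OpenTSDB telnet line format: <metric> <timestamp> <value> <tagk=tagv> ...
// The 13-digit timestamp is recognized as milliseconds from its length.
char *telnet_lines[] = {
    "meters.current 1648432611250 11.3 location=California.SanFrancisco groupid=2"};

TAOS_RES *res = taos_schemaless_insert(taos, telnet_lines, 1, TSDB_SML_TELNET_PROTOCOL,
                                       TSDB_SML_TIMESTAMP_NOT_CONFIGURED);
if (taos_errno(res) != 0) {
  fprintf(stderr, "telnet insert failed: %s\n", taos_errstr(res));
}
taos_free_result(res);
```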

## Data Model Mapping

This section describes how data in line protocol is mapped to a schema. The measurement in each line is mapped to the supertable name. Each tag name in tag_set becomes a tag name in the schema, and each name in field_set becomes a column name in the schema. The following example shows how data is mapped:

```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```

This row is mapped to a supertable: `st` contains three NCHAR tags: t1, t2, and t3. Five columns are created: ts (timestamp), c1 (bigint), c3 (binary), c2 (bool), and c4 (double). The following SQL statement is generated:

```json
create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 double) tags(t1 nchar(1), t2 nchar(1), t3 nchar(2))
```
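
Because the supertables and subtables created by schemaless writing are ordinary tables, the generated schema can be inspected with plain SQL. The following C fragment is a small sketch that reuses the connection `taos` from the earlier example and counts the rows returned by `describe st` (one row per column or tag).

```c
// Schemaless-created tables are ordinary tables, so they can be inspected with SQL.
TAOS_RES *res = taos_query(taos, "describe st");
if (taos_errno(res) != 0) {
  fprintf(stderr, "describe failed: %s\n", taos_errstr(res));
} else {
  int rows = 0;
  while (taos_fetch_row(res) != NULL) {
    rows++;  // one row per column or tag of the supertable
  }
  printf("st has %d columns and tags\n", rows);
}
taos_free_result(res);
```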

## Processing Schema Changes

This section describes the impact on the schema caused by different data being written.

If you use line protocol to write a specific field and then change the data type of that field in a later row, a schema error occurs and the write API returns an error. This is shown as follows:

```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4    1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4i   1626006833640000000
```

The first row defines c4 as a double. However, in the second row, the suffix indicates that the value of c4 is a bigint. This causes schemaless writing to throw an error.

If data written to a BINARY column exceeds the defined length of the column, the maximum length of the column is automatically increased, as shown below:

```json
st,t1=3,t2=4,t3=t3 c1=3i64,c5="pass"     1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c5="passit"   1626006833640000000
```

The first row defines c5 as BINARY(4), but the second row writes 6 bytes to it. The length of the BINARY column is therefore expanded to contain the data.

```json
st,t1=3,t2=4,t3=t3 c1=3i64               1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c6="passit"   1626006833640000000
```

The preceding data includes a new field, c6, of type BINARY(6). A new column c6 of type BINARY(6) is therefore added to the supertable automatically.

## Write Integrity

TDengine guarantees the idempotency of data writes: if a write operation fails, you can safely call the API again to rewrite the same data. However, TDengine does not guarantee the atomicity of multi-row writes. In a multi-row write, some rows may be written successfully while others fail.

## Error Codes

The error code TSDB_CODE_TSC_LINE_SYNTAX_ERROR indicates an error in the schemaless writing component itself; it is returned when the written text cannot be parsed. For other errors, schemaless writing returns the standard TDengine error codes, and their descriptions can be obtained from taos_errstr.
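
A minimal error-handling sketch for the C client, reusing a connection opened as in the earlier example: the result handle returned by `taos_schemaless_insert` is checked with `taos_errno` and `taos_errstr`. The two lines intentionally reproduce the c4 type conflict shown above, so the call is expected to fail; whether `taos_affected_rows` reports partially written rows here is an assumption noted in the comment.

```c
// The second line deliberately changes the type of c4, reproducing the
// schema-change error described earlier.
char *bad_lines[] = {
    "st,t1=3,t2=4,t3=t3 c1=3i64,c3=\"passit\",c2=false,c4=4    1626006833639000000",
    "st,t1=3,t2=4,t3=t3 c1=3i64,c3=\"passit\",c2=false,c4=4i   1626006833640000000"};

TAOS_RES *res = taos_schemaless_insert(taos, bad_lines, 2, TSDB_SML_LINE_PROTOCOL,
                                       TSDB_SML_TIMESTAMP_NANO_SECONDS);
int code = taos_errno(res);
if (code != 0) {
  // taos_errstr returns the TDengine error message associated with this result.
  fprintf(stderr, "schemaless insert failed (0x%x): %s\n", code, taos_errstr(res));
}
// Multi-row writes are not atomic; taos_affected_rows may show how many rows
// were written before the failure (an assumption for this sketch).
printf("affected rows: %d\n", taos_affected_rows(res));
taos_free_result(res);
```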