---
sidebar_label: OpenTSDB Migration to TDengine
title: Best Practices for Migrating OpenTSDB Applications to TDengine
---

As a distributed, scalable, HBase-based time-series database, OpenTSDB benefited from its first-mover advantage and has been widely adopted in DevOps. However, with the rapid development of new technologies such as cloud computing, microservices, and containerization, enterprise-level services are becoming more and more diverse, and system architectures are becoming more complex.

In this context, the use of OpenTSDB as backend storage for DevOps monitoring is increasingly plagued by performance issues and delayed feature upgrades, resulting in higher application deployment costs and reduced operational efficiency. These problems become increasingly severe as the system scales up.

To meet the fast-growing IoT big data market and technical needs, TAOSData developed an innovative big-data processing product, **TDengine**.

Drawing on the strengths of traditional relational databases, NoSQL databases, stream computing engines, and message queues, TDengine has unique advantages in time-series big data processing and can effectively solve the problems currently encountered by OpenTSDB.

Compared with OpenTSDB, TDengine has the following distinctive features.

- Data writing and querying performance far exceeds that of OpenTSDB.
- An efficient compression mechanism for time-series data, which typically compresses data to less than 1/5 of its original size on disk.
- Straightforward installation and deployment: a single installation package with no third-party dependencies, and the entire installation and deployment process takes only a few seconds.
- Built-in functions cover all of OpenTSDB's query functions and also support more time-series query functions, scalar functions, and aggregation functions, as well as advanced features such as multiple time-window aggregation, join queries, expression operations, multiple group aggregation, user-defined sorting, and user-defined functions. With its SQL-like syntax, it is more straightforward to use and has virtually no learning cost.
- Supports up to 128 tags, with a total tag length of 16 KB.
- In addition to the REST interface, it provides interfaces for Java, Python, C, Rust, Go, C#, and other languages, and supports a variety of enterprise-class standard connector protocols such as JDBC.

Migrating applications originally running on OpenTSDB to TDengine effectively reduces compute and storage resource consumption and the number of deployed servers, and it also significantly reduces operation and maintenance costs, making management simpler and considerably lowering the total cost of ownership. Like OpenTSDB, TDengine is open source, in both standalone and cluster versions, so there is no vendor lock-in concern.

We will explain how to migrate OpenTSDB applications to TDengine quickly, securely, and reliably without coding, using the most typical DevOps scenarios. Subsequent chapters will go into more depth to facilitate migration for non-DevOps systems.

## DevOps Application Quick Migration

### 1. Typical Application Scenarios

The following figure (Figure 1) shows the system's overall architecture for a typical DevOps application scenario.

**Figure 1. Typical architecture in a DevOps scenario**
![IT-DevOps-Solutions-Immigrate-OpenTSDB-Arch](/img/IT-DevOps-Solutions-Immigrate-OpenTSDB-Arch.jpg "Figure 1. Typical architecture in a DevOps scenario")

In this application scenario, agent tools are deployed in the application environment to collect machine metrics, network metrics, and application metrics; data collectors aggregate the information collected by the agents; a storage system persists and manages the data; and visualization tools (e.g., Grafana) present the monitoring data.

The agents deployed on the application nodes provide operational metrics from different sources to collectd/StatsD, and collectd/StatsD pushes the aggregated data to the OpenTSDB cluster, from which the data is visualized using the dashboard software Grafana.

### 2. Migration Services

- **TDengine installation and deployment**

First of all, please install TDengine. Download the latest stable version of TDengine from the official website and install it. For help with using various installation packages, please refer to the blog ["Installation and Uninstallation of TDengine Multiple Installation Packages"](https://www.taosdata.com/blog/2019/08/09/566.html).

Note that once the installation is complete, do not start the `taosd` service immediately; start it only after the parameters have been properly configured.

- **Adjusting the data collector configuration**

TDengine version 2.4 and later includes `taosAdapter`, a stateless, rapidly elastic, and scalable component. taosAdapter supports InfluxDB's Line Protocol and OpenTSDB's telnet and JSON write protocols, providing rich data access capabilities that effectively save migration costs and reduce migration difficulty.

Users can flexibly deploy taosAdapter instances according to their requirements to rapidly improve write throughput and provide guarantees for data writes in different application scenarios.

Through taosAdapter, users can directly push the data collected by `collectd` or `StatsD` to TDengine to achieve a seamless migration of application scenarios, which is very easy and convenient. taosAdapter also supports Telegraf, Icinga, TCollector, and node_exporter data. For more details, please refer to [taosAdapter](/reference/taosadapter/).

If using collectd, modify the configuration file in its default location `/etc/collectd/collectd.conf` to point to the IP address and port of the node where taosAdapter is deployed. For example, assuming the taosAdapter IP address is 192.168.1.130 and the port is 6046, configure it as follows.

```html
LoadPlugin write_tsdb
<Plugin write_tsdb>
  <Node>
    Host "192.168.1.130"
    Port "6046"
    HostTags "status=production"
    StoreRates false
    AlwaysAppendDS false
  </Node>
</Plugin>
```

collectd pushes the data to taosAdapter via its OpenTSDB write plugin, and taosAdapter calls the API to write the data into TDengine, completing the data writing. If you are using StatsD, adjust its configuration file accordingly.

- **Tuning the Dashboard system**

After writing the data to TDengine properly, you can adapt Grafana to visualize the data written to TDengine. To obtain and use the Grafana plugin provided by TDengine, please refer to [Links to other tools](/third-party/grafana).

TDengine provides two sets of Dashboard templates by default, and users only need to import the templates from the Grafana directory into Grafana to activate their use.

**Figure 2. Importing Grafana Templates**
![IT-DevOps-Solutions-Immigrate-OpenTSDB-Dashboard](/img/IT-DevOps-Solutions-Immigrate-OpenTSDB-Dashboard.jpg "Figure 2. Importing Grafana Templates")

With the above steps, the migration from OpenTSDB to TDengine is complete. As you can see, the whole process is straightforward: no code needs to be written, and only a few configuration files need to be adjusted to complete the migration.

### 3. Post-migration architecture

The figure below (Figure 3) shows the system's overall architecture after the migration. The collection side, the data writing path, and the monitoring and presentation side all remain stable; apart from a few configuration adjustments, the migration from OpenTSDB to TDengine involves no critical changes, while giving the system TDengine's more powerful processing capability and query performance.

In most DevOps scenarios, if you have a small OpenTSDB cluster (3 or fewer nodes) providing the storage layer of DevOps and rely on OpenTSDB for a data persistence layer and query capabilities, you can safely replace OpenTSDB with TDengine and save substantial compute and storage resources. With the same compute resource allocation, a single TDengine node can provide the service capacity of 3 to 5 OpenTSDB nodes. At larger scales, a TDengine cluster is required.

If your application is particularly complex, or the application domain is not a DevOps scenario, continue reading the subsequent chapters for a more comprehensive and in-depth look at the advanced topics of migrating OpenTSDB applications to TDengine.

**Figure 3. System architecture after migration**
![IT-DevOps-Solutions-Immigrate-TDengine-Arch](/img/IT-DevOps-Solutions-Immigrate-TDengine-Arch.jpg "Figure 3. System architecture after migration completion")

## Migration evaluation and strategy for other scenarios

### 1. Differences between TDengine and OpenTSDB

This chapter describes the differences between OpenTSDB and TDengine at the system functionality level. After reading this chapter, you can fully evaluate whether you can migrate some complex OpenTSDB-based applications to TDengine, and what you should pay attention to after migration.

TDengine currently only supports Grafana for dashboard rendering, so if your application uses front-end dashboards other than Grafana (e.g., [TSDash](https://github.com/facebook/tsdash), [Status Wolf](https://github.com/box/StatusWolf), etc.), they cannot be migrated directly to TDengine and will need to be ported to Grafana to work correctly.

TDengine version 2.3.0.x only supports collectd and StatsD as data collection and aggregation software, but more will be supported in the future. If you use other data aggregators on the collection side, your application needs to be ported to these two data aggregation systems to write data correctly.

In addition to the two data aggregation protocols mentioned above, TDengine also supports writing data directly via InfluxDB's line protocol and OpenTSDB's write protocols (text lines and JSON format). You can rewrite the logic on the data push side to write data using the line protocols supported by TDengine.

In addition, if your application uses the following features of OpenTSDB, you need to understand the following considerations before migrating your application to TDengine.

1. `/api/stats`: If your application uses this feature to monitor the service status of OpenTSDB and you have built the related processing logic into your application, then this status reading and fetching logic needs to be re-adapted to TDengine. TDengine provides a new mechanism for cluster state monitoring to meet the monitoring and maintenance needs of your application.
2. `/api/tree`: If you rely on this OpenTSDB feature for the hierarchical organization and maintenance of timelines, you cannot migrate it directly to TDengine. TDengine uses a database -> super table -> sub-table hierarchy to organize and maintain timelines, with all timelines belonging to the same super table at the same level; however, it is possible to simulate a logical multi-level structure through the special construction of different tag values.
3. `Rollup And PreAggregates`: The use of Rollup and PreAggregates requires the application to decide where to access the rollup results and, in some scenarios, to access the raw results. The opacity of this structure makes the application processing logic extraordinarily complex and not portable at all. We consider this strategy a compromise made when a time-series database cannot otherwise sustain query performance. TDengine does not currently support automatic downsampling of multiple timelines or pre-aggregation over time ranges; however, thanks to its high-performance query processing logic, it can provide very fast query responses without relying on rollups and pre-aggregation, making your application's query processing logic much simpler.
4. `Rate`: TDengine provides two functions for calculating the rate of change of values: `Derivative` (whose result is consistent with the Derivative behavior of InfluxDB) and `IRate` (whose result is consistent with the IRate function in Prometheus). The results of these two functions differ slightly from Rate, but the functions are more powerful overall. In addition, TDengine supports all the calculation functions provided by OpenTSDB, and TDengine's query functions are much more powerful than those of OpenTSDB, which can greatly simplify your application's processing logic; a short sketch follows this list.
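
The sketch below illustrates the two rate-of-change functions mentioned in item 4; the sub-table name `cpu_usage_host1` and the column `val` are hypothetical:

```sql
-- per-second rate of change; the third argument (0) keeps negative deltas
select derivative(val, 1s, 0) from cpu_usage_host1;

-- instantaneous rate of change computed from the last two samples
select irate(val) from cpu_usage_host1;
```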

With the above introduction, you should be able to understand the changes that migrating from OpenTSDB to TDengine brings. This information will also help you correctly determine whether to migrate your application to TDengine to experience the powerful and convenient time-series data processing capabilities it provides.

### 2. Migration strategy suggestion

Migration of an OpenTSDB-based system involves data schema design, system scale estimation, data write-side transformation, data streaming, and application adaptation; after that, the two systems run in parallel for a while before the historical data is migrated to TDengine. Of course, if your application has functions that strongly depend on the OpenTSDB features described above and you do not want to stop using them, you can consider keeping the original OpenTSDB system running while starting TDengine to provide the primary services.

## Data model design

On the one hand, TDengine requires a strict schema definition for its incoming data. On the other hand, the data model of TDengine is richer than that of OpenTSDB, and the multi-valued model is compatible with all single-valued model building requirements.

Let us now assume a DevOps scenario where we use collectd to collect the underlying metrics of the device, including memory, swap, disk, etc. The schema in OpenTSDB is as follows.

| No. | metric | value name | type | tag1 | tag2 | tag3 | tag4 | tag5 |
| --- | ------ | ---------- | ---- | ---- | ---- | ---- | ---- | ---- |
| 1 | memory | value | double | host | memory_type | memory_type_instance | source | n/a |
| 2 | swap | value | double | host | swap_type | swap_type_instance | source | n/a |
| 3 | disk | value | double | host | disk_point | disk_instance | disk_type | source |

TDengine requires the stored data to have a schema, i.e., you need to create a super table and specify its schema before writing the data. For data schema creation, you have two options: (1) Take advantage of TDengine's native data writing support for OpenTSDB: call the TDengine API to write the data (as text lines or in JSON format) and have the single-value models created automatically. This approach does not require significant adjustments to the data-writing application, nor does it require converting the written data format.

At the C level, TDengine provides the `taos_schemaless_insert()` function to write data in OpenTSDB format directly (in early versions this function was named `taos_insert_lines()`). Please refer to the sample code `schemaless.c` in the installation package directory for reference.
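
For illustration, the text lines accepted in this mode follow OpenTSDB's telnet-style format, `metric timestamp value tagk1=tagv1 ... tagkN=tagvN`; the line below reuses the hypothetical collectd memory metric from this chapter:

```text
memory 1632979445 3.0656 host=vm130 memory_type=memory memory_type_instance=buffer source=collectd
```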

(2) Based on a complete understanding of TDengine's data model, establish the mapping between OpenTSDB's and TDengine's data models manually. Considering that OpenTSDB uses a single-value mapping model, we recommend using the single-value model in TDengine, although TDengine supports both multi-value and single-value models.

- **Single-value model**

The steps are as follows: use the name of the metric as the name of the TDengine super table, which is built with two basic data columns (timestamp and value); the tags of the super table are equivalent to the tags of the metric, and the number of tags equals the number of the metric's tags. Sub-tables are named with a fixed rule: `metric + '_' + tag1_value + '_' + tag2_value + '_' + tag3_value ...`.

Create 3 super tables in TDengine.

```sql
create stable memory(ts timestamp, val double) tags(host binary(12), memory_type binary(20), memory_type_instance binary(20), source binary(20));
create stable swap(ts timestamp, val double) tags(host binary(12), swap_type binary(20), swap_type_instance binary(20), source binary(20));
create stable disk(ts timestamp, val double) tags(host binary(12), disk_point binary(20), disk_instance binary(20), disk_type binary(20), source binary(20));
```

For sub-tables, use dynamic table creation as shown below.

```sql
insert into memory_vm130_memory_buffered_collectd using memory tags('vm130', 'memory', 'buffer', 'collectd') values(1632979445, 3.0656);
```

The final system will have about 340 sub-tables and three super tables. Note that if concatenating the tag values causes a sub-table name to exceed the system limit (191 bytes), some encoding (e.g., MD5) needs to be used to convert it to an acceptable length.

- **Multi-value model**

Suppose you want to take advantage of TDengine's multi-value modeling capabilities. In that case, you first need to meet the requirement that different collected quantities have the same collection frequency and can reach the **data write side simultaneously via a message queue**, so that multiple metrics can be written at once with a single SQL statement. The metric's name is used as the super table name to create a multi-column data model for quantities that have the same collection frequency and arrive simultaneously; sub-tables are named using a fixed rule. Since each of the above metrics contains only one measurement value, it cannot be converted into a multi-value model.
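
As a hedged sketch of what a multi-value model could look like, assume a hypothetical sensor that reports temperature and humidity at the same frequency and in the same message:

```sql
-- one super table carries both measurements; a single insert writes both values at once
create stable sensor(ts timestamp, temperature double, humidity double) tags(host binary(12), source binary(20));
insert into sensor_vm130_collectd using sensor tags('vm130', 'collectd') values(now, 21.5, 56.8);
```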

## Data triage and application adaptation

Subscribe to the data from the message queue and start the adapted writer to write the data.

After writing has been running for a while, you can use SQL statements to check whether the amount of data written meets the expected volume. Use the following SQL statement to count the amount of data.

```sql
select count(*) from memory
```

After completing the query, if the data written does not differ from what is expected and there are no abnormal error messages from the writing program itself, you can confirm that the written data is complete and valid.
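
Beyond the total count, a per-window count can also reveal gaps in the written data; a hedged example (the window length here is arbitrary):

```sql
select count(*) from memory interval(1d);
```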

TDengine does not support querying or data fetching using the OpenTSDB query syntax, but it does provide a counterpart for each OpenTSDB query; the corresponding adaptations can be found in Appendix 1. To fully understand the types of queries supported by TDengine, refer to the TDengine user manual.

TDengine supports the standard JDBC 3.0 interface for manipulating databases, but you can also use other types of high-level language connectors for querying and reading data to suit your application. Please read the user manual for specific operations and usage.

## Historical Data Migration

### 1. Use the tool to migrate data automatically

To facilitate historical data migration, we provide a plug-in for the data synchronization tool DataX, which can automatically write data into TDengine. Note that DataX's automatic data migration only supports the single-value model.

For the specific usage of DataX and how to use DataX to write data to TDengine, please refer to [DataX-based TDengine Data Migration Tool](https://www.taosdata.com/blog/2021/10/26/3156.html).

After migrating via DataX, we found that the efficiency of migrating historical data can be significantly improved by starting multiple processes and migrating numerous metrics simultaneously. The following are some records from the migration process, offered as a reference for your own application migration.

| Number of DataX instances (concurrent processes) | Migration record speed (records/second) |
| ------------------------------------------------ | --------------------------------------- |
| 1 | About 139,000 |
| 2 | About 218,000 |
| 3 | About 249,000 |
| 5 | About 295,000 |
| 10 | About 330,000 |

(Note: The test data comes from a single node with an Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz, 16 cores, and 64 GB of memory; the channel and batchSize are 8 and 1000 respectively, and each record contains 10 tags.)

### 2. Manual data migration

Suppose you need to use the multi-value model for data writing. In that case, you need to develop a tool to export data from OpenTSDB, confirm which timelines can be merged and imported into the same timeline, and then write the timelines that can be imported together into the database via SQL statements.

Manual migration of data requires attention to the following two issues:

1) When storing the exported data on disk, the disk needs enough space to fully accommodate the exported data files. To avoid running short of disk space after exporting the full volume of data, adopt a partial-export mode: preferentially export the timelines belonging to the same super table, and then import the exported data files into the TDengine system.

2) Under full system load, if there are enough remaining computing and IO resources, establish multi-threaded importing to maximize the efficiency of data migration. Considering the heavy load that data parsing places on the CPU, control the maximum number of parallel tasks to avoid overloading the system while importing historical data.

Due to the ease of operation of TDengine itself, there is no need to perform index maintenance or data format conversion in the entire process; the whole process only needs to be executed sequentially.

Once the historical data is fully imported into TDengine, the two systems run simultaneously; afterwards, the query requests can be switched to TDengine to achieve seamless application switching.

## Appendix 1: OpenTSDB query function correspondence table

### Avg

Equivalent function: avg

Example:

```sql
SELECT avg(val) FROM (SELECT first(val) FROM super_table WHERE ts >= startTime and ts <= endTime INTERVAL(20s) Fill(linear)) INTERVAL(20s)
```

Remark:

1. The value in Interval needs to be the same as the interval value in the outer query.
2. Interpolation processing in TDengine needs to use subqueries to assist in the completion; as shown above, it is enough to specify the interpolation type in the inner query. Since OpenTSDB uses linear interpolation for values, `fill(linear)` is used in the interpolation clause to declare the interpolation type. The following functions with the same interpolation requirements are processed by this method.
3. The parameter 20s in Interval means that the inner query generates results according to a 20-second time window. In an actual query, it needs to be adjusted to the time interval between different records, to ensure that the interpolation produces results equivalent to the original data.
4. Due to the particular interpolation strategy and mechanism of OpenTSDB, the first-interpolate-then-calculate approach in aggregate queries (Aggregate) means that the results cannot be completely consistent with TDengine's. But in the case of downsampling (Downsample), TDengine and OpenTSDB can obtain consistent results (OpenTSDB handles aggregation and downsampling queries differently).

### Count

Equivalent function: count

Example:

```sql
select count(*) from super_table_name;
```

### Dev

Equivalent function: stddev

Example:

```sql
select stddev(val) from table_name;
```

### Estimated percentiles

Equivalent function: apercentile

Example:

```sql
select apercentile(col1, 50, "t-digest") from table_name;
```

Remark:

1. During approximate query processing, OpenTSDB uses the t-digest algorithm by default, so in order to obtain the same calculation results, the algorithm must be specified in the `apercentile()` function. TDengine supports two different approximation algorithms, declared by "default" and "t-digest" respectively.

### First

Equivalent function: first

Example:

```sql
select first(col1) from table_name;
```

### Last

Equivalent function: last

Example:

```sql
select last(col1) from table_name;
```

### Max

Equivalent function: max

Example:

```sql
select max(value) from (select first(val) value from table_name interval(10s) fill(linear)) interval(10s);
```

Note: The Max function requires interpolation for the reasons described above.

### Min

Equivalent function: min

Example:

```sql
select min(value) from (select first(val) value from table_name interval(10s) fill(linear)) interval(10s);
```

### MimMax

Equivalent function: max

```sql
select max(val) from table_name;
```

Note: This function has no interpolation requirements, so it can be directly calculated.

### MimMin

Equivalent function: min

```sql
select min(val) from table_name;
```

Note: This function has no interpolation requirements, so it can be directly calculated.

### Percentile

Equivalent function: percentile

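Example (a minimal sketch, reusing the hypothetical `table_name` and `val` column from the other entries; note that `percentile()` computes an exact result over a table, which is why OpenTSDB's estimated percentiles map to `apercentile` instead):

```sql
select percentile(val, 50) from table_name;
```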

### Sum

Equivalent function: sum

```sql
select sum(value) from (select first(val) value from table_name interval(10s) fill(linear)) interval(10s);
```

Note: The Sum function requires interpolation for the reasons described above.

### Zimsum

Equivalent function: sum

```sql
select sum(val) from table_name;
```

Note: This function has no interpolation requirements, so it can be directly calculated.

Complete example:

```json
// OpenTSDB query JSON
query = {
    "start": 1510560000,
    "end": 1515000009,
    "queries": [{
        "aggregator": "count",
        "metric": "cpu.usage_user"
    }]
}

// Equivalent query SQL:
SELECT count(*)
FROM `cpu.usage_user`
WHERE ts >= 1510560000 AND ts <= 1515000009
```

## Appendix 2: Resource Estimation Methodology

### Data generation environment

We still use the hypothetical environment from Chapter 4, with three measurements: temperature and humidity are written at a rate of one record every 5 seconds across 100,000 timelines; air pollution is written at a rate of one record every 10 seconds across 10,000 timelines; and the query request frequency is 500 QPS.

### Storage resource estimation

Assuming that the number of sensor devices that generate data and need to be stored is `n`, the frequency of data generation is `t` records per second, and the length of each record is `L` bytes, the scale of data generated per day is `n * t * L` bytes. Assuming the compression ratio is `C`, the daily data size is `(n * t * L)/C` bytes. The storage resources are estimated to accommodate the data scale for 1.5 years; in a production environment, the compression ratio C of TDengine is generally between 5 and 7. With an additional 20% redundancy, the required storage resources can be calculated as:

```matlab
(n * t * L) * (365 * 1.5) * (1+20%) / C
```

Substituting the parameters into the above formula, and without considering tag information, the raw data generated per year is 11.8 TB. Note that since tag information is associated with each timeline in TDengine rather than with every record, the amount of data actually stored is somewhat smaller than the raw data generated, and this tag data can largely be ignored. Assuming a compression ratio of 5, the retained data ends up being about 2.56 TB.
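
As a rough cross-check of the yearly figure, and treating the per-record length as an assumption (neither the record length nor the split of the 100,000 timelines is specified above), a combined write rate of about 21,000 records per second at roughly 18 bytes per record gives the same order of magnitude:

```matlab
21,000 records/s * 18 bytes * 86,400 s/day * 365 days ≈ 11.9 TB per year (raw)
```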

### Storage Device Selection Considerations

The hard disk should deliver good random read performance, so using an SSD as much as possible is the better choice: a disk with better random read performance is a great help in improving the system's overall query performance. To obtain good query performance, the single-threaded random read IOPS of the hard disk device should not be lower than 1000, and preferably reach 5000 IOPS or more. We recommend using the `fio` utility to evaluate the random read IO performance of the device (please refer to Appendix 1 for specific usage) to confirm whether it can meet the random read requirements of large files.

Hard disk write performance has little effect on TDengine. TDengine writes in append mode, so as long as the device has good sequential write performance, both SAS hard disks and SSDs in the general sense can well meet TDengine's requirements for disk write performance.

### Computational resource estimates

Due to the particular nature of IoT data, once the frequency of data generation is fixed, the TDengine writing process consumes a relatively constant amount of resources (computing and storage). According to the [TDengine Operation and Maintenance Guide](/operation/), the system consumes less than 1 CPU core at 22,000 writes per second.

To estimate the CPU resources consumed by queries, assume that the application requires the database to provide 10,000 QPS and that the CPU time consumed by each query is about 1 ms. Each core then provides 1,000 QPS, so satisfying 10,000 QPS requires at least 10 cores. For the system as a whole to keep its CPU load below 50%, the cluster needs twice that number, i.e., 20 cores.

### Memory resource estimation

The database allocates 16 MB * 3 of buffer memory for each vnode by default. If the cluster includes 22 CPU cores, TDengine will create 22 vnodes (virtual nodes) by default, and with each vnode containing 1,000 tables, all the tables can be accommodated. A data block then fills up in about 1.5 hours, which triggers persistence to disk, so no adjustment is required. The 22 vnodes require about 1 GB of memory cache in total. Considering the memory needed for queries, and assuming a memory overhead of about 50 MB per query, 500 concurrent queries require about 25 GB of memory.
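
Restating the arithmetic behind these two figures:

```matlab
22 vnodes * (16 MB * 3) ≈ 1 GB    % write cache across all vnodes
500 queries * 50 MB ≈ 25 GB       % memory for 500 concurrent queries
```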

In summary, a single 16-core machine with 32 GB of memory, or a cluster of two 8-core machines with 16 GB of memory each, is sufficient.

## Appendix 3: Cluster Deployment and Startup

TDengine provides a wealth of help documents explaining many aspects of cluster installation and deployment. Here is the list of corresponding documents for your reference.

### Cluster Deployment

The first step is to install TDengine. Download the latest stable version from the official website and install it. Please refer to the blog ["Installation and Uninstallation of Various Installation Packages of TDengine"](https://www.taosdata.com/blog/2019/08/09/566.html) for the various installation package formats.

Note that once the installation is complete, do not immediately start the `taosd` service, but start it after correctly configuring the parameters.

### Set running parameters and start the service

To ensure that the system can obtain the information necessary for normal operation, set the following key parameters correctly on the server:

FQDN, firstEp, secondEp, dataDir, logDir, tmpDir, serverPort. For the specific meaning and setting requirements of each parameter, please refer to the document "[TDengine Cluster Installation and Management](/cluster/)".
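
A hedged sketch of what these settings could look like in `taos.cfg` (the host names are placeholders, and the directories and port shown are common defaults; treat the values, not the parameter names, as assumptions):

```text
fqdn        h1.taosdata.com
firstEp     h1.taosdata.com:6030
secondEp    h2.taosdata.com:6030
dataDir     /var/lib/taos
logDir      /var/log/taos
tmpDir      /tmp/taos
serverPort  6030
```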

Follow the same steps to set the parameters on every node that will run in the cluster, start the `taosd` service, and then add the dnodes to the cluster.

Finally, start `taos` and execute the `show dnodes` command. If you can see all the nodes that have joined the cluster, the cluster building process was successfully completed. For specific operation procedures and precautions, please refer to the document "[TDengine Cluster Installation and Management](/cluster/)".
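
For example, adding a second node from the `taos` shell on the first node and then verifying the result might look like this (the FQDN is a placeholder):

```sql
create dnode "h2.taosdata.com:6030";
show dnodes;
```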

## Appendix 4: Super Table Names

OpenTSDB metric names contain dots ("."), for example "cpu.usage_user", but the dot has a special meaning in TDengine: it is the separator between database and table names. TDengine therefore provides escape characters that allow keywords or special separators (e.g., dots) to be used in (super) table names. To use special characters, enclose the table name in backquotes; for example, `` `cpu.usage_user` `` is a valid (super) table name.
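
For example, the metric `cpu.usage_user` used in Appendix 1 can be created and queried as a super table by backquoting its name (the column and tag definitions are illustrative):

```sql
create stable `cpu.usage_user`(ts timestamp, val double) tags(host binary(12));
select count(*) from `cpu.usage_user`;
```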

## Appendix 5: Reference Articles

1. [Using TDengine + collectd/StatsD + Grafana to quickly build an IT operation and maintenance monitoring system](/application/collectd/)
2. [Write collected data directly to TDengine through collectd](/third-party/collectd/)