# Data Model and Architecture ## Data Model ### A Typical IoT Scenario In a typical IoT scenario, there are many types of devices. Each device is collecting one or multiple metrics. For a specific type of device, the collected data looks like the table below: | Device ID | Time Stamp | Value 1 | Value 2 | Value 3 | Tag 1 | Tag 2 | | :-------: | :-----------: | :-----: | :-----: | :-----: | :---: | :---: | | D1001 | 1538548685000 | 10.3 | 219 | 0.31 | Red | Tesla | | D1002 | 1538548684000 | 10.2 | 220 | 0.23 | Blue | BMW | | D1003 | 1538548686500 | 11.5 | 221 | 0.35 | Black | Honda | | D1004 | 1538548685500 | 13.4 | 223 | 0.29 | Red | Volvo | | D1001 | 1538548695000 | 12.6 | 218 | 0.33 | Red | Tesla | | D1004 | 1538548696600 | 11.8 | 221 | 0.28 | Black | Honda | Each data record has device ID, timestamp, the collected metrics, and static tags associated with the device. Each device generates a data record in a pre-defined timer or triggered by an event. It is a sequence of data points, like a stream. ### Data Characteristics Being a series of data points over time, data points generated by devices, sensors, servers, or applications have strong common characteristics. 1. metric is always structured data; 2. there are rarely delete/update operations on collected data; 3. there is only one single data source for one device or sensor; 4. ratio of read/write is much lower than typical Internet application; 5. the user pays attention to the trend of data, not the specific value at a specific time; 6. there is always a data retention policy; 7. the data query is always executed in a given time range and a subset of devices; 8. real-time aggregation or analytics is mandatory; 9. traffic is predictable based on the number of devices and sampling frequency; 10. data volume is huge, a system may generate 10 billion data points in a day. By utilizing the above characteristics, TDengine designs the storage and computing engine in a special and optimized way for time-series data. The system efficiency is improved significantly. ### Relational Database Model Since time-series data is more likely to be structured data, TDengine adopts the traditional relational database model to process them. You need to create a database, create tables with schema definition, then insert data points and execute queries to explore the data. Standard SQL is used, there is no learning curve. ### One Table for One Device Due to different network latency, the data points from different devices may arrive at the server out of order. But for the same device, data points will arrive at the server in order if system is designed well. To utilize this special feature, TDengine requires the user to create a table for each device (time-stream). For example, if there are over 10,000 smart meters, 10,000 tables shall be created. For the table above, 4 tables shall be created for device D1001, D1002, D1003 and D1004, to store the data collected. This strong requirement can guarantee the data points from a device can be saved in a continuous memory/hard disk space block by block. If queries are applied only on one device in a time range, this design will reduce the read latency significantly since a whole block is owned by one single device. Also, write latency can be significantly reduced too, since the data points generated by the same device will arrive in order, the new data point will be simply appended to a block. Cache block size and the rows of records in a file block can be configured to fit the scenarios. ### Best Practices **Table**: TDengine suggests to use device ID as the table name (like D1001 in the above diagram). Each device may collect one or more metrics (like value1, valu2, valu3 in the diagram). Each metric has a column in the table, the metric name can be used as the column name. The data type for a column can be int, float, double, tinyint, bigint, bool or binary. Sometimes, a device may have multiple metric group, each group have different sampling period, you shall create a table for each group for each device. The first column in the table must be time stamp. TDengine uses time stamp as the index, and won’t build the index on any metrics stored. **Tags:** to support aggregation over multiple tables efficiently, [STable(Super Table)](../super-table) concept is introduced by TDengine. A STable is used to represent the same type of device. The schema is used to define the collected metrics(like value1, value2, value3 in the diagram), and tags are used to define the static attributes for each table or device(like tag1, tag2 in the diagram). A table is created via STable with a specific tag value. All or a subset of tables in a STable can be aggregated by filtering tag values. **Database:** different types of devices may generate data points in different patterns and shall be processed differently. For example, sampling frequency, data retention policy, replication number, cache size, record size, the compression algorithm may be different. To make the system more efficient, TDengine suggests creating a different database with unique configurations for different scenarios **Schemaless vs Schema:** compared with NoSQL database, since a table with schema definition shall be created before the data points can be inserted, flexibilities are not that good, especially when the schema is changed. But in most IoT scenarios, the schema is well defined and is rarely changed, the loss of flexibilities won’t be a big pain to developers or the administrator. TDengine allows the application to change the schema in a second even there is a huge amount of historical data when schema has to be changed. TDengine does not impose a limitation on the number of tables, [STables](../super-table), or databases. You can create any number of STable or databases to fit the scenarios. ## Architecture There are two main modules in TDengine server as shown in Picture 1: **Management Module (MGMT)** and **Data Module(DNODE)**. The whole TDengine architecture also includes a **TDengine Client Module**.