From f57b85226d221d7d9d7d15f104aa9c73ead4638c Mon Sep 17 00:00:00 2001 From: Chait Diwadkar <94201190+cdiwadkar16@users.noreply.github.com> Date: Sat, 28 May 2022 20:09:43 -0700 Subject: [PATCH] docs:cdiwadkar16-patch-4-32 - several Some additions, stylistic changes, grammar, spelling changes. --- docs-en/13-operation/02-planning.mdx | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs-en/13-operation/02-planning.mdx b/docs-en/13-operation/02-planning.mdx index 4b8ed1f1b8..c1baf92dbf 100644 --- a/docs-en/13-operation/02-planning.mdx +++ b/docs-en/13-operation/02-planning.mdx @@ -2,17 +2,17 @@ title: Resource Planning --- -The computing and storage resources need to be planned if using TDengine to build an IoT platform. How to plan the CPU, memory and disk required will be described in this chapter. +It is important to plan computing and storage resources if using TDengine to build an IoT, time-series or Big Data platform. How to plan the CPU, memory and disk resources required, will be described in this chapter. ## Memory Requirement of Server Side -The number of vgroups created for each database is the same as the number of CPU cores by default and can be configured by parameter `maxVgroupsPerDb`, each vnode in a vgroup stores one replica. Each vnode consumes a fixed size of memory, i.e. `blocks` \* `cache`. Besides, some memory is required for tag values associated with each table. A fixed amount of memory is required for each cluster. So, the memory required for each DB can be calculated using the formula below: +By default, the number of vgroups created for each database is the same as the number of CPU cores. This can be configured by the parameter `maxVgroupsPerDb`. Each vnode in a vgroup stores one replica. Each vnode consumes a fixed amount of memory, i.e. `blocks` \* `cache`. In addition, some memory is required for tag values associated with each table. A fixed amount of memory is required for each cluster. So, the memory required for each DB can be calculated using the formula below: ``` Database Memory Size = maxVgroupsPerDb * replica * (blocks * cache + 10MB) + numOfTables * (tagSizePerTable + 0.5KB) ``` -For example, assuming the default value of `maxVgroupPerDB` is 64, the default value of `cache` 16M, the default value of `blocks` is 6, there are 100,000 tables in a DB, the replica number is 1, total length of tag values is 256 bytes, the total memory required for this DB is: 64 \* 1 \* (16 \* 6 + 10) + 100000 \* (0.25 + 0.5) / 1000 = 6792M. +For example, assuming the default value of `maxVgroupPerDB` is 64, the default value of `cache` is 16M, the default value of `blocks` is 6, there are 100,000 tables in a DB, the replica number is 1, total length of tag values is 256 bytes, the total memory required for this DB is: 64 \* 1 \* (16 \* 6 + 10) + 100000 \* (0.25 + 0.5) / 1000 = 6792M. In the real operation of TDengine, we are more concerned about the memory used by each TDengine server process `taosd`. @@ -22,10 +22,10 @@ In the real operation of TDengine, we are more concerned about the memory used b In the above formula: -1. "vnode_memory" of a `taosd` process is the memory used by all vnodes hosted by this `taosd` process. It can be roughly calculated by firstly adding up the total memory of all DBs whose memory usage can be derived according to the formula mentioned previously then dividing by number of dnodes and multiplying the number of replicas. +1. "vnode_memory" of a `taosd` process is the memory used by all vnodes hosted by this `taosd` process. It can be roughly calculated by firstly adding up the total memory of all DBs whose memory usage can be derived according to the formula for Database Memory Size, mentioned above, then dividing by number of dnodes and multiplying the number of replicas. ``` - vnode_memory = sum(Database memory) / number_of_dnodes * replica + vnode_memory = (sum(Database Memory Size) / number_of_dnodes) * replica ``` 2. "mnode_memory" of a `taosd` process is the memory consumed by a mnode. If there is one (and only one) mnode hosted in a `taosd` process, the memory consumed by "mnode" is "0.2KB \* the total number of tables in the cluster". @@ -56,8 +56,8 @@ So, at least 3GB needs to be reserved for such a client. The CPU resources required depend on two aspects: -- **Data Insertion** Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The computing resource consumed between inserting 1 row one time and inserting 10 rows one time is very small. So, the more the rows to insert one time, the higher the efficiency. Inserting in bach also exposes requirements for the client side which needs to cache rows and insert in batch once the cached rows reaches a threshold. -- **Data Query** High efficiency query is provided in TDengine, but it's hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, etc provided by user. +- **Data Insertion** Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The difference in computing resource consumed, between inserting 1 row at a time, and inserting 10 rows at a time is very small. So, the more the number of rows that can be inserted one time, the higher the efficiency. Inserting in batch also imposes requirements on the client side which needs to cache rows to insert in batch once the number of cached rows reaches a threshold. +- **Data Query** High efficiency query is provided in TDengine, but it's hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, and other requirements provided by users. In short, the CPU resource required for data insertion can be estimated but it's hard to do so for query use cases. In real operation, it's suggested to control CPU usage below 50%. If this threshold is exceeded, it's a reminder for system operator to add more nodes in the cluster to expand resources. @@ -71,12 +71,12 @@ Raw DataSize = numOfTables * rowSizePerTable * rowsPerTable For example, there are 10,000,000 meters, while each meter collects data every 15 minutes and the data size of each collection is 128 bytes, so the raw data size of one year is: 10000000 \* 128 \* 24 \* 60 / 15 \* 365 = 44.8512(TB). Assuming compression ratio is 5, the actual disk size is: 44.851 / 5 = 8.97024(TB). -Parameter `keep` can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device, and this is transparent to application programs. +Parameter `keep` can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device. This is completely transparent to application programs. -To increase the performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability. +To increase performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability. ## Number of Hosts -A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulas mentioned previously. Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily. +A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulae mentioned previously. Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily. **Quick Estimation for CPU, Memory and Disk** Please refer to [Resource Estimate](https://www.taosdata.com/config/config.html). -- GitLab