As a time series database designed for Cloud Native architectures, TDengine supports Kubernetes deployment.
To meet [high availability](https://docs.taosdata.com/tdinternal/high-availability/) requirements, the cluster needs to satisfy the following:
- 3 or more dnodes: multiple vnodes in the same vgroup of TDengine are not allowed to be placed on the same dnode, so a database with 3 replicas requires at least 3 dnodes.
- 3 mnodes: the mnode is responsible for managing the entire cluster. TDengine creates a single mnode by default; if the dnode hosting that mnode goes offline, the whole cluster becomes unavailable.
- A database with 3 replicas: the replica configuration of TDengine is at the database level, so 3 replicas on a 3-dnode cluster means any single dnode can go offline without affecting normal use of the cluster. **If 2 dnodes are offline, the cluster becomes unavailable, because RAFT cannot complete the leader election.** (Enterprise Edition: in a disaster recovery scenario, if the data files on any one node are damaged, they can be recovered by relaunching a blank dnode.)
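Once the cluster described later in this article is running, these requirements can be checked and applied from any pod. The sketch below is illustrative only; the pod name `tdengine-0`, the namespace `tdengine-test`, and the dnode IDs are assumptions and should be adjusted to your deployment.

```Bash
# Check that at least 3 dnodes are online
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"

# Raise the number of mnodes to 3 (dnode IDs 2 and 3 are assumed here)
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show mnodes"

# Create a database with 3 replicas
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create database test replica 3"
```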
## Prerequisites
Before deploying TDengine on Kubernetes, perform the following:
- These steps apply to Kubernetes v1.19 and above.
- This article uses the kubectl tool for installation and deployment; install and configure kubectl (and related tooling such as minikube or helm, if you use them) in advance.
- Kubernetes has been installed and deployed and works normally, and the necessary container registries or other services can be accessed and updated.
You can download the configuration files in this document from [GitHub](https://github.com/taosdata/TDengine-Operator/tree/3.0/src/tdengine).
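Before proceeding, it can help to confirm that kubectl is pointed at the target cluster and that the cluster version meets the requirement above. A minimal sanity check:

```Bash
# Client and server versions; the server should report v1.19 or later
kubectl version

# All nodes should be in the Ready state
kubectl get nodes -o wide
```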
Following the Kubernetes guidance for the various deployment types, we will use a StatefulSet to deploy TDengine. Create the file `tdengine.yaml`, where `replicas` sets the number of cluster nodes to 3. The node time zone is China (Asia/Shanghai), and each node is allocated 5 GB of standard storage (refer to the [Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) documentation for configuring storage classes). You can adjust these values to match your actual situation.
Pay special attention to the `startupProbe` configuration. After a dnode's Pod has been down for a period of time and then restarts, the newly launched dnode will be temporarily unavailable. If the `startupProbe` window is too small, Kubernetes will consider the Pod unhealthy and keep restarting it, so the dnode's Pod will restart frequently and never return to a normal state. In practice, `failureThreshold * periodSeconds` should be larger than the longest time a dnode may need to rejoin the cluster. Refer to [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
```YAML
---
# The full content of tdengine.yaml (the StatefulSet definition, startupProbe,
# storage configuration, and so on) is omitted here; the complete file can be
# downloaded from the GitHub repository linked above.
```
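Once the manifests have been applied (next section), the probe configuration actually in effect can be double-checked. The StatefulSet name `tdengine` and the namespace `tdengine-test` below are assumptions; adjust them to your manifest.

```Bash
# Print the startupProbe of the first container in the TDengine StatefulSet
kubectl get statefulset tdengine -n tdengine-test \
  -o jsonpath='{.spec.template.spec.containers[0].startupProbe}{"\n"}'
```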
## Use kubectl to deploy TDengine
First create the corresponding namespace, and then apply the configuration files in sequence; a sketch of the typical command sequence is shown below.
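The original command listing is not reproduced here. The following is a minimal sketch, assuming the namespace `tdengine-test`, the manifest file `tdengine.yaml` from the previous section, and a first pod named `tdengine-0`; adjust the names to your environment.

```Bash
# Create the namespace first
kubectl create namespace tdengine-test

# Apply the StatefulSet (and any accompanying Service manifests) into the namespace
kubectl apply -f tdengine.yaml -n tdengine-test

# Wait until all pods are Running and Ready
kubectl get pods -l app=tdengine -n tdengine-test -o wide

# Verify the cluster from inside the first pod
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
```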
### Test fault tolerance
First, the dnode where the mnode leader is located, dnode1, goes offline. Check the Pod status:
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```

The intermediate Pod listing and query output are omitted here. Even with the leader's dnode offline, the data in `test1` can still be queried normally:

```Plain
taos> select * from test1.t1
...
Query OK, 4 row(s) in set (0.001994s)
```
Similarly, if a dnode hosting a non-leader mnode goes offline, reads and writes continue to work normally, so the output is not shown here.
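For reference, one way to reproduce this test is to delete the leader's Pod and watch the mnode roles change while the StatefulSet recreates it. The pod names `tdengine-0` and `tdengine-1` below are assumptions.

```Bash
# Delete the Pod hosting the mnode leader; the StatefulSet will recreate it
kubectl delete pod tdengine-0 -n tdengine-test

# Watch the mnode roles from another pod while the deleted one restarts
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes"
```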
## Scaling Out Your Cluster
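The original scale-out commands are omitted here. The usual approach is sketched below, assuming the StatefulSet is named `tdengine` and runs in the `tdengine-test` namespace.

```Bash
# Scale the StatefulSet from 3 to 4 replicas
kubectl scale statefulsets tdengine --replicas=4 -n tdengine-test

# Wait for the new Pod to become Ready
kubectl get pod -l app=tdengine -n tdengine-test -o wide

# Check that the new dnode has joined the cluster
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
```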
The dnode list of the expanded four-node TDengine cluster:
```Plain
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
...
```

When scaling in, after the Pod is deleted, the corresponding PVC must also be deleted manually; otherwise the old data will be reused in the next scale-out, and the dnode will be unable to join the cluster normally.
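A sketch of the scale-in sequence, assuming the dnode with ID 4 is being removed and that the PVC name follows the StatefulSet's volume claim template (both are assumptions; check the actual names with `kubectl get pvc`):

```Bash
# Remove the dnode from the TDengine cluster first, then scale the StatefulSet down
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "drop dnode 4"
kubectl scale statefulsets tdengine --replicas=3 -n tdengine-test

# Delete the PVC left behind by the removed Pod (name is illustrative)
kubectl get pvc -n tdengine-test
kubectl delete pvc taosdata-tdengine-3 -n tdengine-test
```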
> **When deleting a PVC, pay attention to the PV's `persistentVolumeReclaimPolicy`. It is recommended to change it to Delete, so that when the PVC is deleted the PV is cleaned up automatically, along with the underlying CSI storage resources. If the policy of automatically cleaning up the PV when its PVC is deleted is not configured, then after you delete the PVC and clean up the PV manually, the CSI storage resources corresponding to the PV may not be released.**
To completely remove the TDengine cluster, you need to clean up the statefulset, svc, configmap, and pvc respectively.
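A cleanup sketch; the label selector and resource names are assumptions based on the manifests used in this article, so adjust them to your deployment:

```Bash
# Remove the workload, service, configuration, and storage claims
kubectl delete statefulset -l app=tdengine -n tdengine-test
kubectl delete svc -l app=tdengine -n tdengine-test
kubectl delete configmap -l app=tdengine -n tdengine-test
kubectl delete pvc -l app=tdengine -n tdengine-test

# Optionally remove the namespace itself once it is empty
kubectl delete namespace tdengine-test
```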
## Finally
For high availability and high reliability of TDengine in a Kubernetes environment, protection against hardware damage and disaster recovery can be considered at two levels:
1. The disaster recovery capability of the underlying distributed block storage: multi-replica block storage, such as the popular distributed system Ceph, can spread storage replicas across different racks, cabinets, server rooms, and data centers (or you can directly use the block storage services provided by public cloud vendors).
2. TDengine's own disaster recovery: TDengine Enterprise can, when a dnode goes permanently offline (for example, because of physical disk damage or data file loss), launch a new blank dnode to take over the work of the original dnode.
Finally, welcome to [TDengine Cloud](https://cloud.tdengine.com/) to experience the one-stop fully managed TDengine Cloud as a Service.