未验证 提交 d7bb7e11 编写于 作者: Y Yaqiang Li 提交者: GitHub

Merge pull request #22164 from taosdata/docs/yuanliu/deploy-k8s-TD-25202

docs: optimization k8s deploy
......@@ -4,23 +4,31 @@ sidebar_label: Kubernetes
description: This document describes how to deploy TDengine on Kubernetes.
---
TDengine is a cloud-native time-series database that can be deployed on Kubernetes. This document gives a step-by-step description of how you can use YAML files to create a TDengine cluster and introduces common operations for TDengine in a Kubernetes environment.
## Overview
As a time series database for Cloud Native architecture design, TDengine supports Kubernetes deployment. Firstly we introduce how to use YAML files to create a highly available TDengine cluster from scratch step by step for production usage, and highlight the common operations of TDengine in Kubernetes environment.
To meet [high availability ](https://docs.taosdata.com/tdinternal/high-availability/)requirements, clusters need to meet the following requirements:
- 3 or more dnodes: multiple vnodes in the same vgroup of TDengine are not allowed to be distributed in one dnode at the same time, so if you create a database with 3 replicas, the number of dnodes is greater than or equal to 3
- 3 mnodes: mnode is responsible for the management of the entire TDengine cluster. The default number of mnode in TDengine cluster is only one. If the dnode where the mnode located is dropped, the entire cluster is unavailable.
- Database 3 replicas: The TDengine replica configuration is the database level, so 3 replicas for the database must need three dnodes in the cluster. If any one dnode is offline, does not affect the normal usage of the whole cluster. **If the number of offline** **dnodes** **is 2, then the cluster is not available,** **because** ** the cluster can not complete the election based on RAFT** **.** (Enterprise version: in the disaster recovery scenario, any node data file is damaged, can be restored by pulling up the dnode again)
## Prerequisites
Before deploying TDengine on Kubernetes, perform the following:
* Current steps are compatible with Kubernetes v1.5 and later version.
* Install and configure minikube, kubectl, and helm.
* Install and deploy Kubernetes and ensure that it can be accessed and used normally. Update any container registries or other services as necessary.
- This article applies Kubernetes 1.19 and above
- This article uses the **kubectl** tool to install and deploy, please install the corresponding software in advance
- Kubernetes have been installed and deployed and can access or update the necessary container repositories or other services
You can download the configuration files in this document from [GitHub](https://github.com/taosdata/TDengine-Operator/tree/3.0/src/tdengine).
## Configure the service
Create a service configuration file named `taosd-service.yaml`. Record the value of `metadata.name` (in this example, `taos`) for use in the next step. Add the ports required by TDengine:
Create a service configuration file named `taosd-service.yaml`. Record the value of `metadata.name` (in this example, `taos`) for use in the next step. And then add the ports required by TDengine and record the value of the selector label "app" (in this example, `tdengine`) for use in the next step:
```yaml
```YAML
---
apiVersion: v1
kind: Service
......@@ -31,10 +39,10 @@ metadata:
spec:
ports:
- name: tcp6030
- protocol: "TCP"
protocol: "TCP"
port: 6030
- name: tcp6041
- protocol: "TCP"
protocol: "TCP"
port: 6041
selector:
app: "tdengine"
......@@ -42,10 +50,11 @@ spec:
## Configure the service as StatefulSet
Configure the TDengine service as a StatefulSet.
Create the `tdengine.yaml` file and set `replicas` to 3. In this example, the region is set to Asia/Shanghai and 10 GB of standard storage are allocated per node. You can change the configuration based on your environment and business requirements.
According to Kubernetes instructions for various deployments, we will use StatefulSet as the deployment resource type of TDengine. Create the file `tdengine.yaml `, where replicas defines the number of cluster nodes as 3. The node time zone is China (Asia/Shanghai), and each node is allocated 5G standard storage (refer to the [Storage Classes ](https://kubernetes.io/docs/concepts/storage/storage-classes/)configuration storage class). You can also modify accordingly according to the actual situation.
```yaml
Please pay special attention to the startupProbe configuration. If dnode's Pod drops for a period of time and then restart, the newly launched dnode Pod will be temporarily unavailable. The reason is the startupProbe configuration is too small, Kubernetes will know that the Pod is in an abnormal state and try to restart it, then the dnode's Pod will restart frequently and never return to the normal status. Refer to [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
```YAML
---
apiVersion: apps/v1
kind: StatefulSet
......@@ -69,14 +78,14 @@ spec:
spec:
containers:
- name: "tdengine"
image: "tdengine/tdengine:3.0.0.0"
image: "tdengine/tdengine:3.0.7.1"
imagePullPolicy: "IfNotPresent"
ports:
- name: tcp6030
- protocol: "TCP"
protocol: "TCP"
containerPort: 6030
- name: tcp6041
- protocol: "TCP"
protocol: "TCP"
containerPort: 6041
env:
# POD_NAME for FQDN config
......@@ -102,12 +111,18 @@ spec:
# Must set if you want a cluster.
- name: TAOS_FIRST_EP
value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
# TAOS_FQDN should always be set in k8s env.
# TAOS_FQND should always be set in k8s env.
- name: TAOS_FQDN
value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
startupProbe:
exec:
command:
- taos-check
failureThreshold: 360
periodSeconds: 10
readinessProbe:
exec:
command:
......@@ -129,266 +144,401 @@ spec:
storageClassName: "standard"
resources:
requests:
storage: "10Gi"
storage: "5Gi"
```
## Use kubectl to deploy TDengine
Run the following commands:
First create the corresponding namespace, and then execute the following command in sequence :
```bash
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```Bash
kubectl apply -f taosd-service.yaml -n tdengine-test
kubectl apply -f tdengine.yaml -n tdengine-test
```
The preceding configuration generates a TDengine cluster with three nodes in which dnodes are automatically configured. You can run the `show dnodes` command to query the nodes in the cluster:
The above configuration will generate a three-node TDengine cluster, dnode is automatically configured, you can use the **show dnodes** command to view the nodes of the current cluster:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-2 -n tdengine-test -- taos -s "show dnodes"
```
The output is as follows:
```
```Bash
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.003655s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 0 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 0 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 0 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001853s)
```
View the current mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-19 17:54:19.520
Query OK, 1 row(s) in set (0.001282s)
```
## Create mnode
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
```
View mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-20 09:19:36.060
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:22:12.838
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:22:23.271
Query OK, 3 row(s) in set (0.003108s)
```
## Enable port forwarding
The kubectl port forwarding feature allows applications to access the TDengine cluster running on Kubernetes.
Kubectl port forwarding enables applications to access TDengine clusters running in Kubernetes environments.
```
kubectl port-forward tdengine-0 6041:6041 &
```bash
kubectl port-forward -n tdengine-test tdengine-0 6041:6041 &
```
Use curl to verify that the TDengine REST API is working on port 6041:
Use **curl** to verify that the TDengine REST API is working on port 6041:
```
$ curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
Handling connection for 6041
{"code":0,"column_meta":[["name","VARCHAR",64],["create_time","TIMESTAMP",8],["vgroups","SMALLINT",2],["ntables","BIGINT",8],["replica","TINYINT",1],["strict","VARCHAR",4],["duration","VARCHAR",10],["keep","VARCHAR",32],["buffer","INT",4],["pagesize","INT",4],["pages","INT",4],["minrows","INT",4],["maxrows","INT",4],["comp","TINYINT",1],["precision","VARCHAR",2],["status","VARCHAR",10],["retention","VARCHAR",60],["single_stable","BOOL",1],["cachemodel","VARCHAR",11],["cachesize","INT",4],["wal_level","TINYINT",1],["wal_fsync_period","INT",4],["wal_retention_period","INT",4],["wal_retention_size","BIGINT",8],["wal_roll_period","INT",4],["wal_segment_size","BIGINT",8]],"data":[["information_schema",null,null,16,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null],["performance_schema",null,null,10,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null]],"rows":2}
```bash
curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
{"code":0,"column_meta":[["name","VARCHAR",64]],"data":[["information_schema"],["performance_schema"],["test"],["test1"]],"rows":4}
```
## Enable the dashboard for visualization
## Test cluster
The minikube dashboard command enables visualized cluster management.
### Data preparation
```
$ minikube dashboard
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
```
#### taosBenchmark
In some public clouds, minikube cannot be remotely accessed if it is bound to 127.0.0.1. In this case, use the kubectl proxy command to map the port to 0.0.0.0. Then, you can access the dashboard by using a web browser to open the dashboard URL above on the public IP address and port of the virtual machine.
Create a 3 replicas database with taosBenchmark, write 100 million data at the same time, and view the data at the same time
```
$ kubectl proxy --accept-hosts='^.*$' --address='0.0.0.0'
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taosBenchmark -I stmt -d test -n 10000 -t 10000 -a 3
# query data
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select count(*) from test.meters;"
taos> select count(*) from test.meters;
count(*) |
========================
100000000 |
Query OK, 1 row(s) in set (0.103537s)
```
## Scaling Out Your Cluster
View vnode distribution by showing dnodes
TDengine clusters can scale automatically:
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
```bash
kubectl scale statefulsets tdengine --replicas=4
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 8 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 8 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 8 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001357s)
```
The preceding command increases the number of replicas to 4. After running this command, query the pod status:
View xnode distribution by showing vgroup
```bash
kubectl get pods -l app=tdengine
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test.vgroups"
taos> show test.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
2 | test | 1267 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
3 | test | 1215 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
4 | test | 1215 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
5 | test | 1307 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
6 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
7 | test | 1275 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
8 | test | 1231 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
9 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
Query OK, 8 row(s) in set (0.001488s)
```
The output is as follows:
#### Manually created
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
Common a three-copy test1, and create a table, write 2 pieces of data
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- \
taos -s \
"create database if not exists test1 replica 3;
use test1;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
The status of all pods is Running. Once the pod status changes to Ready, you can check the dnode status:
View xnode distribution by showing test1.vgroup
```bash
kubectl exec -i -t tdengine-3 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test1.vgroups"
taos> show test1.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
10 | test1 | 1 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
11 | test1 | 0 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
Query OK, 2 row(s) in set (0.001489s)
```
The following output shows that the TDengine cluster has been expanded to 4 replicas:
### Test fault tolerance
The dnode where the mnode leader is located is disconnected, dnode1
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 0/1 ErrImagePull 2 (2s ago) 20m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6m48s ago) 20m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 21m 10.244.1.223 node85 <none> <none>
```
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
4 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:33:16.039 | |
Query OK, 4 rows in database (0.008377s)
At this time, the cluster mnode has a re-election, and the monde on dnode1 becomes the leader.
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
Welcome to the TDengine Command Line Interface, Client Version:3.0.7.1.202307190706
Copyright (c) 2022 by TDengine, all rights reserved.
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: offline
status: offline
create_time: 2023-07-19 17:54:18.559
reboot_time: 1970-01-01 08:00:00.000
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:32:00.227
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:32:00.026
Query OK, 3 row(s) in set (0.001513s)
```
## Scaling In Your Cluster
Cluster can read and write normally
When you scale in a TDengine cluster, your data is migrated to different nodes. You must run the drop dnodes command in TDengine to remove dnodes before scaling in your Kubernetes environment.
```Bash
# insert
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "insert into test1.t1 values(now, 1)(now+1s, 2);"
Note: In a Kubernetes StatefulSet service, the newest pods are always removed first. For this reason, when you scale in your TDengine cluster, ensure that you drop the newest dnodes.
taos> insert into test1.t1 values(now, 1)(now+1s, 2);
Insert OK, 2 row(s) affected (0.002098s)
```
$ kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"
# select
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select *from test1.t1"
taos> select *from test1.t1
ts | n |
========================================
2023-07-19 18:04:58.104 | 1 |
2023-07-19 18:04:59.104 | 2 |
2023-07-19 18:06:00.303 | 1 |
2023-07-19 18:06:01.303 | 2 |
Query OK, 4 row(s) in set (0.001994s)
```
```bash
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
Similarly, as for the non-leader mnode dropped, read and write can of course be normal, here will not do too much display .
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.004861s)
```
## Scaling Out Your Cluster
Verify that the dnode have been successfully removed by running the `kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"` command. Then run the following command to remove the pod:
TDengine cluster supports automatic expansion:
```
kubectl scale statefulsets tdengine --replicas=3
```Bash
kubectl scale statefulsets tdengine --replicas=4
```
The newest pod in the deployment is removed. Run the `kubectl get pods -l app=tdengine` command to query the pod status:
The parameter `--replica = 4 `in the above command line indicates that you want to expand the TDengine cluster to 4 nodes. After execution, first check the status of the Pod:
```
$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m7s
tdengine-1 1/1 Running 0 3m55s
tdengine-2 1/1 Running 0 2m28s
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```
After the pod has been removed, manually delete the PersistentVolumeClaim (PVC). Otherwise, future scale-outs will attempt to use existing data.
The output is as follows:
```bash
$ kubectl delete pvc taosdata-tdengine-3
```Plain
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h26m ago) 6h53m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6h39m ago) 6h53m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h16m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 3m24s 10.244.2.76 node86 <none> <none>
```
Your cluster has now been safely scaled in, and you can scale it out again as necessary.
At this time, the state of the POD is still Running, and the dnode state in the TDengine cluster can only be seen after the Pod status is `ready `:
```bash
$ kubectl scale statefulsets tdengine --replicas=4
statefulset.apps/tdengine scaled
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 ContainerCreating 0 4s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 Running 0 7s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-3 -n tdengine-test -- taos -s "show dnodes"
```
The dnode list of the expanded four-node TDengine cluster:
```Plain
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:01:36.479 | |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:13:54.411 | |
Query OK, 4 row(s) in set (0.001348s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
4 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:01:44.007 | 2023-07-20 16:01:44.889 | | | |
Query OK, 4 row(s) in set (0.003628s)
```
## Remove a TDengine Cluster
## Scaling In Your Cluster
To fully remove a TDengine cluster, you must delete its statefulset, svc, configmap, and pvc entries:
Since the TDengine cluster will migrate data between nodes during volume expansion and contraction, using the **kubectl** command to reduce the volume requires first using the "drop dnodes" command ( **If there are 3 replicas of db in the cluster, the number of dnodes after reduction must also be greater than or equal to 3, otherwise the drop dnode operation will be aborted** ), the node deletion is completed before Kubernetes cluster reduction.
```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
Note: Since Kubernetes Pods in the Statefulset can only be removed in reverse order of creation, the TDengine drop dnode also needs to be removed in reverse order of creation, otherwise the Pod will be in an error state.
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "drop dnode 4"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
Query OK, 3 row(s) in set (0.003324s)
```
## Troubleshooting
After confirming that the removal is successful (use kubectl exec -i -t tdengine-0 --taos -s "show dnodes" to view and confirm the dnode list), use the kubectl command to remove the Pod:
### Error 1
```Plain
kubectl scale statefulsets tdengine --replicas=3 -n tdengine-test
```
If you remove a pod without first running `drop dnode`, some TDengine nodes will go offline.
The last Pod will be deleted. Use the command kubectl get pods -l app = tdengine to check the Pod status:
```Plain
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h55m ago) 7h22m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h9m ago) 7h23m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h45m 10.244.1.224 node85 <none> <none>
```
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:01:36.479 | status msg timeout |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:13:54.411 | status msg timeout |
Query OK, 4 row(s) in set (0.001323s)
```
After the Pod is deleted, the PVC needs to be deleted manually, otherwise the previous data will continue to be used for the next expansion, resulting in the inability to join the cluster normally.
### Error 2
```Bash
kubectl delete pvc aosdata-tdengine-3 -n tdengine-test
```
If the number of nodes after a scale-in is less than the value of the replica parameter, the cluster will go down:
The cluster state at this time is safe and can be scaled up again if needed.
Create a database with replica set to 2 and add data.
```Bash
kubectl scale statefulsets tdengine --replicas=4 -n tdengine-test
statefulset.apps/tdengine scaled
```bash
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h59m ago) 7h27m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h13m ago) 7h27m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h49m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 20s 10.244.2.77 node86 <none> <none>
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | | | |
Query OK, 4 row(s) in set (0.003881s)
```
Scale in to one node:
## Remove a TDengine Cluster
```bash
kubectl scale statefulsets tdengine --replicas=1
> **When deleting the PVC, you need to pay attention to the pv persistentVolumeReclaimPolicy policy. It is recommended to change to Delete, so that the PV will be automatically cleaned up when the PVC is deleted, and the underlying CSI storage resources will be cleaned up at the same time. If the policy of deleting the PVC to automatically clean up the PV is not configured, and then after deleting the pvc, when manually cleaning up the PV, the CSI storage resources corresponding to the PV may not be released.**
Complete removal of TDengine cluster, need to clean up statefulset, svc, configmap, pvc respectively.
```Bash
kubectl delete statefulset -l app=tdengine -n tdengine-test
kubectl delete svc -l app=tdengine -n tdengine-test
kubectl delete pvc -l app=tdengine -n tdengine-test
kubectl delete configmap taoscfg -n tdengine-test
```
In the TDengine CLI, you can see that no database operations succeed:
## Troubleshooting
### Error 1
No "drop dnode" is directly reduced. Since the TDengine has not deleted the node, the reduced pod causes some nodes in the TDengine cluster to be offline.
```Plain
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | offline | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | status msg timeout | | |
Query OK, 4 row(s) in set (0.003862s)
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
## Finally
taos> use test;
Database changed.
For the high availability and high reliability of TDengine in a Kubernetes environment, hardware damage and disaster recovery are divided into two levels:
taos> insert into t1 values(now, 3);
1. The disaster recovery capability of the underlying distributed Block Storage, the multi-copy of Block Storage, the current popular distributed Block Storage such as Ceph, has the multi-copy capability, extending the storage copy to different racks, cabinets, computer rooms, Data center (or directly use the Block Storage service provided by Public Cloud vendors)
2. TDengine disaster recovery, in TDengine Enterprise, itself has when a dnode permanently offline (TCE-metal disk damage, data sorting loss), re-pull a blank dnode to restore the original dnode work.
DB error: Unable to resolve FQDN (0.013874s)
Finally, welcome to [TDengine Cloud ](https://cloud.tdengine.com/)to experience the one-stop fully managed TDengine Cloud as a Service.
```
> TDengine Cloud is a minimalist fully managed time series data processing Cloud as a Service platform developed based on the open source time series database TDengine. In addition to high-performance time series database, it also has system functions such as caching, subscription and stream computing, and provides convenient and secure data sharing, as well as numerous enterprise-level functions. It allows enterprises in the fields of Internet of Things, Industrial Internet, Finance, IT operation and maintenance monitoring to significantly reduce labor costs and operating costs in the management of time series data.
......@@ -4,23 +4,31 @@ title: 在 Kubernetes 上部署 TDengine 集群
description: 利用 Kubernetes 部署 TDengine 集群的详细指南
---
作为面向云原生架构设计的时序数据库,TDengine 支持 Kubernetes 部署。这里介绍如何使用 YAML 文件一步一步从头创建一个 TDengine 集群,并重点介绍 Kubernetes 环境下 TDengine 的常用操作。
## 概述
作为面向云原生架构设计的时序数据库,TDengine 本身就支持 Kubernetes 部署。这里介绍如何使用 YAML 文件从头一步一步创建一个可用于生产使用的高可用 TDengine 集群,并重点介绍 Kubernetes 环境下 TDengine 的常用操作。
为了满足[高可用](https://docs.taosdata.com/tdinternal/high-availability/)的需求,集群需要满足如下要求:
- 3个及以上 dnode :TDengine 的同一个 vgroup 中的多个 vnode ,不允许同时分布在一个 dnode ,所以如果创建3副本的数据库,则 dnode 数大于等于3
- 3个 mnode :mnode 负责整个集群的管理工作,TDengine 默认是一个 mnode。如果这个 mnode 所在的 dnode 掉线,则整个集群不可用。
- 数据库的3副本:TDengine 的副本配置是数据库级别,所以数据库3副本可满足在3个 dnode 的集群中,任意一个 dnode 下线,都不影响集群的正常使用。**如果下线** **dnode** **个数为2时,此时集群不可用,****因为****RAFT无法完成选举****。**(企业版:在灾难恢复场景,任一节点数据文件损坏,都可以通过重新拉起dnode进行恢复)
## 前置条件
要使用 Kubernetes 部署管理 TDengine 集群,需要做好如下准备工作。
* 本文适用 Kubernetes v1.5 以上版本
* 本文和下一章使用 minikube、kubectl 和 helm 等工具进行安装部署,请提前安装好相应软件
* Kubernetes 已经安装部署并能正常访问使用或更新必要的容器仓库或其他服务
- 本文适用 Kubernetes v1.19 以上版本
- 本文使用 kubectl 工具进行安装部署,请提前安装好相应软件
- Kubernetes 已经安装部署并能正常访问使用或更新必要的容器仓库或其他服务
以下配置文件也可以从 [GitHub 仓库](https://github.com/taosdata/TDengine-Operator/tree/3.0/src/tdengine) 下载。
## 配置 Service 服务
创建一个 Service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的端口:
创建一个 Service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。首先添加 TDengine 所用到的端口,然后在选择器设置确定的标签 app (此处为 “tdengine”)。
```yaml
```YAML
---
apiVersion: v1
kind: Service
......@@ -42,10 +50,11 @@ spec:
## 有状态服务 StatefulSet
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型。
创建文件 `tdengine.yaml`,其中 replicas 定义集群节点的数量为 3。节点时区为中国(Asia/Shanghai),每个节点分配 10G 标准(standard)存储。你也可以根据实际情况进行相应修改。
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的部署资源类型。 创建文件 `tdengine.yaml`,其中 replicas 定义集群节点的数量为 3。节点时区为中国(Asia/Shanghai),每个节点分配 5G 标准(standard)存储(参考[Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) 配置 storage class )。你也可以根据实际情况进行相应修改。
```yaml
请特别注意startupProbe的配置,在 dnode 的 Pod 掉线一段时间后,再重新启动,这个时候新上线的 dnode 会短暂不可用。如果startupProbe配置过小,Kubernetes 会认为该 Pod 处于不正常的状态,并尝试重启该 Pod,该 dnode 的 Pod 会频繁重启,始终无法恢复到正常状态。参考 [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
```YAML
---
apiVersion: apps/v1
kind: StatefulSet
......@@ -69,7 +78,7 @@ spec:
spec:
containers:
- name: "tdengine"
image: "tdengine/tdengine:3.0.0.0"
image: "tdengine/tdengine:3.0.7.1"
imagePullPolicy: "IfNotPresent"
ports:
- name: tcp6030
......@@ -108,6 +117,12 @@ spec:
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
startupProbe:
exec:
command:
- taos-check
failureThreshold: 360
periodSeconds: 10
readinessProbe:
exec:
command:
......@@ -129,199 +144,373 @@ spec:
storageClassName: "standard"
resources:
requests:
storage: "10Gi"
storage: "5Gi"
```
## 使用 kubectl 命令部署 TDengine 集群
顺序执行以下命令。
首先创建对应的 namespace,然后顺序执行以下命令:
```bash
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```Bash
kubectl apply -f taosd-service.yaml -n tdengine-test
kubectl apply -f tdengine.yaml -n tdengine-test
```
上面的配置将生成一个三节点的 TDengine 集群,dnode 为自动配置,可以使用 show dnodes 命令查看当前集群的节点:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-2 -n tdengine-test -- taos -s "show dnodes"
```
输出如下:
```
```Bash
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.003655s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 0 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 0 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 0 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001853s)
```
查看当前mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-19 17:54:19.520
Query OK, 1 row(s) in set (0.001282s)
```
## 创建mnode
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
```
查看mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-20 09:19:36.060
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:22:12.838
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:22:23.271
Query OK, 3 row(s) in set (0.003108s)
```
## 使能端口转发
利用 kubectl 端口转发功能可以使应用可以访问 Kubernetes 环境运行的 TDengine 集群。
```
kubectl port-forward tdengine-0 6041:6041 &
```Plain
kubectl port-forward -n tdengine-test tdengine-0 6041:6041 &
```
使用 curl 命令验证 TDengine REST API 使用的 6041 接口。
```Plain
curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
{"code":0,"column_meta":[["name","VARCHAR",64]],"data":[["information_schema"],["performance_schema"],["test"],["test1"]],"rows":4}
```
## 集群测试
### 数据准备
#### taosBenchmark
通过taosBenchmark 创建一个3副本的数据库,同时写入1亿条数据,同时查看数据
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taosBenchmark -I stmt -d test -n 10000 -t 10000 -a 3
# query data
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select count(*) from test.meters;"
taos> select count(*) from test.meters;
count(*) |
========================
100000000 |
Query OK, 1 row(s) in set (0.103537s)
```
查看vnode分布,通过show dnodes
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 8 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 8 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 8 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001357s)
```
$ curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
Handling connection for 6041
{"code":0,"column_meta":[["name","VARCHAR",64],["create_time","TIMESTAMP",8],["vgroups","SMALLINT",2],["ntables","BIGINT",8],["replica","TINYINT",1],["strict","VARCHAR",4],["duration","VARCHAR",10],["keep","VARCHAR",32],["buffer","INT",4],["pagesize","INT",4],["pages","INT",4],["minrows","INT",4],["maxrows","INT",4],["comp","TINYINT",1],["precision","VARCHAR",2],["status","VARCHAR",10],["retention","VARCHAR",60],["single_stable","BOOL",1],["cachemodel","VARCHAR",11],["cachesize","INT",4],["wal_level","TINYINT",1],["wal_fsync_period","INT",4],["wal_retention_period","INT",4],["wal_retention_size","BIGINT",8],["wal_roll_period","INT",4],["wal_segment_size","BIGINT",8]],"data":[["information_schema",null,null,16,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null],["performance_schema",null,null,10,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null]],"rows":2}
通过show vgroup 查看 vnode 分布情况
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test.vgroups"
taos> show test.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
2 | test | 1267 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
3 | test | 1215 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
4 | test | 1215 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
5 | test | 1307 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
6 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
7 | test | 1275 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
8 | test | 1231 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
9 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
Query OK, 8 row(s) in set (0.001488s)
```
#### 手工创建
常见一个三副本的test1,并创建一张表,写入2条数据
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- \
taos -s \
"create database if not exists test1 replica 3;
use test1;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
## 使用 dashboard 进行图形化管理
通过show test1.vgroup 查看xnode分布情况
minikube 提供 dashboard 命令支持图形化管理界面。
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test1.vgroups"
taos> show test1.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
10 | test1 | 1 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
11 | test1 | 0 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
Query OK, 2 row(s) in set (0.001489s)
```
$ minikube dashboard
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
### 容错测试
Mnode leader 所在的 dnode 掉线,dnode1
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 0/1 ErrImagePull 2 (2s ago) 20m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6m48s ago) 20m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 21m 10.244.1.223 node85 <none> <none>
```
对于某些公有云环境,minikube 绑定在 127.0.0.1 IP 地址上无法通过远程访问,需要使用 kubectl proxy 命令将端口映射到 0.0.0.0 IP 地址上,再通过浏览器访问虚拟机公网 IP 和端口以及相同的 dashboard URL 路径即可远程访问 dashboard。
此时集群mnode发生重新选举,dnode1上的monde 成为leader
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
Welcome to the TDengine Command Line Interface, Client Version:3.0.7.1.202307190706
Copyright (c) 2022 by TDengine, all rights reserved.
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: offline
status: offline
create_time: 2023-07-19 17:54:18.559
reboot_time: 1970-01-01 08:00:00.000
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:32:00.227
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:32:00.026
Query OK, 3 row(s) in set (0.001513s)
```
$ kubectl proxy --accept-hosts='^.*$' --address='0.0.0.0'
集群可以正常读写
```Bash
# insert
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "insert into test1.t1 values(now, 1)(now+1s, 2);"
taos> insert into test1.t1 values(now, 1)(now+1s, 2);
Insert OK, 2 row(s) affected (0.002098s)
# select
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select *from test1.t1"
taos> select *from test1.t1
ts | n |
========================================
2023-07-19 18:04:58.104 | 1 |
2023-07-19 18:04:59.104 | 2 |
2023-07-19 18:06:00.303 | 1 |
2023-07-19 18:06:01.303 | 2 |
Query OK, 4 row(s) in set (0.001994s)
```
同理,至于非leader得mnode掉线,读写当然可以正常进行,这里就不做过多的展示。
## 集群扩容
TDengine 集群支持自动扩容:
```bash
```Bash
kubectl scale statefulsets tdengine --replicas=4
```
上面命令行中参数 `--replica=4` 表示要将 TDengine 集群扩容到 4 个节点,执行后首先检查 POD 的状态:
```bash
kubectl get pods -l app=tdengine
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```
输出如下:
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
```Plain
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h26m ago) 6h53m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6h39m ago) 6h53m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h16m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 3m24s 10.244.2.76 node86 <none> <none>
```
此时 POD 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 POD 状态为 `ready` 之后才能看到:
此时 Pod 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 Pod 状态为 `ready` 之后才能看到:
```bash
kubectl exec -i -t tdengine-3 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-3 -n tdengine-test -- taos -s "show dnodes"
```
扩容后的四节点 TDengine 集群的 dnode 列表:
```
```Plain
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
4 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:33:16.039 | |
Query OK, 4 rows in database (0.008377s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
4 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:01:44.007 | 2023-07-20 16:01:44.889 | | | |
Query OK, 4 row(s) in set (0.003628s)
```
## 集群缩容
由于 TDengine 集群在扩缩容时会对数据进行节点间迁移,使用 kubectl 命令进行缩容需要首先使用 "drop dnodes" 命令节点删除完成后再进行 Kubernetes 集群缩容。
由于 TDengine 集群在扩缩容时会对数据进行节点间迁移,使用 kubectl 命令进行缩容需要首先使用 "drop dnodes" 命令(**如果集群中存在3副本的db,那么缩容后的** **dnode** **个数也要必须大于等于3,否则drop dnode操作会被中止**),然后再节点删除完成后再进行 Kubernetes 集群缩容。
注意:由于 Kubernetes Statefulset 中 Pod 的只能按创建顺序逆序移除,所以 TDengine drop dnode 也需要按照创建顺序逆序移除,否则会导致 Pod 处于错误状态。
```
$ kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"
```
```bash
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "drop dnode 4"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.004861s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
Query OK, 3 row(s) in set (0.003324s)
```
确认移除成功后(使用 kubectl exec -i -t tdengine-0 -- taos -s "show dnodes" 查看和确认 dnode 列表),使用 kubectl 命令移除 POD:
```
kubectl scale statefulsets tdengine --replicas=3
```Plain
kubectl scale statefulsets tdengine --replicas=3 -n tdengine-test
```
最后一个 POD 将会被删除。使用命令 kubectl get pods -l app=tdengine 查看POD状态:
```
$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m7s
tdengine-1 1/1 Running 0 3m55s
tdengine-2 1/1 Running 0 2m28s
```Plain
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h55m ago) 7h22m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h9m ago) 7h23m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h45m 10.244.1.224 node85 <none> <none>
```
POD删除后,需要手动删除PVC,否则下次扩容时会继续使用以前的数据导致无法正常加入集群。
```bash
$ kubectl delete pvc taosdata-tdengine-3
```Bash
kubectl delete pvc aosdata-tdengine-3 -n tdengine-test
```
此时的集群状态是安全的,需要时还可以再次进行扩容:
```bash
$ kubectl scale statefulsets tdengine --replicas=4
```Bash
kubectl scale statefulsets tdengine --replicas=4 -n tdengine-test
statefulset.apps/tdengine scaled
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 ContainerCreating 0 4s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 Running 0 7s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h59m ago) 7h27m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h13m ago) 7h27m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h49m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 20s 10.244.2.77 node86 <none> <none>
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:01:36.479 | |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:13:54.411 | |
Query OK, 4 row(s) in set (0.001348s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | | | |
Query OK, 4 row(s) in set (0.003881s)
```
## 清理 TDengine 集群
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
> **删除pvc时需要注意下pv persistentVolumeReclaimPolicy策略,建议改为Delete,这样在删除pvc时才会自动清理pv,同时会清理底层的csi存储资源,如果没有配置删除pvc自动清理pv的策略,再删除pvc后,在手动清理pv时,pv对应的csi存储资源可能不会被释放。**
```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
```Bash
kubectl delete statefulset -l app=tdengine -n tdengine-test
kubectl delete svc -l app=tdengine -n tdengine-test
kubectl delete pvc -l app=tdengine -n tdengine-test
kubectl delete configmap taoscfg -n tdengine-test
```
## 常见错误
......@@ -330,65 +519,26 @@ kubectl delete configmap taoscfg
未进行 "drop dnode" 直接进行缩容,由于 TDengine 尚未删除节点,缩容 pod 导致 TDengine 集群中部分节点处于 offline 状态。
```
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Plain
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:01:36.479 | status msg timeout |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:13:54.411 | status msg timeout |
Query OK, 4 row(s) in set (0.001323s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | offline | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | status msg timeout | | |
Query OK, 4 row(s) in set (0.003862s)
```
### 错误二
## 最后
TDengine 集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用
对于在 Kubernetes 环境下 TDengine 的高可用和高可靠来说,对于硬件损坏、灾难恢复,分为两个层面来讲
创建一个库使用 replica 参数为 2,插入部分数据:
1. 底层的分布式块存储具备的灾难恢复能力,块存储的多副本,当下流行的分布式块存储如 Ceph,就具备多副本能力,将存储副本扩展到不同的机架、机柜、机房、数据中心(或者直接使用公有云厂商提供的块存储服务)
2. TDengine的灾难恢复,在 TDengine Enterprise 中,本身具备了当一个 dnode 永久下线(物理机磁盘损坏,数据分拣丢失)后,重新拉起一个空白的dnode来恢复原dnode的工作。
```bash
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
缩容到单节点:
```bash
kubectl scale statefulsets tdengine --replicas=1
```
在 TDengine CLI 中的所有数据库操作将无法成功。
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
最后,欢迎使用[TDengine Cloud](https://cloud.taosdata.com/),来体验一站式全托管的TDengine云服务。
taos> use test;
Database changed.
taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)
```
> TDengine Cloud 是一个极简的全托管时序数据处理云服务平台,它是基于开源的时序数据库 TDengine 而开发的。除高性能的时序数据库之外,它还具有缓存、订阅和流计算等系统功能,而且提供了便利而又安全的数据分享、以及众多的企业级功能。它可以让物联网、工业互联网、金融、IT 运维监控等领域企业在时序数据的管理上大幅降低人力成本和运营成本。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册