未验证 提交 d7bb7e11 编写于 作者: Y Yaqiang Li 提交者: GitHub

Merge pull request #22164 from taosdata/docs/yuanliu/deploy-k8s-TD-25202

docs: optimization k8s deploy
......@@ -4,23 +4,31 @@ sidebar_label: Kubernetes
description: This document describes how to deploy TDengine on Kubernetes.
---
TDengine is a cloud-native time-series database that can be deployed on Kubernetes. This document gives a step-by-step description of how you can use YAML files to create a TDengine cluster and introduces common operations for TDengine in a Kubernetes environment.
## Overview
As a time series database for Cloud Native architecture design, TDengine supports Kubernetes deployment. Firstly we introduce how to use YAML files to create a highly available TDengine cluster from scratch step by step for production usage, and highlight the common operations of TDengine in Kubernetes environment.
To meet [high availability ](https://docs.taosdata.com/tdinternal/high-availability/)requirements, clusters need to meet the following requirements:
- 3 or more dnodes: multiple vnodes in the same vgroup of TDengine are not allowed to be distributed in one dnode at the same time, so if you create a database with 3 replicas, the number of dnodes is greater than or equal to 3
- 3 mnodes: mnode is responsible for the management of the entire TDengine cluster. The default number of mnode in TDengine cluster is only one. If the dnode where the mnode located is dropped, the entire cluster is unavailable.
- Database 3 replicas: The TDengine replica configuration is the database level, so 3 replicas for the database must need three dnodes in the cluster. If any one dnode is offline, does not affect the normal usage of the whole cluster. **If the number of offline** **dnodes** **is 2, then the cluster is not available,** **because** ** the cluster can not complete the election based on RAFT** **.** (Enterprise version: in the disaster recovery scenario, any node data file is damaged, can be restored by pulling up the dnode again)
## Prerequisites
Before deploying TDengine on Kubernetes, perform the following:
* Current steps are compatible with Kubernetes v1.5 and later version.
* Install and configure minikube, kubectl, and helm.
* Install and deploy Kubernetes and ensure that it can be accessed and used normally. Update any container registries or other services as necessary.
- This article applies Kubernetes 1.19 and above
- This article uses the **kubectl** tool to install and deploy, please install the corresponding software in advance
- Kubernetes have been installed and deployed and can access or update the necessary container repositories or other services
You can download the configuration files in this document from [GitHub](https://github.com/taosdata/TDengine-Operator/tree/3.0/src/tdengine).
## Configure the service
Create a service configuration file named `taosd-service.yaml`. Record the value of `metadata.name` (in this example, `taos`) for use in the next step. Add the ports required by TDengine:
Create a service configuration file named `taosd-service.yaml`. Record the value of `metadata.name` (in this example, `taos`) for use in the next step. And then add the ports required by TDengine and record the value of the selector label "app" (in this example, `tdengine`) for use in the next step:
```yaml
```YAML
---
apiVersion: v1
kind: Service
......@@ -31,10 +39,10 @@ metadata:
spec:
ports:
- name: tcp6030
- protocol: "TCP"
protocol: "TCP"
port: 6030
- name: tcp6041
- protocol: "TCP"
protocol: "TCP"
port: 6041
selector:
app: "tdengine"
......@@ -42,10 +50,11 @@ spec:
## Configure the service as StatefulSet
Configure the TDengine service as a StatefulSet.
Create the `tdengine.yaml` file and set `replicas` to 3. In this example, the region is set to Asia/Shanghai and 10 GB of standard storage are allocated per node. You can change the configuration based on your environment and business requirements.
According to Kubernetes instructions for various deployments, we will use StatefulSet as the deployment resource type of TDengine. Create the file `tdengine.yaml `, where replicas defines the number of cluster nodes as 3. The node time zone is China (Asia/Shanghai), and each node is allocated 5G standard storage (refer to the [Storage Classes ](https://kubernetes.io/docs/concepts/storage/storage-classes/)configuration storage class). You can also modify accordingly according to the actual situation.
Please pay special attention to the startupProbe configuration. If dnode's Pod drops for a period of time and then restart, the newly launched dnode Pod will be temporarily unavailable. The reason is the startupProbe configuration is too small, Kubernetes will know that the Pod is in an abnormal state and try to restart it, then the dnode's Pod will restart frequently and never return to the normal status. Refer to [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
```yaml
```YAML
---
apiVersion: apps/v1
kind: StatefulSet
......@@ -69,14 +78,14 @@ spec:
spec:
containers:
- name: "tdengine"
image: "tdengine/tdengine:3.0.0.0"
image: "tdengine/tdengine:3.0.7.1"
imagePullPolicy: "IfNotPresent"
ports:
- name: tcp6030
- protocol: "TCP"
protocol: "TCP"
containerPort: 6030
- name: tcp6041
- protocol: "TCP"
protocol: "TCP"
containerPort: 6041
env:
# POD_NAME for FQDN config
......@@ -102,12 +111,18 @@ spec:
# Must set if you want a cluster.
- name: TAOS_FIRST_EP
value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
# TAOS_FQDN should always be set in k8s env.
# TAOS_FQND should always be set in k8s env.
- name: TAOS_FQDN
value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
startupProbe:
exec:
command:
- taos-check
failureThreshold: 360
periodSeconds: 10
readinessProbe:
exec:
command:
......@@ -129,266 +144,401 @@ spec:
storageClassName: "standard"
resources:
requests:
storage: "10Gi"
storage: "5Gi"
```
## Use kubectl to deploy TDengine
Run the following commands:
First create the corresponding namespace, and then execute the following command in sequence :
```bash
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```Bash
kubectl apply -f taosd-service.yaml -n tdengine-test
kubectl apply -f tdengine.yaml -n tdengine-test
```
The preceding configuration generates a TDengine cluster with three nodes in which dnodes are automatically configured. You can run the `show dnodes` command to query the nodes in the cluster:
The above configuration will generate a three-node TDengine cluster, dnode is automatically configured, you can use the **show dnodes** command to view the nodes of the current cluster:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-2 -n tdengine-test -- taos -s "show dnodes"
```
The output is as follows:
```
```Bash
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.003655s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 0 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 0 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 0 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001853s)
```
View the current mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-19 17:54:19.520
Query OK, 1 row(s) in set (0.001282s)
```
## Create mnode
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
```
View mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-20 09:19:36.060
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:22:12.838
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:22:23.271
Query OK, 3 row(s) in set (0.003108s)
```
## Enable port forwarding
The kubectl port forwarding feature allows applications to access the TDengine cluster running on Kubernetes.
Kubectl port forwarding enables applications to access TDengine clusters running in Kubernetes environments.
```
kubectl port-forward tdengine-0 6041:6041 &
```bash
kubectl port-forward -n tdengine-test tdengine-0 6041:6041 &
```
Use curl to verify that the TDengine REST API is working on port 6041:
Use **curl** to verify that the TDengine REST API is working on port 6041:
```
$ curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
Handling connection for 6041
{"code":0,"column_meta":[["name","VARCHAR",64],["create_time","TIMESTAMP",8],["vgroups","SMALLINT",2],["ntables","BIGINT",8],["replica","TINYINT",1],["strict","VARCHAR",4],["duration","VARCHAR",10],["keep","VARCHAR",32],["buffer","INT",4],["pagesize","INT",4],["pages","INT",4],["minrows","INT",4],["maxrows","INT",4],["comp","TINYINT",1],["precision","VARCHAR",2],["status","VARCHAR",10],["retention","VARCHAR",60],["single_stable","BOOL",1],["cachemodel","VARCHAR",11],["cachesize","INT",4],["wal_level","TINYINT",1],["wal_fsync_period","INT",4],["wal_retention_period","INT",4],["wal_retention_size","BIGINT",8],["wal_roll_period","INT",4],["wal_segment_size","BIGINT",8]],"data":[["information_schema",null,null,16,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null],["performance_schema",null,null,10,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null]],"rows":2}
```bash
curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
{"code":0,"column_meta":[["name","VARCHAR",64]],"data":[["information_schema"],["performance_schema"],["test"],["test1"]],"rows":4}
```
## Enable the dashboard for visualization
## Test cluster
The minikube dashboard command enables visualized cluster management.
### Data preparation
```
$ minikube dashboard
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
```
#### taosBenchmark
In some public clouds, minikube cannot be remotely accessed if it is bound to 127.0.0.1. In this case, use the kubectl proxy command to map the port to 0.0.0.0. Then, you can access the dashboard by using a web browser to open the dashboard URL above on the public IP address and port of the virtual machine.
Create a 3 replicas database with taosBenchmark, write 100 million data at the same time, and view the data at the same time
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taosBenchmark -I stmt -d test -n 10000 -t 10000 -a 3
# query data
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select count(*) from test.meters;"
taos> select count(*) from test.meters;
count(*) |
========================
100000000 |
Query OK, 1 row(s) in set (0.103537s)
```
$ kubectl proxy --accept-hosts='^.*$' --address='0.0.0.0'
```
View vnode distribution by showing dnodes
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 8 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 8 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 8 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001357s)
```
View xnode distribution by showing vgroup
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test.vgroups"
taos> show test.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
2 | test | 1267 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
3 | test | 1215 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
4 | test | 1215 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
5 | test | 1307 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
6 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
7 | test | 1275 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
8 | test | 1231 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
9 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
Query OK, 8 row(s) in set (0.001488s)
```
#### Manually created
Common a three-copy test1, and create a table, write 2 pieces of data
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- \
taos -s \
"create database if not exists test1 replica 3;
use test1;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
View xnode distribution by showing test1.vgroup
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test1.vgroups"
taos> show test1.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
10 | test1 | 1 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
11 | test1 | 0 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
Query OK, 2 row(s) in set (0.001489s)
```
### Test fault tolerance
The dnode where the mnode leader is located is disconnected, dnode1
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 0/1 ErrImagePull 2 (2s ago) 20m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6m48s ago) 20m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 21m 10.244.1.223 node85 <none> <none>
```
At this time, the cluster mnode has a re-election, and the monde on dnode1 becomes the leader.
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
Welcome to the TDengine Command Line Interface, Client Version:3.0.7.1.202307190706
Copyright (c) 2022 by TDengine, all rights reserved.
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: offline
status: offline
create_time: 2023-07-19 17:54:18.559
reboot_time: 1970-01-01 08:00:00.000
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:32:00.227
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:32:00.026
Query OK, 3 row(s) in set (0.001513s)
```
Cluster can read and write normally
```Bash
# insert
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "insert into test1.t1 values(now, 1)(now+1s, 2);"
taos> insert into test1.t1 values(now, 1)(now+1s, 2);
Insert OK, 2 row(s) affected (0.002098s)
# select
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select *from test1.t1"
taos> select *from test1.t1
ts | n |
========================================
2023-07-19 18:04:58.104 | 1 |
2023-07-19 18:04:59.104 | 2 |
2023-07-19 18:06:00.303 | 1 |
2023-07-19 18:06:01.303 | 2 |
Query OK, 4 row(s) in set (0.001994s)
```
Similarly, as for the non-leader mnode dropped, read and write can of course be normal, here will not do too much display .
## Scaling Out Your Cluster
TDengine clusters can scale automatically:
TDengine cluster supports automatic expansion:
```bash
```Bash
kubectl scale statefulsets tdengine --replicas=4
```
The preceding command increases the number of replicas to 4. After running this command, query the pod status:
The parameter `--replica = 4 `in the above command line indicates that you want to expand the TDengine cluster to 4 nodes. After execution, first check the status of the Pod:
```bash
kubectl get pods -l app=tdengine
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```
The output is as follows:
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
```Plain
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h26m ago) 6h53m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6h39m ago) 6h53m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h16m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 3m24s 10.244.2.76 node86 <none> <none>
```
The status of all pods is Running. Once the pod status changes to Ready, you can check the dnode status:
At this time, the state of the POD is still Running, and the dnode state in the TDengine cluster can only be seen after the Pod status is `ready `:
```bash
kubectl exec -i -t tdengine-3 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-3 -n tdengine-test -- taos -s "show dnodes"
```
The following output shows that the TDengine cluster has been expanded to 4 replicas:
The dnode list of the expanded four-node TDengine cluster:
```
```Plain
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
4 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:33:16.039 | |
Query OK, 4 rows in database (0.008377s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
4 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:01:44.007 | 2023-07-20 16:01:44.889 | | | |
Query OK, 4 row(s) in set (0.003628s)
```
## Scaling In Your Cluster
When you scale in a TDengine cluster, your data is migrated to different nodes. You must run the drop dnodes command in TDengine to remove dnodes before scaling in your Kubernetes environment.
Note: In a Kubernetes StatefulSet service, the newest pods are always removed first. For this reason, when you scale in your TDengine cluster, ensure that you drop the newest dnodes.
Since the TDengine cluster will migrate data between nodes during volume expansion and contraction, using the **kubectl** command to reduce the volume requires first using the "drop dnodes" command ( **If there are 3 replicas of db in the cluster, the number of dnodes after reduction must also be greater than or equal to 3, otherwise the drop dnode operation will be aborted** ), the node deletion is completed before Kubernetes cluster reduction.
```
$ kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"
```
Note: Since Kubernetes Pods in the Statefulset can only be removed in reverse order of creation, the TDengine drop dnode also needs to be removed in reverse order of creation, otherwise the Pod will be in an error state.
```bash
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "drop dnode 4"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.004861s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
Query OK, 3 row(s) in set (0.003324s)
```
Verify that the dnode have been successfully removed by running the `kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"` command. Then run the following command to remove the pod:
After confirming that the removal is successful (use kubectl exec -i -t tdengine-0 --taos -s "show dnodes" to view and confirm the dnode list), use the kubectl command to remove the Pod:
```
kubectl scale statefulsets tdengine --replicas=3
```Plain
kubectl scale statefulsets tdengine --replicas=3 -n tdengine-test
```
The newest pod in the deployment is removed. Run the `kubectl get pods -l app=tdengine` command to query the pod status:
The last Pod will be deleted. Use the command kubectl get pods -l app = tdengine to check the Pod status:
```
$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m7s
tdengine-1 1/1 Running 0 3m55s
tdengine-2 1/1 Running 0 2m28s
```Plain
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h55m ago) 7h22m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h9m ago) 7h23m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h45m 10.244.1.224 node85 <none> <none>
```
After the pod has been removed, manually delete the PersistentVolumeClaim (PVC). Otherwise, future scale-outs will attempt to use existing data.
After the Pod is deleted, the PVC needs to be deleted manually, otherwise the previous data will continue to be used for the next expansion, resulting in the inability to join the cluster normally.
```bash
$ kubectl delete pvc taosdata-tdengine-3
```Bash
kubectl delete pvc aosdata-tdengine-3 -n tdengine-test
```
Your cluster has now been safely scaled in, and you can scale it out again as necessary.
The cluster state at this time is safe and can be scaled up again if needed.
```bash
$ kubectl scale statefulsets tdengine --replicas=4
```Bash
kubectl scale statefulsets tdengine --replicas=4 -n tdengine-test
statefulset.apps/tdengine scaled
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 ContainerCreating 0 4s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 Running 0 7s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h59m ago) 7h27m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h13m ago) 7h27m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h49m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 20s 10.244.2.77 node86 <none> <none>
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:01:36.479 | |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:13:54.411 | |
Query OK, 4 row(s) in set (0.001348s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | | | |
Query OK, 4 row(s) in set (0.003881s)
```
## Remove a TDengine Cluster
To fully remove a TDengine cluster, you must delete its statefulset, svc, configmap, and pvc entries:
> **When deleting the PVC, you need to pay attention to the pv persistentVolumeReclaimPolicy policy. It is recommended to change to Delete, so that the PV will be automatically cleaned up when the PVC is deleted, and the underlying CSI storage resources will be cleaned up at the same time. If the policy of deleting the PVC to automatically clean up the PV is not configured, and then after deleting the pvc, when manually cleaning up the PV, the CSI storage resources corresponding to the PV may not be released.**
```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
Complete removal of TDengine cluster, need to clean up statefulset, svc, configmap, pvc respectively.
```Bash
kubectl delete statefulset -l app=tdengine -n tdengine-test
kubectl delete svc -l app=tdengine -n tdengine-test
kubectl delete pvc -l app=tdengine -n tdengine-test
kubectl delete configmap taoscfg -n tdengine-test
```
## Troubleshooting
### Error 1
If you remove a pod without first running `drop dnode`, some TDengine nodes will go offline.
No "drop dnode" is directly reduced. Since the TDengine has not deleted the node, the reduced pod causes some nodes in the TDengine cluster to be offline.
```
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Plain
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:01:36.479 | status msg timeout |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:13:54.411 | status msg timeout |
Query OK, 4 row(s) in set (0.001323s)
```
### Error 2
If the number of nodes after a scale-in is less than the value of the replica parameter, the cluster will go down:
Create a database with replica set to 2 and add data.
```bash
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | offline | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | status msg timeout | | |
Query OK, 4 row(s) in set (0.003862s)
```
Scale in to one node:
## Finally
```bash
kubectl scale statefulsets tdengine --replicas=1
For the high availability and high reliability of TDengine in a Kubernetes environment, hardware damage and disaster recovery are divided into two levels:
```
1. The disaster recovery capability of the underlying distributed Block Storage, the multi-copy of Block Storage, the current popular distributed Block Storage such as Ceph, has the multi-copy capability, extending the storage copy to different racks, cabinets, computer rooms, Data center (or directly use the Block Storage service provided by Public Cloud vendors)
2. TDengine disaster recovery, in TDengine Enterprise, itself has when a dnode permanently offline (TCE-metal disk damage, data sorting loss), re-pull a blank dnode to restore the original dnode work.
In the TDengine CLI, you can see that no database operations succeed:
Finally, welcome to [TDengine Cloud ](https://cloud.tdengine.com/)to experience the one-stop fully managed TDengine Cloud as a Service.
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
taos> use test;
Database changed.
taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)
```
> TDengine Cloud is a minimalist fully managed time series data processing Cloud as a Service platform developed based on the open source time series database TDengine. In addition to high-performance time series database, it also has system functions such as caching, subscription and stream computing, and provides convenient and secure data sharing, as well as numerous enterprise-level functions. It allows enterprises in the fields of Internet of Things, Industrial Internet, Finance, IT operation and maintenance monitoring to significantly reduce labor costs and operating costs in the management of time series data.
......@@ -4,23 +4,31 @@ title: 在 Kubernetes 上部署 TDengine 集群
description: 利用 Kubernetes 部署 TDengine 集群的详细指南
---
作为面向云原生架构设计的时序数据库,TDengine 支持 Kubernetes 部署。这里介绍如何使用 YAML 文件一步一步从头创建一个 TDengine 集群,并重点介绍 Kubernetes 环境下 TDengine 的常用操作。
## 概述
作为面向云原生架构设计的时序数据库,TDengine 本身就支持 Kubernetes 部署。这里介绍如何使用 YAML 文件从头一步一步创建一个可用于生产使用的高可用 TDengine 集群,并重点介绍 Kubernetes 环境下 TDengine 的常用操作。
为了满足[高可用](https://docs.taosdata.com/tdinternal/high-availability/)的需求,集群需要满足如下要求:
- 3个及以上 dnode :TDengine 的同一个 vgroup 中的多个 vnode ,不允许同时分布在一个 dnode ,所以如果创建3副本的数据库,则 dnode 数大于等于3
- 3个 mnode :mnode 负责整个集群的管理工作,TDengine 默认是一个 mnode。如果这个 mnode 所在的 dnode 掉线,则整个集群不可用。
- 数据库的3副本:TDengine 的副本配置是数据库级别,所以数据库3副本可满足在3个 dnode 的集群中,任意一个 dnode 下线,都不影响集群的正常使用。**如果下线** **dnode** **个数为2时,此时集群不可用,****因为****RAFT无法完成选举****。**(企业版:在灾难恢复场景,任一节点数据文件损坏,都可以通过重新拉起dnode进行恢复)
## 前置条件
要使用 Kubernetes 部署管理 TDengine 集群,需要做好如下准备工作。
* 本文适用 Kubernetes v1.5 以上版本
* 本文和下一章使用 minikube、kubectl 和 helm 等工具进行安装部署,请提前安装好相应软件
* Kubernetes 已经安装部署并能正常访问使用或更新必要的容器仓库或其他服务
- 本文适用 Kubernetes v1.19 以上版本
- 本文使用 kubectl 工具进行安装部署,请提前安装好相应软件
- Kubernetes 已经安装部署并能正常访问使用或更新必要的容器仓库或其他服务
以下配置文件也可以从 [GitHub 仓库](https://github.com/taosdata/TDengine-Operator/tree/3.0/src/tdengine) 下载。
## 配置 Service 服务
创建一个 Service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的端口:
创建一个 Service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。首先添加 TDengine 所用到的端口,然后在选择器设置确定的标签 app (此处为 “tdengine”)。
```yaml
```YAML
---
apiVersion: v1
kind: Service
......@@ -42,10 +50,11 @@ spec:
## 有状态服务 StatefulSet
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型。
创建文件 `tdengine.yaml`,其中 replicas 定义集群节点的数量为 3。节点时区为中国(Asia/Shanghai),每个节点分配 10G 标准(standard)存储。你也可以根据实际情况进行相应修改。
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的部署资源类型。 创建文件 `tdengine.yaml`,其中 replicas 定义集群节点的数量为 3。节点时区为中国(Asia/Shanghai),每个节点分配 5G 标准(standard)存储(参考[Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) 配置 storage class )。你也可以根据实际情况进行相应修改。
请特别注意startupProbe的配置,在 dnode 的 Pod 掉线一段时间后,再重新启动,这个时候新上线的 dnode 会短暂不可用。如果startupProbe配置过小,Kubernetes 会认为该 Pod 处于不正常的状态,并尝试重启该 Pod,该 dnode 的 Pod 会频繁重启,始终无法恢复到正常状态。参考 [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
```yaml
```YAML
---
apiVersion: apps/v1
kind: StatefulSet
......@@ -69,7 +78,7 @@ spec:
spec:
containers:
- name: "tdengine"
image: "tdengine/tdengine:3.0.0.0"
image: "tdengine/tdengine:3.0.7.1"
imagePullPolicy: "IfNotPresent"
ports:
- name: tcp6030
......@@ -108,6 +117,12 @@ spec:
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
startupProbe:
exec:
command:
- taos-check
failureThreshold: 360
periodSeconds: 10
readinessProbe:
exec:
command:
......@@ -129,199 +144,373 @@ spec:
storageClassName: "standard"
resources:
requests:
storage: "10Gi"
storage: "5Gi"
```
## 使用 kubectl 命令部署 TDengine 集群
顺序执行以下命令。
首先创建对应的 namespace,然后顺序执行以下命令:
```bash
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```Bash
kubectl apply -f taosd-service.yaml -n tdengine-test
kubectl apply -f tdengine.yaml -n tdengine-test
```
上面的配置将生成一个三节点的 TDengine 集群,dnode 为自动配置,可以使用 show dnodes 命令查看当前集群的节点:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show dnodes"
kubectl exec -it tdengine-2 -n tdengine-test -- taos -s "show dnodes"
```
输出如下:
```
```Bash
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.003655s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 0 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 0 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 0 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001853s)
```
查看当前mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-19 17:54:19.520
Query OK, 1 row(s) in set (0.001282s)
```
## 创建mnode
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
```
查看mnode
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-19 17:54:18.559
reboot_time: 2023-07-20 09:19:36.060
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:22:12.838
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:22:23.271
Query OK, 3 row(s) in set (0.003108s)
```
## 使能端口转发
利用 kubectl 端口转发功能可以使应用可以访问 Kubernetes 环境运行的 TDengine 集群。
```
kubectl port-forward tdengine-0 6041:6041 &
```Plain
kubectl port-forward -n tdengine-test tdengine-0 6041:6041 &
```
使用 curl 命令验证 TDengine REST API 使用的 6041 接口。
```
$ curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
Handling connection for 6041
{"code":0,"column_meta":[["name","VARCHAR",64],["create_time","TIMESTAMP",8],["vgroups","SMALLINT",2],["ntables","BIGINT",8],["replica","TINYINT",1],["strict","VARCHAR",4],["duration","VARCHAR",10],["keep","VARCHAR",32],["buffer","INT",4],["pagesize","INT",4],["pages","INT",4],["minrows","INT",4],["maxrows","INT",4],["comp","TINYINT",1],["precision","VARCHAR",2],["status","VARCHAR",10],["retention","VARCHAR",60],["single_stable","BOOL",1],["cachemodel","VARCHAR",11],["cachesize","INT",4],["wal_level","TINYINT",1],["wal_fsync_period","INT",4],["wal_retention_period","INT",4],["wal_retention_size","BIGINT",8],["wal_roll_period","INT",4],["wal_segment_size","BIGINT",8]],"data":[["information_schema",null,null,16,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null],["performance_schema",null,null,10,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null]],"rows":2}
```Plain
curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
{"code":0,"column_meta":[["name","VARCHAR",64]],"data":[["information_schema"],["performance_schema"],["test"],["test1"]],"rows":4}
```
## 使用 dashboard 进行图形化管理
## 集群测试
minikube 提供 dashboard 命令支持图形化管理界面。
### 数据准备
```
$ minikube dashboard
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
```
#### taosBenchmark
对于某些公有云环境,minikube 绑定在 127.0.0.1 IP 地址上无法通过远程访问,需要使用 kubectl proxy 命令将端口映射到 0.0.0.0 IP 地址上,再通过浏览器访问虚拟机公网 IP 和端口以及相同的 dashboard URL 路径即可远程访问 dashboard。
通过taosBenchmark 创建一个3副本的数据库,同时写入1亿条数据,同时查看数据
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taosBenchmark -I stmt -d test -n 10000 -t 10000 -a 3
# query data
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select count(*) from test.meters;"
taos> select count(*) from test.meters;
count(*) |
========================
100000000 |
Query OK, 1 row(s) in set (0.103537s)
```
$ kubectl proxy --accept-hosts='^.*$' --address='0.0.0.0'
```
查看vnode分布,通过show dnodes
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 8 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-19 17:54:18.469 | | | |
2 | tdengine-1.ta... | 8 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-19 17:54:38.698 | | | |
3 | tdengine-2.ta... | 8 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-19 17:55:02.039 | | | |
Query OK, 3 row(s) in set (0.001357s)
```
通过show vgroup 查看 vnode 分布情况
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test.vgroups"
taos> show test.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
2 | test | 1267 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
3 | test | 1215 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
4 | test | 1215 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
5 | test | 1307 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
6 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
7 | test | 1275 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
8 | test | 1231 | 1 | leader | 2 | follower | 3 | follower | NULL | NULL | 0 | 0 | 0 |
9 | test | 1245 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
Query OK, 8 row(s) in set (0.001488s)
```
#### 手工创建
常见一个三副本的test1,并创建一张表,写入2条数据
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- \
taos -s \
"create database if not exists test1 replica 3;
use test1;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
通过show test1.vgroup 查看xnode分布情况
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show test1.vgroups"
taos> show test1.vgroups
vgroup_id | db_name | tables | v1_dnode | v1_status | v2_dnode | v2_status | v3_dnode | v3_status | v4_dnode | v4_status | cacheload | cacheelements | tsma |
==============================================================================================================================================================================================
10 | test1 | 1 | 1 | follower | 2 | follower | 3 | leader | NULL | NULL | 0 | 0 | 0 |
11 | test1 | 0 | 1 | follower | 2 | leader | 3 | follower | NULL | NULL | 0 | 0 | 0 |
Query OK, 2 row(s) in set (0.001489s)
```
### 容错测试
Mnode leader 所在的 dnode 掉线,dnode1
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 0/1 ErrImagePull 2 (2s ago) 20m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6m48s ago) 20m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 21m 10.244.1.223 node85 <none> <none>
```
此时集群mnode发生重新选举,dnode1上的monde 成为leader
```Bash
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
Welcome to the TDengine Command Line Interface, Client Version:3.0.7.1.202307190706
Copyright (c) 2022 by TDengine, all rights reserved.
taos> show mnodes\G
*************************** 1.row ***************************
id: 1
endpoint: tdengine-0.taosd.tdengine-test.svc.cluster.local:6030
role: offline
status: offline
create_time: 2023-07-19 17:54:18.559
reboot_time: 1970-01-01 08:00:00.000
*************************** 2.row ***************************
id: 2
endpoint: tdengine-1.taosd.tdengine-test.svc.cluster.local:6030
role: leader
status: ready
create_time: 2023-07-20 09:22:05.600
reboot_time: 2023-07-20 09:32:00.227
*************************** 3.row ***************************
id: 3
endpoint: tdengine-2.taosd.tdengine-test.svc.cluster.local:6030
role: follower
status: ready
create_time: 2023-07-20 09:22:20.042
reboot_time: 2023-07-20 09:32:00.026
Query OK, 3 row(s) in set (0.001513s)
```
集群可以正常读写
```Bash
# insert
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "insert into test1.t1 values(now, 1)(now+1s, 2);"
taos> insert into test1.t1 values(now, 1)(now+1s, 2);
Insert OK, 2 row(s) affected (0.002098s)
# select
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "select *from test1.t1"
taos> select *from test1.t1
ts | n |
========================================
2023-07-19 18:04:58.104 | 1 |
2023-07-19 18:04:59.104 | 2 |
2023-07-19 18:06:00.303 | 1 |
2023-07-19 18:06:01.303 | 2 |
Query OK, 4 row(s) in set (0.001994s)
```
同理,至于非leader得mnode掉线,读写当然可以正常进行,这里就不做过多的展示。
## 集群扩容
TDengine 集群支持自动扩容:
```bash
```Bash
kubectl scale statefulsets tdengine --replicas=4
```
上面命令行中参数 `--replica=4` 表示要将 TDengine 集群扩容到 4 个节点,执行后首先检查 POD 的状态:
```bash
kubectl get pods -l app=tdengine
```Bash
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```
输出如下:
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
```Plain
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h26m ago) 6h53m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (6h39m ago) 6h53m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h16m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 3m24s 10.244.2.76 node86 <none> <none>
```
此时 POD 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 POD 状态为 `ready` 之后才能看到:
此时 Pod 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 Pod 状态为 `ready` 之后才能看到:
```bash
kubectl exec -i -t tdengine-3 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-3 -n tdengine-test -- taos -s "show dnodes"
```
扩容后的四节点 TDengine 集群的 dnode 列表:
```
```Plain
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
4 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:33:16.039 | |
Query OK, 4 rows in database (0.008377s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
4 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:01:44.007 | 2023-07-20 16:01:44.889 | | | |
Query OK, 4 row(s) in set (0.003628s)
```
## 集群缩容
由于 TDengine 集群在扩缩容时会对数据进行节点间迁移,使用 kubectl 命令进行缩容需要首先使用 "drop dnodes" 命令节点删除完成后再进行 Kubernetes 集群缩容。
由于 TDengine 集群在扩缩容时会对数据进行节点间迁移,使用 kubectl 命令进行缩容需要首先使用 "drop dnodes" 命令(**如果集群中存在3副本的db,那么缩容后的** **dnode** **个数也要必须大于等于3,否则drop dnode操作会被中止**),然后再节点删除完成后再进行 Kubernetes 集群缩容。
注意:由于 Kubernetes Statefulset 中 Pod 的只能按创建顺序逆序移除,所以 TDengine drop dnode 也需要按照创建顺序逆序移除,否则会导致 Pod 处于错误状态。
```
$ kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"
```
```bash
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Bash
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "drop dnode 4"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:14:57.285 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:11.302 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-08-10 13:15:23.290 | |
Query OK, 3 rows in database (0.004861s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
Query OK, 3 row(s) in set (0.003324s)
```
确认移除成功后(使用 kubectl exec -i -t tdengine-0 -- taos -s "show dnodes" 查看和确认 dnode 列表),使用 kubectl 命令移除 POD:
```
kubectl scale statefulsets tdengine --replicas=3
```Plain
kubectl scale statefulsets tdengine --replicas=3 -n tdengine-test
```
最后一个 POD 将会被删除。使用命令 kubectl get pods -l app=tdengine 查看POD状态:
```
$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m7s
tdengine-1 1/1 Running 0 3m55s
tdengine-2 1/1 Running 0 2m28s
```Plain
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h55m ago) 7h22m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h9m ago) 7h23m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h45m 10.244.1.224 node85 <none> <none>
```
POD删除后,需要手动删除PVC,否则下次扩容时会继续使用以前的数据导致无法正常加入集群。
```bash
$ kubectl delete pvc taosdata-tdengine-3
```Bash
kubectl delete pvc aosdata-tdengine-3 -n tdengine-test
```
此时的集群状态是安全的,需要时还可以再次进行扩容:
```bash
$ kubectl scale statefulsets tdengine --replicas=4
```Bash
kubectl scale statefulsets tdengine --replicas=4 -n tdengine-test
statefulset.apps/tdengine scaled
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 ContainerCreating 0 4s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 Running 0 7s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
kubectl get pod -l app=tdengine -n tdengine-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
tdengine-0 1/1 Running 4 (6h59m ago) 7h27m 10.244.2.75 node86 <none> <none>
tdengine-1 1/1 Running 1 (7h13m ago) 7h27m 10.244.0.59 node84 <none> <none>
tdengine-2 1/1 Running 0 5h49m 10.244.1.224 node85 <none> <none>
tdengine-3 1/1 Running 0 20s 10.244.2.77 node86 <none> <none>
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:01:36.479 | |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:13:54.411 | |
Query OK, 4 row(s) in set (0.001348s)
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | ready | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | | | |
Query OK, 4 row(s) in set (0.003881s)
```
## 清理 TDengine 集群
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
> **删除pvc时需要注意下pv persistentVolumeReclaimPolicy策略,建议改为Delete,这样在删除pvc时才会自动清理pv,同时会清理底层的csi存储资源,如果没有配置删除pvc自动清理pv的策略,再删除pvc后,在手动清理pv时,pv对应的csi存储资源可能不会被释放。**
```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
```Bash
kubectl delete statefulset -l app=tdengine -n tdengine-test
kubectl delete svc -l app=tdengine -n tdengine-test
kubectl delete pvc -l app=tdengine -n tdengine-test
kubectl delete configmap taoscfg -n tdengine-test
```
## 常见错误
......@@ -330,65 +519,26 @@ kubectl delete configmap taoscfg
未进行 "drop dnode" 直接进行缩容,由于 TDengine 尚未删除节点,缩容 pod 导致 TDengine 集群中部分节点处于 offline 状态。
```
$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
```Plain
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:01:36.479 | status msg timeout |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:13:54.411 | status msg timeout |
Query OK, 4 row(s) in set (0.001323s)
```
### 错误二
TDengine 集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用:
创建一个库使用 replica 参数为 2,插入部分数据:
```bash
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
缩容到单节点:
```bash
kubectl scale statefulsets tdengine --replicas=1
```
在 TDengine CLI 中的所有数据库操作将无法成功。
id | endpoint | vnodes | support_vnodes | status | create_time | reboot_time | note | active_code | c_active_code |
=============================================================================================================================================================================================================================================
1 | tdengine-0.ta... | 10 | 16 | ready | 2023-07-19 17:54:18.552 | 2023-07-20 09:39:04.297 | | | |
2 | tdengine-1.ta... | 10 | 16 | ready | 2023-07-19 17:54:37.828 | 2023-07-20 09:28:24.240 | | | |
3 | tdengine-2.ta... | 10 | 16 | ready | 2023-07-19 17:55:01.141 | 2023-07-20 10:48:43.445 | | | |
5 | tdengine-3.ta... | 0 | 16 | offline | 2023-07-20 16:31:34.092 | 2023-07-20 16:38:17.419 | status msg timeout | | |
Query OK, 4 row(s) in set (0.003862s)
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
## 最后
taos> use test;
Database changed.
对于在 Kubernetes 环境下 TDengine 的高可用和高可靠来说,对于硬件损坏、灾难恢复,分为两个层面来讲:
taos> insert into t1 values(now, 3);
1. 底层的分布式块存储具备的灾难恢复能力,块存储的多副本,当下流行的分布式块存储如 Ceph,就具备多副本能力,将存储副本扩展到不同的机架、机柜、机房、数据中心(或者直接使用公有云厂商提供的块存储服务)
2. TDengine的灾难恢复,在 TDengine Enterprise 中,本身具备了当一个 dnode 永久下线(物理机磁盘损坏,数据分拣丢失)后,重新拉起一个空白的dnode来恢复原dnode的工作。
DB error: Unable to resolve FQDN (0.013874s)
最后,欢迎使用[TDengine Cloud](https://cloud.taosdata.com/),来体验一站式全托管的TDengine云服务。
```
> TDengine Cloud 是一个极简的全托管时序数据处理云服务平台,它是基于开源的时序数据库 TDengine 而开发的。除高性能的时序数据库之外,它还具有缓存、订阅和流计算等系统功能,而且提供了便利而又安全的数据分享、以及众多的企业级功能。它可以让物联网、工业互联网、金融、IT 运维监控等领域企业在时序数据的管理上大幅降低人力成本和运营成本。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册