---
sidebar_label: Kubernetes
title: Deploy a TDengine Cluster on Kubernetes
---

## Configure the ConfigMap

Create `taoscfg.yaml` for TDengine. The entries in this file are passed to the TDengine image as environment variables. Updating this configuration restarts all TDengine pods.

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: taoscfg
  labels:
    app: tdengine
data:
  CLUSTER: "1"
  TAOS_KEEP: "3650"
  TAOS_DEBUG_FLAG: "135"
```

## Configure the Service

Create a Service configuration file named `taosd-service.yaml`. The service name in `metadata.name` ("taosd" here) is used in the next step. Add all ports that TDengine uses:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
    - name: tcp6030
      protocol: "TCP"
      port: 6030
    - name: tcp6035
      protocol: "TCP"
      port: 6035
    - name: tcp6041
      protocol: "TCP"
      port: 6041
    - name: udp6030
      protocol: "UDP"
      port: 6030
    - name: udp6031
      protocol: "UDP"
      port: 6031
    - name: udp6032
      protocol: "UDP"
      port: 6032
    - name: udp6033
      protocol: "UDP"
      port: 6033
    - name: udp6034
      protocol: "UDP"
      port: 6034
    - name: udp6035
      protocol: "UDP"
      port: 6035
    - name: udp6036
      protocol: "UDP"
      port: 6036
    - name: udp6037
      protocol: "UDP"
      port: 6037
    - name: udp6038
      protocol: "UDP"
      port: 6038
    - name: udp6039
      protocol: "UDP"
      port: 6039
    - name: udp6040
      protocol: "UDP"
      port: 6040
  selector:
    app: "tdengine"
```

## StatefulSet

Following the Kubernetes guidance on workload types, we use a StatefulSet for TDengine. Create the file `tdengine.yaml`:

```yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 2
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "zitsen/taosd:develop"
          imagePullPolicy: "Always"
          envFrom:
            - configMapRef:
                name: taoscfg
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6035
              protocol: "TCP"
              containerPort: 6035
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
            - name: udp6030
              protocol: "UDP"
              containerPort: 6030
            - name: udp6031
              protocol: "UDP"
              containerPort: 6031
            - name: udp6032
              protocol: "UDP"
              containerPort: 6032
            - name: udp6033
              protocol: "UDP"
              containerPort: 6033
            - name: udp6034
              protocol: "UDP"
              containerPort: 6034
            - name: udp6035
              protocol: "UDP"
              containerPort: 6035
            - name: udp6036
              protocol: "UDP"
              containerPort: 6036
            - name: udp6037
              protocol: "UDP"
              containerPort: 6037
            - name: udp6038
              protocol: "UDP"
              containerPort: 6038
            - name: udp6039
              protocol: "UDP"
              containerPort: 6039
            - name: udp6040
              protocol: "UDP"
              containerPort: 6040
          env:
            # POD_NAME is used to build the FQDN.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and STS_NAMESPACE are used to resolve the FQDN.
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ sets the time zone; we recommend always setting it.
            - name: TZ
              value: "Asia/Shanghai"
            # Variables with the TAOS_ prefix are written to taos.cfg:
            # the prefix is stripped and the name is converted to camelCase.
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must be set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQDN should always be set in a Kubernetes environment.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          readinessProbe:
            exec:
              command:
                - taos
                - -s
                - "show mnodes"
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            tcpSocket:
              port: 6030
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "csi-rbd-sc"
        resources:
          requests:
            storage: "10Gi"
```

## Start the Cluster

Apply the three files above to the Kubernetes cluster:

```bash
kubectl apply -f taoscfg.yaml
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```

This configuration creates a two-node TDengine cluster. The dnodes are configured automatically; use the `show dnodes` command to list the nodes currently in the cluster:

```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
```

The output looks like this:

```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 17:13:24.181 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 17:14:09.257 |                          |
Query OK, 2 row(s) in set (0.000997s)
```

## Scale Out the Cluster

The TDengine cluster supports automatic scale-out:

```bash
kubectl scale statefulsets tdengine --replicas=4
```

The `--replicas=4` argument scales the TDengine cluster out to 4 nodes. After running the command, first check the status of the pods:

```bash
kubectl get pods -l app=tdengine
```

The output looks like this:

```
NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          161m
tdengine-1   1/1     Running   0          161m
tdengine-2   1/1     Running   0          32m
tdengine-3   1/1     Running   0          32m
```

At this point the pods are still only in the Running state. The new dnodes show up in the TDengine cluster only after the pods become `ready`:

```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
```

The dnode list of the four-node TDengine cluster after scale-out:

```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:12.915 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:33.127 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 14:07:27.078 |                          |
      4 | tdengine-3.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 14:07:48.362 |                          |
Query OK, 4 row(s) in set (0.001293s)
```

## Scale In the Cluster

Scale-in is not automated in TDengine. In this example we shrink a three-node cluster to two nodes.

First, confirm that the three-node TDengine cluster is working normally by checking the dnode status in the TDengine CLI:

```
taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 16:27:24.852 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:27:53.339 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:28:49.787 |                          |
Query OK, 3 row(s) in set (0.001101s)
```

To scale in safely, first remove the node from the dnode list, that is, from the cluster:

```bash
kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 'tdengine-2.taosd.default.svc.cluster.local:6030'"
```

After confirming with `show dnodes` that the dnode has been removed, remove the corresponding pod:

```bash
kubectl scale statefulsets tdengine --replicas=2
```

The pod with the highest ordinal is deleted. Use `kubectl get pods -l app=tdengine` to check the cluster status:

```
NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          3h40m
tdengine-1   1/1     Running   0          3h40m
```

After the pod is deleted, delete its PVC manually. Otherwise the next scale-out reuses the old data and the new node cannot join the cluster.

```bash
kubectl delete pvc taosdata-tdengine-2
```

The cluster is now in a safe state and can be scaled out again when needed:

```bash
kubectl scale statefulsets tdengine --replicas=3
```

The output of `show dnodes` is then:

```
taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 16:27:24.852 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:27:53.339 |                          |
      4 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:40:49.177 |                          |
```

## Delete the Cluster

To remove the TDengine cluster completely, clean up the StatefulSet, the Service, the PVCs, and the ConfigMap separately.

```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
```

## Common Errors

### Error 1

If you scale out to four nodes and then scale in to two nodes without dropping the dnodes first, the dnodes of the deleted pods go into the offline state:

```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:12.915 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:33.127 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | offline    | any   | 2021-06-01 14:07:27.078 | status msg timeout       |
      4 | tdengine-3.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 14:07:48.362 | status msg timeout       |
Query OK, 4 row(s) in set (0.001236s)
```

In this state `drop dnode` does not behave as expected, and after the next cluster restart none of the dnodes can start, because they cannot leave the dropping state.

### Error 2

A TDengine cluster keeps the `replica` parameter of its databases. If the number of nodes after scale-in is smaller than this value, the cluster becomes unusable.

Create a database with `replica` set to 2 and insert some data:

```bash
kubectl exec -i -t tdengine-0 -- \
  taos -s \
  "create database if not exists test replica 2;
   use test;
   create table if not exists t1(ts timestamp, n int);
   insert into t1 values(now, 1)(now+1s, 2);"
```

Scale in to a single node:

```bash
kubectl scale statefulsets tdengine --replicas=1
```

All database operations in the taos shell now fail.

```
taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000845s)

taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000837s)

taos> use test;
Database changed.

taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)
```
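
Both errors above are easier to avoid when the scale-in commands are always issued in the same order: drop the dnodes, confirm with `show dnodes`, scale the StatefulSet, then delete the PVCs. The script below is a minimal sketch of that order, not an official tool. It only prints the commands for review and runs nothing itself. The function names `scale_in_plan` and `dnode_endpoint` and the `default` namespace are assumptions made for illustration; the other names come from the manifests above. The script cannot check the `replica` setting of your databases, so keep the target node count at or above that value.

```bash
#!/bin/sh
# Print (never run) the commands for a safe scale-in, in the documented order.
set -eu

STS_NAME="${STS_NAME:-tdengine}"       # metadata.name of the StatefulSet
SERVICE_NAME="${SERVICE_NAME:-taosd}"  # metadata.name of the Service
NAMESPACE="${NAMESPACE:-default}"      # assumption: deployed in "default"
PORT="${PORT:-6030}"                   # TAOS_SERVER_PORT
PVC_PREFIX="${PVC_PREFIX:-taosdata}"   # name in volumeClaimTemplates

# End point of the dnode that runs in pod <STS_NAME>-<ordinal>.
dnode_endpoint() {
  printf '%s-%s.%s.%s.svc.cluster.local:%s\n' \
    "$STS_NAME" "$1" "$SERVICE_NAME" "$NAMESPACE" "$PORT"
}

# scale_in_plan <current replicas> <target replicas>
scale_in_plan() {
  plan_current="$1"
  plan_target="$2"
  if [ "$plan_target" -lt 1 ] || [ "$plan_target" -ge "$plan_current" ]; then
    echo "target must be at least 1 and smaller than current" >&2
    return 1
  fi

  # 1. Drop the dnodes that will disappear, highest ordinal first.
  plan_i=$((plan_current - 1))
  while [ "$plan_i" -ge "$plan_target" ]; do
    printf "kubectl exec -i -t %s-0 -- taos -s \"drop dnode '%s'\"\n" \
      "$STS_NAME" "$(dnode_endpoint "$plan_i")"
    plan_i=$((plan_i - 1))
  done

  # 2. Confirm by hand that the dropped dnodes are gone before continuing.
  printf 'kubectl exec -i -t %s-0 -- taos -s "show dnodes"\n' "$STS_NAME"

  # 3. Remove the pods.
  printf 'kubectl scale statefulsets %s --replicas=%s\n' \
    "$STS_NAME" "$plan_target"

  # 4. Delete the PVCs so a later scale-out starts from empty volumes.
  plan_i=$((plan_current - 1))
  while [ "$plan_i" -ge "$plan_target" ]; do
    printf 'kubectl delete pvc %s-%s-%s\n' "$PVC_PREFIX" "$STS_NAME" "$plan_i"
    plan_i=$((plan_i - 1))
  done
}

# Print a plan only when both replica counts are given on the command line.
if [ "$#" -eq 2 ]; then
  scale_in_plan "$1" "$2"
fi
```

Saved under a hypothetical name such as `scale-in-plan.sh`, running `sh scale-in-plan.sh 4 2` prints two `drop dnode` commands, one `show dnodes`, one `kubectl scale`, and two `kubectl delete pvc` commands. Run them one at a time, and continue past `show dnodes` only once the dropped dnodes no longer appear in the list.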