规划集群所有物理节点的 FQDN,将规划好的 FQDN 分别添加到每个物理节点的 /etc/hosts;修改每个物理节点的 /etc/hosts,将所有集群物理节点的 IP 与 FQDN 的对应添加好。【如部署了 DNS,请联系网络管理员在 DNS 上做好相关配置】
The FQDN of all hosts need to be setup properly, all the FQDNs need to be configured in the /etc/hosts of each host. It must be guaranteed that each FQDN can be accessed (by ping, for example) from any other hosts.
On each host command `hostname -f` can be executed to get the hostname. `ping` command can be executed on each host to check whether any other host is accessible from it. If any host is not accessible, the network configuration, like /etc/hosts or DNS configuration, need to be checked and revised to make any two hosts accessible to each other.
:::note
客户端所在服务器也需要配置,确保它可以正确解析每个节点的 FQDN 配置,不管是通过 DNS 服务,还是修改 hosts 文件。
- The host where the client program runs also needs to configured properly for FQDN, to make sure all hosts for client or server can be accessed from any other. In other words, the hosts where the client is running are also considered as a part of the cluster.
If any previous version of TDengine has been installed and configured on any host, the installation needs to be removed and the data needs to be cleaned up. For details about uninstalling please refer to [Install and Uninstall](/operation/pkg-install). To clean up the data, please use `rm -rf /var/lib/taos/\*` assuming the `dataDir` is configured as `/var/lib/taos`.
每个数据节点的 End Point 就是输出的 hostname 外加端口号,比如 h1.taosdata.com:6030。
Now it's time to install TDengine on all hosts without starting `taosd`, the versions on all hosts should be same. If it's prompted to input the existing TDengine cluster, simply press carriage return to ignore it. `install.sh -e no` can also be used to disable this prompt. For details please refer to [Install and Uninstall](/operation/pkg-install).
### 第五步
### Step 4
修改 TDengine 的配置文件(所有节点的文件 /etc/taos/taos.cfg 都需要修改)。假设准备启动的第一个数据节点 End Point 为 h1.taosdata.com:6030,其与集群配置相关参数如下:
Now each physical node (referred to as `dnode` hereinafter, it's abbreviation for "data node") of TDengine need to be configured properly. Please be noted that one dnode doesn't stand for one host, multiple TDengine nodes can be started on single host as long as they are configured properly without conflicting. More specifically each instance of the configuration file `taos.cfg` stands for a dnode. Assuming the first dnode of TDengine cluster is "h1.taosdata.com:6030", its `taos.cfg` is configured as following.
`firstEp` and `fqdn` must be configured properly. In `taos.cfg` of all dnodes in TDengine cluster, `firstEp` must be configured to point to same address, i.e. the first dnode of the cluster. `fqdn` and `serverPort` compose the address of each node itself. If you want to start multiple TDengine dnodes on a single host, please also make sure all other configurations like `dataDir`, `logDir`, and other resources related parameters are not conflicting.
For all the dnodes in a TDengine cluster, below parameters must be configured as exactly same, any node whose configuration is different from dnodes already in the cluster can't join the cluster.
The first dnode can be started following the instructions in [Get Started](/get-started/), for example h1.taosdata.com. Then TDengine CLI `taos` can be launched to execute command `show dnodes`, the output is as following for example:
```
Welcome to the TDengine shell from Linux, Client Version:2.0.0.0
...
...
@@ -100,39 +86,29 @@ Query OK, 1 row(s) in set (0.006385s)
taos>
```
上述命令里,可以看到刚启动的数据节点的 End Point 是:h1.taos.com:6030,就是这个新集群的 firstEp。
From the above output, it is shown that the end point of the started dnode is "h1.taos.com:6030", which is the `firstEp` of the cluster.
### 启动后续数据节点
### Start Other DNODEs
将后续的数据节点添加到现有集群,具体有以下几步:
There are a few steps necessary to add other dnodes in the cluster.
按照《立即开始》一章的方法在每个物理节点启动 taosd;(注意:每个物理节点都需要在 taos.cfg 文件中将 firstEp 参数配置为新集群首个节点的 End Point——在本例中是 h1.taos.com:6030)
Firstly, start `taosd` as instructed in [Get Started](/get-started/), assuming it's for the second dnode. Before starting `taosd`, please making sure the configuration is correct, especially `firstEp`, `FQDN` and `serverPort`, `firstEp` must be same as the dnode shown in the section "Start First DNODE", i.e. "h1.taosdata.com" in this example.
在第一个数据节点,使用 CLI 程序 taos,登录进 TDengine 系统,执行命令:
Then, on the first dnode, use TDengine CLI `taos` to execute below command to add the end point of the dnode in the cluster. In the command "fqdn:port" should be quoted using double quotes.
```sql
CREATEDNODE"h2.taos.com:6030";
```
将新数据节点的 End Point(准备工作中第四步获知的)添加进集群的 EP 列表。“fqdn:port”需要用双引号引起来,否则出错。请注意将示例的“h2.taos.com:6030” 替换为这个新数据节点的 End Point。
然后执行命令
Then on the first dnode, execute `show dnodes` in `taos` to show whether the second dnode has been added in the cluster successfully or not.
```sql
SHOWDNODES;
```
查看新节点是否被成功加入。如果该被加入的数据节点处于离线状态,请做两个检查:
If the status of the newly added dnode is offlie, please check:
查看该数据节点的 taosd 是否正常工作,如果没有正常运行,需要先检查为什么?
查看该数据节点 taosd 日志文件 taosdlog.0 里前面几行日志(一般在 /var/log/taos 目录),看日志里输出的该数据节点 fqdn 以及端口号是否为刚添加的 End Point。如果不一致,需要将正确的 End Point 添加进去。
按照上述步骤可以源源不断的将新的数据节点加入到集群。
- Whether the `taosd` process is running properly or not
- In the log file `taosdlog.0` to see whether the fqdn and port are correct or not 查
:::tip
任何已经加入集群在线的数据节点,都可以作为后续待加入节点的 firstEp。
firstEp 这个参数仅仅在该数据节点首次加入集群时有作用,加入集群后,该数据节点会保存最新的 mnode 的 End Point 列表,不再依赖这个参数。
It has been introduced that how to deploy and start a cluster from scratch. Once a cluster is ready, the dnode status in the cluster can be shown at any time, new dnode can be added to scale out the cluster, an existing dnode can be removed, even load balance can be performed manually.\
:::note
以下所有执行命令的操作需要先登陆进 TDengine 系统,必要时请使用 root 权限。
All the commands to be introduced in this chapter need to be run after login TDengine, sometimes it's necessary to use root privilege.
:::
## 查看数据节点
## Show DNODEs
启动 TDengine CLI 程序 taos,然后执行:
below command can be executed in TDengine CLI `taos` to list all dnodes in the cluster, including ID, end point (fqdn:port), status (ready, offline), number of vnodes, number of free vnodes, etc. It's suggested to execute this command to check after adding or removing a dnode.
To utilize system resources efficiently and provide scalability, data sharding is required. The data of each database is divided into multiple shards and stored in multiple vnodes. These vnodes may be located in different dnodes, scaling out can be achieved by adding more vnodes from more dnodes. Each vnode can only be used for a single DB, but one DB can have multiple vnodes. The allocation of vnode is scheduled automatically by mnode according to system resources of the dnodes.
启动 CLI 程序 taos,然后执行:
Launch TDengine CLI `taos` and execute below command:
```sql
USESOME_DATABASE;
SHOWVGROUPS;
```
输出如下(具体内容仅供参考,取决于实际的集群配置)
The example output is as below:
```
taos> show dnodes;
...
...
@@ -67,17 +65,16 @@ taos> show vgroups;
Query OK, 8 row(s) in set (0.001154s)
```
## 添加数据节点
## Add DNODE
启动 CLI 程序 taos,然后执行:
Launch TDengine CLI `taos` and execute to add the end point of a new dnode into the EPI (end point) list of the cluster. "fqdn:port" must be quoted using double quotes.
```sql
CREATEDNODE"fqdn:port";
```
将新数据节点的 End Point 添加进集群的 EP 列表。“fqdn:port“需要用双引号引起来,否则出错。一个数据节点对外服务的 fqdn 和 port 可以通过配置文件 taos.cfg 进行配置,缺省是自动获取。【强烈不建议用自动获取方式来配置 FQDN,可能导致生成的数据节点的 End Point 不是所期望的】
It can be seen that the status of the new dnode is "offline", once the dnode is started and connects the firstEp of the cluster, execute the command again and get below example output, from which it can be seen that two dnodes are both in "ready" status.
Launch TDengine CLI `taos` and execute command below to drop or remove a dndoe from the cluster. In the command, `dnodeId` can be gotten from `show dnodes`.
In the above example, when `show dnodes` is executed the first time, two dnodes are shown. Then `drop dnode 2` is executed, after that from the output of executing `show dnodes` again it can be seen that only the dnode with ID 1 is still in the cluster.
:::warning
:::note
数据节点一旦被 drop 之后,不能重新加入集群。需要将此节点重新部署(清空数据文件夹)。集群在完成 `drop dnode` 操作之前,会将该 dnode 的数据迁移走。
一个数据节点被 drop 之后,其他节点都会感知到这个 dnodeID 的删除操作,任何集群中的节点都不会再接收此 dnodeID 的请求。
dnodeID 是集群自动分配的,不得人工指定。它在生成时是递增的,不会重复。
- Once a dnode is dropped, it can't rejoin the cluster. To rejoin, the dnode needs to deployed again after cleaning up the data directory. Normally, before dropping a dnode, the data belonging to the dnode needs to be migrated to other place.
- Please be noted that `drop dnode` is different from stopping `taosd` process. `drop dnode` just removes the dnode out of TDengine cluster. Only after a dnode is dropped, can the corresponding `taosd` process be stopped.
- Once a dnode is dropped, other dnodes in the cluster will be notified of the drop and will not accept the request from the dropped dnode.
- dnodeID is allocated automatically and can't be interfered manually. dnodeID is generated in ascending order without duplication.
:::
## 手动迁移数据节点
## Move VNODE
手动将某个 vnode 迁移到指定的 dnode。
A vnode can be manually moved from one dnode to another.
启动 CLI 程序 taos,然后执行:
Launch TDengine CLI `taos` and execute below command:
In the above command, `source-dnodeId` is the original dnodeId where the vnode resides, `dest-dnodeId` specifies the target dnode. vgId (vgroup ID) can be shown by `SHOW VGROUPS `.
Firstly `show vgroups` is executed to show the vgrup distribution.
It can be seen that there are 5 vgroups in dnode 3 and 3 vgroups in node 1, now we want to move vgId 18 from dnode 3 to dnode 1. Execute below command in `taos`
```
taos> alter dnode 3 balance "vnode:18-dnode:1";
...
...
@@ -183,9 +180,10 @@ taos> alter dnode 3 balance "vnode:18-dnode:1";
DB error: Balance already enabled (0.00755
```
上面的结果表明目前所在数据库已经启动了 balance 选项,所以无法进行手动迁移。
However, the operation fails with error message show above, which means automatic load balancing has been enabled in the current database so manual load balance can't be performed.
Shutdown the cluster, configure `balance` parameter in all the dnodes to 0, then restart the cluster, and execute `alter dnode` and `show vgroups` as below.
High availability of vnode and mnode can be achieved through replicas in TDengine.
vnode 的副本数是与 DB 关联的,一个集群里可以有多个 DB,根据运营的需求,每个 DB 可以配置不同的副本数。创建数据库时,通过参数 replica 指定副本数(缺省为 1)。如果副本数为 1,系统的可靠性无法保证,只要数据所在的节点宕机,就将无法提供服务。集群的节点数必须大于等于副本数,否则创建表时将返回错误“more dnodes are needed”。比如下面的命令将创建副本数为 3 的数据库 demo:
The number of vnodes is associated with each DB, there can be multiple DBs in a TDengine cluster. For the purpose of operation, different number of replicas can be configured properly for each DB. When creating a database, the parameter `replica` is used to specify the number of replicas, the default value is 1. With single replica, the reliability of the system can't be guaranteed. Whenever one node is down, data service would be unavailable. The number of dnodes in the cluster must NOT be lower than the number of replicas set for any DB, otherwise the `create table` operation would fail with error "more dnodes are needed". Below SQL statement is used to create a database named as "demo" with 3 replicas.
```sql
CREATEDATABASEdemoreplica3;
```
一个 DB 里的数据会被切片分到多个 vnode group,vnode group 里的 vnode 数目就是 DB 的副本数,同一个 vnode group 里各 vnode 的数据是完全一致的。为保证高可用性,vnode group 里的 vnode 一定要分布在不同的数据节点 dnode 里(实际部署时,需要在不同的物理机上),只要一个 vnode group 里超过半数的 vnode 处于工作状态,这个 vnode group 就能正常的对外服务。
The data in a DB is divided into multiple shards and stored in multiple vgroups. The number of vnodes in each group is determined by the number of replicas set for the DB. The vnodes in each vgroups store exactly same data. For the purpose of high availability, the vnodes in a vgroup must be located in different dnodes on different hosts. As long as over half of the vnodes in a vgroup are in online state, the vgroup is able to serve data access. Otherwise the vgroup can't handle any data access for reading or inserting data.
一个数据节点 dnode 里可能有多个 DB 的数据,因此一个 dnode 离线时,可能会影响到多个 DB。如果一个 vnode group 里的一半或一半以上的 vnode 不工作,那么该 vnode group 就无法对外服务,无法插入或读取数据,这样会影响到它所属的 DB 的一部分表的读写操作。
There may be data for multiple DBs in a dnode. Once a dnode is down, multiple DBs may be affected. However, it's hard to say the cluster is guaranteed to work properly as long as over half of dnodes are online because vnodes are introduced and there may be complex mapping between vnodes and dnodes.
Each TDengine cluster is managed by `mnode`, which is a module of `taosd`. For the high availability of mnode, multiple mnodes can be configured using system parameter `numOfMNodes`, the valid time range is [1,3]. To make sure the data consistency between mnodes, the data replication between mnodes is performed in synchronous way.
There may be multiple dnodes in a cluster, but only one mnode can be started in each dnode. Which one or ones of the dnodes will be designated as mnodes is automatically determined by TDengine according to the cluster configuration and system resources. Command `show mnodes` can be executed in TDengine `taos` to show the mnodes in the cluster.
The end point and role/status (master, slave, unsynced, or offline) of all mnodes can be shown by the above command. When the first dnode is started in a cluster, there must be one mnode in this dnode, because there must be at least one mnode otherwise the cluster doesn't work. If `numOfMNodes` is configured to 2, another mnode will be started when the second dnode is launched.
For the high availability of mnode, `numOfMnodes` needs to be configured to 2 or a higher value. Because the data consistency between mnodes must be guaranteed, the replica confirmation parameter `quorum` is set to 2 automatically if `numOfMNodes` is set to 2 or higher.
:::note
一个 TDengine 高可用系统,无论是 vnode 还是 mnode,都必须配置多个副本。
If high availability is important for your system, both vnode and mnode must be configured to have multiple replicas. How to configure for them are different and have been described.
- When a new dnode is joined in the cluster, automatic load balancing may be triggered, some data from some dnodes may be transferred to the new dnode automatically.当
- When a dnode is removed from the cluster, the data from this dnode will be transferred to other dnodes automatically.
- When a dnode is too hot, i.e. too much data has been stored in it, automatic load balancing may be triggered to migrate some vnodes from this dnode to other dnodes.
- :::tip
Automatic load balancing is controlled by parameter `balance`, 0 means disabled and 1 means enabled.
:::
## 数据节点离线处理
## Dnode Offline
如果一个数据节点离线,TDengine 集群将自动检测到。有如下两种情况:
When a dnode is offline, it can be detected by the TDengine cluster. There are two cases:
- The dnode becomes online again before the threshold configured in `offlineThreshold` is reached, it is still in the cluster and data replication is started automatically. The dnode can work properly after the data syncup is finished.
- If the dnode has been offline over the threshold configured in `offlineThreshold` in `taos.cfg`, the dnode will be removed from the cluster automatically. System alert will be generated and automatic load balancing will be triggered too if `balance` is set to 1. When the removed dnode is restarted and becomes online, it will not be joined in the cluster automatically, it can only be joined manually by the system operator.
If all the vnodes in a vgroup (or mnodes in mnode group) are in offline or unsynced status, the master node can only be voted after all the vnodes or mnodes in the group become online and can exchange status, then the vgroup (or mnode group) is able to provide service.
为解决这个问题,TDengine 引入了 Arbitrator 的概念。Arbitrator 模拟一个 vnode 或 mnode 在工作,但只简单的负责网络连接,不处理任何数据插入或访问。只要包含 Arbitrator 在内,超过半数的 vnode 或 mnode 工作,那么该 vnode group 或 mnode 组就可以正常的提供数据插入或查询服务。比如对于副本数为 2 的情形,如果一个节点 A 离线,但另外一个节点 B 正常,而且能连接到 Arbitrator,那么节点 B 就能正常工作。
If the number of replicas is set to an even number like 2, when half of the vnodes in a vgroup don't work master node can't be voted. Similar case is also applicable to mnode if the number of mnodes is set to an even number like 2.
To resolve this problem, a new arbitrator component named `tarbitrator`, abbreviated for TDengine Arbitrator, was introduced. Arbitrator simulates a vnode or mnode but it's only responsible for network communication and doesn't handle any actual data access. With Arbitrator, any vgroup or mnode group can be considered as having number of member nodes and master node can be selected.
Arbitrator 的执行程序名为 tarbitrator。该程序对系统资源几乎没有要求,只需要保证有网络连接,找任何一台 Linux 服务器运行它即可。以下简要描述安装配置的步骤:
Normally, it's suggested to configure replica number of each DB or system parameter `numOfMNodes` to an odd number. However, if a user is very sensitive to storage space, replica number of 2 plus arbitrator component can be used to achieve both lower cost of storage space and high availability.
请点击 安装包下载,在 TDengine Arbitrator Linux 一节中,选择合适的版本下载并安装。
该应用的命令行参数 -p 可以指定其对外服务的端口号,缺省是 6042。
Arbitrator component is installed with the server package. For details about how to install, please refer to [Install](/operation/pkg-install). The `-p` parameter of `tarbitrator` can be used to specify the port on which it provides service.
In the configuration file `taos.cfg` of each dnode, parameter `arbitrator` needs to be configured to the end point of the `tarbitrator` process. arbitrator component will be used automatically if the replica is configured to an even number and will be ignored if the replica is configured to an odd number.
在配置文件中配置了的 Arbitrator,会出现在 SHOW DNODES 指令的返回结果中,对应的 role 列的值会是“arb”。
查看集群 Arbitrator 的状态【2.0.14.0 以后支持】
Arbitrator can be shown by executing command in TDengine CLI `taos` with its role shown as "arb".