# Working with the bundled Consul service > 原文:[https://docs.gitlab.com/ee/administration/high_availability/consul.html](https://docs.gitlab.com/ee/administration/high_availability/consul.html) * [Prerequisites](#prerequisites) * [Configuring the Consul nodes](#configuring-the-consul-nodes) * [Consul checkpoint](#consul-checkpoint) * [Operations](#operations) * [Checking cluster membership](#checking-cluster-membership) * [Restarting the server cluster](#restarting-the-server-cluster) * [Upgrades for bundled Consul](#upgrades-for-bundled-consul) * [Troubleshooting](#troubleshooting) * [Consul server agents unable to communicate](#consul-server-agents-unable-to-communicate) * [Consul agents do not start - Multiple private IPs](#consul-agents-do-not-start---multiple-private-ips) * [Outage recovery](#outage-recovery) * [Recreate from scratch](#recreate-from-scratch) * [Recover a failed cluster](#recover-a-failed-cluster) # Working with the bundled Consul service[](#working-with-the-bundled-consul-service-premium-only "Permalink") 作为其高可用性堆栈的一部分,GitLab Premium 包含可通过`/etc/gitlab/gitlab.rb`管理的捆绑版本的[Consul](https://www.consul.io/) . Consul 是服务网络解决方案. 对于[GitLab Architecture](../../development/architecture.html) ,支持 Consul 利用来配置: 1. [Monitoring in Scaled and Highly Available environments](monitoring_node.html) 2. [PostgreSQL High Availability with Omnibus](../postgresql/replication_and_failover.html) Consul 群集由多个服务器代理以及在其他需要与 Consul 群集通信的节点上运行的客户端代理组成. ## Prerequisites[](#prerequisites "Permalink") 首先,请确保**在每个节点上** [下载/安装](https://about.gitlab.com/install/) Omnibus GitLab. 选择一种安装方法,然后确保完成以下步骤: 1. 安装和配置必要的依赖项. 2. 添加 GitLab 软件包存储库并安装软件包. 安装 GitLab 软件包时,请勿提供`EXTERNAL_URL`值. ## Configuring the Consul nodes[](#configuring-the-consul-nodes "Permalink") 在每个 Consul 节点上执行以下操作: 1. 在执行下一步之前,请确保为[`CONSUL_SERVER_NODES`](../postgresql/replication_and_failover.html#consul-information)收集[`CONSUL_SERVER_NODES`](../postgresql/replication_and_failover.html#consul-information) ,它们是 Consul 服务器节点的 IP 地址或 DNS 记录. 2. 编辑`/etc/gitlab/gitlab.rb`替换`/etc/gitlab/gitlab.rb` `# START user configuration`部分中记录的值: ``` # Disable all components except Consul roles ['consul_role'] # START user configuration # Replace placeholders: # # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z # with the addresses gathered for CONSUL_SERVER_NODES consul['configuration'] = { server: true, retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z) } # Disable auto migrations gitlab_rails['auto_migrate'] = false # # END user configuration ``` > `consul_role`在 GitLab 10.3 中引入 3. [重新配置 GitLab,](../restart_gitlab.html#omnibus-gitlab-reconfigure)以使更改生效. ### Consul checkpoint[](#consul-checkpoint "Permalink") 在继续之前,请确保 Consul 配置正确. 运行以下命令以验证所有服务器节点都在通信: ``` /opt/gitlab/embedded/bin/consul members ``` 输出应类似于: ``` Node Address Status Type Build Protocol DC CONSUL_NODE_ONE XXX.XXX.XXX.YYY:8301 alive server 0.9.2 2 gitlab_consul CONSUL_NODE_TWO XXX.XXX.XXX.YYY:8301 alive server 0.9.2 2 gitlab_consul CONSUL_NODE_THREE XXX.XXX.XXX.YYY:8301 alive server 0.9.2 2 gitlab_consul ``` 如果任何一个节点都不`alive`或三个节点中的任何一个都不`alive` ,请在继续操作之前检查" [疑难解答"部分](#troubleshooting) . ## Operations[](#operations "Permalink") ### Checking cluster membership[](#checking-cluster-membership "Permalink") 要查看哪些节点是集群的一部分,请在集群中的任何成员上运行以下命令 ``` $ /opt/gitlab/embedded/bin/consul members Node Address Status Type Build Protocol DC consul-b XX.XX.X.Y:8301 alive server 0.9.0 2 gitlab_consul consul-c XX.XX.X.Y:8301 alive server 0.9.0 2 gitlab_consul consul-c XX.XX.X.Y:8301 alive server 0.9.0 2 gitlab_consul db-a XX.XX.X.Y:8301 alive client 0.9.0 2 gitlab_consul db-b XX.XX.X.Y:8301 alive client 0.9.0 2 gitlab_consul ``` Ideally all nodes will have a `Status` of `alive`. ### Restarting the server cluster[](#restarting-the-server-cluster "Permalink") **注意:**本部分仅适用于服务器代理. 在需要时重新启动客户端代理是安全的. 如果有必要重新启动服务器群集,则以受控方式进行此操作很重要,以保持仲裁. 如果仲裁丢失,则需要遵循 Consul [中断恢复](#outage-recovery)过程来恢复群集. 为了安全起见,建议您一次仅重新启动一个服务器代理,以确保群集保持完整. 对于较大的群集,可以一次重新启动多个代理. 请参阅[领事共识文档,](https://www.consul.io/docs/internals/consensus.html#deployment-table)以了解可以容忍多少次失败. 这是它可以承受的同时重启的次数. ## Upgrades for bundled Consul[](#upgrades-for-bundled-consul "Permalink") 运行 GitLab 捆绑的 Consul 的节点应为: * 升级 Omnibus GitLab 软件包之前,健康集群的成员. * 一次升级一个节点. **注意:**从任何 Consul 节点运行`curl http://127.0.0.1:8500/v1/health/state/critical` ,将识别群集中的现有运行状况问题. 如果群集运行状况良好,该命令将返回一个空数组. 领事群集使用筏协议进行通信. 如果当前领导者下线,则需要进行领导者选举. 必须存在一个领导者节点,以促进整个集群之间的同步. 如果同时有太多节点脱机,则由于[共识破裂](https://www.consul.io/docs/internals/consensus.html) ,群集将失去仲裁且不会选举领导者. 如果群集在升级后无法恢复,请参考[故障排除部分](#troubleshooting) . [中断恢复](#outage-recovery)可能特别有意义. **注意:** GitLab 仅使用 Consul 来存储易于重新生成的瞬态数据. 如果捆绑包的 Consul 除了 GitLab 本身以外没有被其他进程使用,那么[从头开始重建集群](#recreate-from-scratch)就可以了. ## Troubleshooting[](#troubleshooting "Permalink") ### Consul server agents unable to communicate[](#consul-server-agents-unable-to-communicate "Permalink") 默认情况下,服务器代理将尝试[绑定](https://www.consul.io/docs/agent/options.html#_bind)到" 0.0.0.0",但是它们将通告节点上的第一个私有 IP 地址,以供其他代理与其通信. 如果其他节点无法与此地址上的节点通信,则群集将处于故障状态. 如果遇到此问题,您将在`gitlab-ctl tail consul`输出中看到类似以下的消息: ``` 2017-09-25_19:53:39.90821 2017/09/25 19:53:39 [WARN] raft: no known peers, aborting election 2017-09-25_19:53:41.74356 2017/09/25 19:53:41 [ERR] agent: failed to sync remote state: No cluster leader ``` 要解决此问题: 1. 在每个其他节点都可以到达的节点上选择一个地址. 2. 更新您的`/etc/gitlab/gitlab.rb` ``` consul['configuration'] = { ... bind_addr: 'IP ADDRESS' } ``` 3. Run `gitlab-ctl reconfigure` 如果仍然看到错误,则可能必须[删除 Consul 数据库并](#recreate-from-scratch)在受影响的节点上[重新初始化](#recreate-from-scratch) . ### Consul agents do not start - Multiple private IPs[](#consul-agents-do-not-start---multiple-private-ips "Permalink") 如果一个节点具有多个专用 IP,则代理会混淆要播发哪个专用地址,然后在启动时立即退出. 如果遇到此问题,您将在`gitlab-ctl tail consul`输出中看到类似以下的消息: ``` 2017-11-09_17:41:45.52876 ==> Starting Consul agent... 2017-11-09_17:41:45.53057 ==> Error creating agent: Failed to get advertise address: Multiple private IPs found. Please configure one. ``` 要解决此问题: 1. 在该节点上选择一个地址,所有其他节点都可以通过该地址到达该节点. 2. 更新您的`/etc/gitlab/gitlab.rb` ``` consul['configuration'] = { ... bind_addr: 'IP ADDRESS' } ``` 3. Run `gitlab-ctl reconfigure` ### Outage recovery[](#outage-recovery "Permalink") 如果您在群集中丢失了足够的服务器代理以破坏仲裁,则该群集将被视为失败,并且如果没有手动干预,它将无法运行. #### Recreate from scratch[](#recreate-from-scratch "Permalink") 默认情况下,GitLab 不会在 Consul 群集中存储任何无法重新创建的内容. 删除 Consul 数据库并重新初始化 ``` gitlab-ctl stop consul rm -rf /var/opt/gitlab/consul/data gitlab-ctl start consul ``` 此后,群集应开始备份,并且服务器代理重新加入. 之后不久,客户代理也应重新加入. #### Recover a failed cluster[](#recover-a-failed-cluster "Permalink") 如果您已利用 Consul 来存储其他数据,并想还原发生故障的群集,请按照[Consul 指南](https://learn.hashicorp.com/consul/day-2-operations/outage)来恢复发生故障的群集.