# Disaster Recovery (Geo)

> Original: [https://docs.gitlab.com/ee/administration/geo/disaster_recovery/](https://docs.gitlab.com/ee/administration/geo/disaster_recovery/)

*   [Promoting a **secondary** Geo node in single-secondary configurations](#promoting-a-secondary-geo-node-in-single-secondary-configurations)
    *   [Step 1\. Allow replication to finish if possible](#step-1-allow-replication-to-finish-if-possible)
    *   [Step 2\. Permanently disable the **primary** node](#step-2-permanently-disable-the-primary-node)
    *   [Step 3\. Promoting a **secondary** node](#step-3-promoting-a-secondary-node)
        *   [Promoting a **secondary** node running on a single machine](#promoting-a-secondary-node-running-on-a-single-machine)
        *   [Promoting a **secondary** node with multiple servers](#promoting-a-secondary-node-with-multiple-servers)
        *   [Promoting a **secondary** node with an external PostgreSQL database](#promoting-a-secondary-node-with-an-external-postgresql-database)
    *   [Step 4\. (Optional) Updating the primary domain DNS record](#step-4-optional-updating-the-primary-domain-dns-record)
    *   [Step 5\. (Optional) Add **secondary** Geo node to a promoted **primary** node](#step-5-optional-add-secondary-geo-node-to-a-promoted-primary-node)
    *   [Step 6\. (Optional) Removing the secondary’s tracking database](#step-6-optional-removing-the-secondarys-tracking-database)
*   [Promoting secondary Geo replica in multi-secondary configurations](#promoting-secondary-geo-replica-in-multi-secondary-configurations)
    *   [Step 1\. Prepare the new **primary** node to serve one or more **secondary** nodes](#step-1-prepare-the-new-primary-node-to-serve-one-or-more-secondary-nodes)
    *   [Step 2\. Initiate the replication process](#step-2-initiate-the-replication-process)
*   [Troubleshooting](#troubleshooting)
    *   [I followed the disaster recovery instructions and now two-factor auth is broken](#i-followed-the-disaster-recovery-instructions-and-now-two-factor-auth-is-broken)

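Geo replicates your database, your Git repositories, and a few other assets. Support for replicating more data is planned, which will let you fail over with minimal effort in a disaster.

See [current limitations](../replication/index.html#current-limitations) for more information.

**Warning:** Disaster recovery for multi-secondary configurations is in **Alpha**. For the latest updates, check the multi-secondary [Disaster Recovery epic](https://gitlab.com/groups/gitlab-org/-/epics/65).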

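## Promoting a **secondary** Geo node in single-secondary configurations

We don't currently provide an automated way to promote a Geo replica and do a failover, but you can do it manually if you have `root` access to the machine.

This process promotes a **secondary** Geo node to **primary**. To regain geographic redundancy as quickly as possible, you should add a new **secondary** node immediately after following these instructions.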

### Step 1\. Allow replication to finish if possible

If the **secondary** node is still replicating data from the **primary** node, follow [the planned failover docs](planned_failover.html) as closely as possible in order to avoid unnecessary data loss.

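### Step 2\. Permanently disable the **primary** node

**Warning:** If the **primary** node goes offline, there may be data saved on the **primary** node that has not been replicated to the **secondary** node. This data should be treated as lost if you proceed.

If an outage on the **primary** node happens, you should do everything possible to avoid a split-brain situation, where writes can occur in two different GitLab instances and complicate recovery efforts. So, to prepare for the failover, we must disable the **primary** node.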

1.  SSH into the **primary** node to stop and disable GitLab, if possible:

    ```shell
    sudo gitlab-ctl stop
    ```

    Prevent GitLab from starting up again if the server unexpectedly reboots:

    ```shell
    sudo systemctl disable gitlab-runsvdir
    ```

    **Note:** (**CentOS only**) In CentOS 6 or older, there is no easy way to prevent GitLab from being started if the machine reboots (see [Omnibus GitLab issue #3058](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/3058)). It may be safest to uninstall the GitLab package completely:

    ```shell
    yum remove gitlab-ee
    ```

    **Note:** (**Ubuntu 14.04 LTS**) If you are using an older version of Ubuntu or any other distribution based on the Upstart init system, you can prevent GitLab from starting if the machine reboots by doing the following:

    ```shell
    initctl stop gitlab-runsvdir
    echo 'manual' > /etc/init/gitlab-runsvdir.override
    initctl reload-configuration
    ```

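2.  If you do not have SSH access to the **primary** node, take the machine offline and prevent it from rebooting by any means at your disposal. Because there are many ways you may prefer to accomplish this, we avoid making a single recommendation. You may need to:

    *   Reconfigure the load balancers.
    *   Change DNS records (for example, point the primary DNS record to the **secondary** node to stop usage of the **primary** node).
    *   Stop the virtual servers.
    *   Block traffic through a firewall.
    *   Revoke object storage permissions from the **primary** node.
    *   Physically disconnect a machine.

3.  If you plan to [update the primary domain DNS record](#step-4-optional-updating-the-primary-domain-dns-record), you may wish to lower the TTL now to speed up propagation.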
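Which commands from step 1 apply depends on the old primary's init system. As a hedged illustration, the POSIX shell function below only *prints* the matching commands for an init-system label you pass in. The function name and the labels `systemd`, `upstart`, and `centos6` are hypothetical, not part of GitLab. It deliberately does not auto-detect the init system, because on Upstart hosts PID 1 is still named `init`.

```shell
#!/bin/sh
# Sketch only: print (never execute) the step 1 commands that stop GitLab on
# the old primary and keep it from starting again after a reboot.
# The function name and init labels are hypothetical.
set -eu

# Usage: print_disable_commands systemd|upstart|centos6
print_disable_commands() {
  # Reject unknown labels before printing anything.
  case "${1:-}" in
    systemd|upstart|centos6) ;;
    *)
      echo "unknown init system: '${1:-}' (expected systemd, upstart or centos6)" >&2
      return 1
      ;;
  esac

  echo "sudo gitlab-ctl stop"
  case "$1" in
    systemd)
      echo "sudo systemctl disable gitlab-runsvdir"
      ;;
    upstart)
      # Ubuntu 14.04 LTS and other Upstart-based distributions
      echo "initctl stop gitlab-runsvdir"
      echo "echo 'manual' > /etc/init/gitlab-runsvdir.override"
      echo "initctl reload-configuration"
      ;;
    centos6)
      # No simple way to block startup, so remove the package entirely
      echo "yum remove gitlab-ee"
      ;;
  esac
}
```

For example, `print_disable_commands systemd` prints the two commands for a systemd host. Review the output before running it as root on the **primary** node.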

### Step 3\. Promoting a **secondary** node

When promoting a **secondary** node, note the following:

*   A new **secondary** should not be added at this time. If you want to add a new **secondary**, do this after you have completed the entire process of promoting the **secondary** to the **primary**.
*   If you encounter an `ActiveRecord::RecordInvalid: Validation failed: Name has already been taken` error during this process, read the [troubleshooting advice](../replication/troubleshooting.html#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node).

#### Promoting a **secondary** node running on a single machine

1.  SSH into your **secondary** node and log in as root:

    ```shell
    sudo -i
    ```

2.  Edit `/etc/gitlab/gitlab.rb` to reflect its new status as **primary** by removing any lines that enable the `geo_secondary_role`:

    ```ruby
    ## In pre-11.5 documentation, the role was enabled as follows. Remove this line.
    geo_secondary_role['enable'] = true

    ## In 11.5+ documentation, the role was enabled as follows. Remove this line.
    roles ['geo_secondary_role']
    ```


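3.  Promote the **secondary** node to the **primary** node.

    Before promoting a secondary node to primary, run the preflight checks. They can be run separately or along with the promotion script.

    To promote the secondary node to primary along with the preflight checks:

    ```shell
    gitlab-ctl promote-to-primary-node
    ```

    If you have already run the preflight checks separately, or don't want to run them, you can skip them:

    ```shell
    gitlab-ctl promote-to-primary-node --skip-preflight-check
    ```

    To run only the preflight checks:

    ```shell
    gitlab-ctl promotion-preflight-checks
    ```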

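4.  Verify you can connect to the newly promoted **primary** node using the URL used previously for the **secondary** node.
5.  If successful, the **secondary** node has now been promoted to the **primary** node.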
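The `gitlab.rb` edit in step 2 can also be scripted. The sketch below assumes the default Omnibus path `/etc/gitlab/gitlab.rb` and that each role setting sits on its own line, as shown above. Rather than deleting lines, the hypothetical `comment_out_secondary_role` function prefixes every uncommented line that mentions `geo_secondary_role` with `#` and leaves a `.bak` copy. A combined line such as `roles ['geo_secondary_role', 'other_role']` would be commented out as a whole and must be fixed by hand.

```shell
#!/bin/sh
# Sketch: comment out, rather than delete, every active gitlab.rb line that
# mentions geo_secondary_role. Assumes each role setting is on its own line.
# Function names are hypothetical.
set -eu

# Usage: comment_out_secondary_role [FILE]   (keeps a FILE.bak copy)
comment_out_secondary_role() {
  local file="${1:-/etc/gitlab/gitlab.rb}"
  # The first expression branches past lines that are already comments;
  # the second prefixes the remaining geo_secondary_role lines with "# ".
  sed -i.bak -e '/^[[:space:]]*#/b' -e '/geo_secondary_role/s/^/# /' "$file"
}

# Usage: has_active_secondary_role [FILE]
# Succeeds if any uncommented line still mentions geo_secondary_role.
has_active_secondary_role() {
  local file="${1:-/etc/gitlab/gitlab.rb}"
  awk '!/^[ \t]*#/ && /geo_secondary_role/ { found = 1 } END { exit !found }' "$file"
}
```

Running `has_active_secondary_role` afterwards gives a quick check before promotion: it succeeds only if an uncommented `geo_secondary_role` line remains.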

#### Promoting a **secondary** node with multiple servers

The `gitlab-ctl promote-to-primary-node` command cannot be used yet in conjunction with multiple servers, as it can only perform changes on a **secondary** with only a single machine. Instead, you must do this manually.

1.  SSH into the database node in the **secondary** and trigger PostgreSQL to promote to read-write:

    ```shell
    sudo gitlab-pg-ctl promote
    ```

    In GitLab 12.8 and earlier, see [Message: `sudo: gitlab-pg-ctl: command not found`](../replication/troubleshooting.html#message-sudo-gitlab-pg-ctl-command-not-found).

2.  Edit `/etc/gitlab/gitlab.rb` on every machine in the **secondary** to reflect its new status as **primary** by removing any lines that enable the `geo_secondary_role`:

    ```ruby
    ## In pre-11.5 documentation, the role was enabled as follows. Remove this line.
    geo_secondary_role['enable'] = true

    ## In 11.5+ documentation, the role was enabled as follows. Remove this line.
    roles ['geo_secondary_role']
    ```

    After making these changes, [reconfigure GitLab](../../restart_gitlab.html#omnibus-gitlab-reconfigure) on each machine so the changes take effect.

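3.  Promote the **secondary** to **primary**. SSH into a single application server and execute:

    ```shell
    sudo gitlab-rake geo:set_secondary_as_primary
    ```

4.  Verify you can connect to the newly promoted **primary** using the URL used previously for the **secondary**.
5.  Success! The **secondary** has now been promoted to **primary**.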
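The order of operations above matters: PostgreSQL is promoted first, every machine is then reconfigured, and only after that is the Rake task run on a single application server. The sketch below prints that plan as `ssh` commands for host names you supply. The host names, the use of `ssh` with `sudo`, and the function name are assumptions; the `gitlab.rb` edit itself is printed as a reminder rather than automated.

```shell
#!/bin/sh
# Sketch: print (never execute) the multi-server promotion plan in order.
# Host names and the function name are hypothetical placeholders.
set -eu

# Usage: print_promotion_plan DB_HOST APP_HOST [OTHER_HOST...]
#   DB_HOST    - the secondary's database node
#   APP_HOST   - the single application server that runs the Rake task
#   OTHER_HOST - any remaining machines in the secondary
print_promotion_plan() {
  if [ "$#" -lt 2 ]; then
    echo "usage: print_promotion_plan DB_HOST APP_HOST [OTHER_HOST...]" >&2
    return 1
  fi
  local db="$1" app="$2" host
  shift 2

  # Step 1: promote PostgreSQL to read-write on the database node.
  echo "ssh $db sudo gitlab-pg-ctl promote"

  # Step 2: on every machine, remove geo_secondary_role, then reconfigure.
  for host in "$db" "$app" "$@"; do
    echo "# on $host: remove geo_secondary_role lines from /etc/gitlab/gitlab.rb"
    echo "ssh $host sudo gitlab-ctl reconfigure"
  done

  # Step 3: promote from a single application server only.
  echo "ssh $app sudo gitlab-rake geo:set_secondary_as_primary"
}
```

For example, `print_promotion_plan db1 app1 sidekiq1` lists the commands in the order described above, which you can then review and run by hand.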

#### Promoting a **secondary** node with an external PostgreSQL database

The `gitlab-ctl promote-to-primary-node` command cannot be used in conjunction with an external PostgreSQL database, as it can only perform changes on a **secondary** node with GitLab and the database on the same machine. As a result, a manual process is required:

1.  Promote the replica database associated with the **secondary** site. This sets the database to read-write:
    *   Amazon RDS: [Promoting a Read Replica to Be a Standalone DB Instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote)
    *   Azure Database for PostgreSQL: [Stop replication](https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal#stop-replication)
    *   Other external PostgreSQL databases: save the following script on your secondary node, for example as `/tmp/geo_promote.sh`, and modify the connection parameters to match your environment. Then execute it to promote the replica:

        ```shell
        #!/bin/bash

        PG_SUPERUSER=postgres

        # The path to your pg_ctl binary. You may need to adjust this path to match
        # your PostgreSQL installation
        PG_CTL_BINARY=/usr/lib/postgresql/10/bin/pg_ctl

        # The path to your PostgreSQL data directory. You may need to adjust this
        # path to match your PostgreSQL installation. You can also run
        # `SHOW data_directory;` from PostgreSQL to find your data directory
        PG_DATA_DIRECTORY=/etc/postgresql/10/main

        # Promote the PostgreSQL database and allow read/write operations
        sudo -u "$PG_SUPERUSER" "$PG_CTL_BINARY" -D "$PG_DATA_DIRECTORY" promote
        ```

2.  On every node in the **secondary** site, edit `/etc/gitlab/gitlab.rb` to reflect its new status as **primary** by removing any lines that enable the `geo_secondary_role`:

    ```ruby
    ## In GitLab 11.4 and earlier, remove this line.
    geo_secondary_role['enable'] = true

    ## In GitLab 11.5 and later, remove this line.
    roles ['geo_secondary_role']
    ```

    After making these changes, [reconfigure GitLab](../../restart_gitlab.html#omnibus-gitlab-reconfigure) on each node so the changes take effect.

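3.  Promote the **secondary** to **primary**. SSH into a single secondary application node and execute:

    ```shell
    sudo gitlab-rake geo:set_secondary_as_primary
    ```

4.  Verify you can connect to the newly promoted **primary** site using the URL used previously for the **secondary** site.

Success! The **secondary** site has now been promoted to **primary**.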

### Step 4\. (Optional) Updating the primary domain DNS record

Updating the DNS records for the primary domain to point to the **secondary** node removes the need to change every reference from the primary domain to the secondary domain, such as Git remotes and API URLs.

1.  SSH into the **secondary** node and log in as root:

    ```shell
    sudo -i
    ```

2.  Update the primary domain's DNS record. After the record points to the **secondary** node, edit `/etc/gitlab/gitlab.rb` on the **secondary** node to reflect the new URL:

    ```ruby
    # Change the existing external_url configuration
    external_url 'https://<new_external_url>'
    ```

    **Note:** Changing `external_url` won’t prevent access via the old secondary URL, as long as the secondary DNS records are still intact.

3.  Reconfigure the **secondary** node for the change to take effect:

    ```shell
    gitlab-ctl reconfigure
    ```

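4.  Execute the command below to update the newly promoted **primary** node URL:

    ```shell
    gitlab-rake geo:update_primary_node_url
    ```

5.  For GitLab 11.11 through 12.7 only, you may need to update the **primary** node's name in the database. This bug has been fixed in GitLab 12.8.

    To determine if you need to do this, search for the `gitlab_rails["geo_node_name"]` setting in your `/etc/gitlab/gitlab.rb` file. If it is commented out with `#` or not found at all, you need to update the **primary** node's name in the database. You can search for it like so:

    ```shell
    grep "geo_node_name" /etc/gitlab/gitlab.rb
    ```

    To update the **primary** node's name in the database:

    ```shell
    gitlab-rails runner 'Gitlab::Geo.primary_node.update!(name: GeoNode.current_node_name)'
    ```

6.  Verify you can connect to the newly promoted **primary** using its URL. If you updated the DNS records for the primary domain, these changes may not have propagated yet, depending on the previous DNS records' TTL.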
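Two parts of this procedure touch `/etc/gitlab/gitlab.rb`: replacing `external_url` in step 2 and, for GitLab 11.11 to 12.7, checking for `gitlab_rails["geo_node_name"]` in step 5. The sketch below wraps both in hypothetical shell functions. It assumes a single uncommented `external_url` line, keeps a `.bak` copy of the file, and does not run `gitlab-ctl reconfigure` or the Rake and Rails commands for you.

```shell
#!/bin/sh
# Sketch of the gitlab.rb edit (step 2) and check (step 5) above.
# Function names are hypothetical; nothing here runs reconfigure or Rake.
set -eu

# Usage: set_external_url FILE URL   (keeps a FILE.bak copy)
# Replaces the uncommented external_url line with: external_url 'URL'
set_external_url() {
  local file="$1" url="$2" esc script
  if ! grep -q '^[[:space:]]*external_url[[:space:]]' "$file"; then
    echo "no active external_url line in $file" >&2
    return 1
  fi
  # Escape characters that are special in a sed replacement: \ & and the | delimiter.
  esc=$(printf '%s' "$url" | sed -e 's/[\\&|]/\\&/g')
  # Commented lines start with '#', so the anchored pattern never touches them.
  script="s|^\([[:space:]]*\)external_url[[:space:]].*\$|\1external_url '${esc}'|"
  sed -i.bak -e "$script" "$file"
}

# Usage: needs_primary_rename [FILE]
# Succeeds when no uncommented gitlab_rails['geo_node_name'] setting exists,
# which is the case where step 5 says the name must be updated in the database.
needs_primary_rename() {
  local file="${1:-/etc/gitlab/gitlab.rb}"
  # The '.' on each side of geo_node_name matches either quote style.
  if grep -Eq '^[[:space:]]*gitlab_rails\[.geo_node_name.\]' "$file"; then
    return 1
  fi
  return 0
}
```

For example: `set_external_url /etc/gitlab/gitlab.rb https://gitlab.example.com`, then `needs_primary_rename && echo 'update the primary node name'`.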

### Step 5\. (Optional) Add **secondary** Geo node to a promoted **primary** node

Promoting a **secondary** node to **primary** using the process above does not enable Geo on the new **primary** node.

To bring a new **secondary** node online, follow the [Geo setup instructions](../replication/index.html#setup-instructions).

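### Step 6\. (Optional) Removing the secondary’s tracking database

Every **secondary** has a special tracking database that is used to save the synchronization status of all the items from the **primary**. Because the **secondary** is already promoted, the data in the tracking database is no longer required.

The data can be removed using the following command:

```shell
sudo rm -rf /var/opt/gitlab/geo-postgresql
```

If you have any `geo_secondary[]` configuration options enabled in your `gitlab.rb` file, these can be safely commented out or removed, and then [reconfigure GitLab](../../restart_gitlab.html#omnibus-gitlab-reconfigure) for the changes to take effect.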

## Promoting secondary Geo replica in multi-secondary configurations

If you have more than one **secondary** node and you need to promote one of them, we suggest you follow [Promoting a **secondary** Geo node in single-secondary configurations](#promoting-a-secondary-geo-node-in-single-secondary-configurations) and then complete the two extra steps below.

### Step 1\. Prepare the new **primary** node to serve one or more **secondary** nodes

1.  SSH into the new **primary** node and log in as root:

    ```shell
    sudo -i
    ```

2.  Edit `/etc/gitlab/gitlab.rb`:

    ```ruby
    ## Enable a Geo Primary role (if you haven't yet)
    roles ['geo_primary_role']

    # Allow PostgreSQL client authentication from the primary and secondary IPs. These IPs may be
    # public or VPC addresses in CIDR format, for example ['', '']
    postgresql['md5_auth_cidr_addresses'] = ['<primary_node_ip>/32', '<secondary_node_ip>/32']

    # Every secondary server needs to have its own slot so specify the number of secondary nodes you're going to have
    postgresql['max_replication_slots'] = 1

    ## Disable automatic database migrations temporarily
    ## (until PostgreSQL is restarted and listening on the private address).
    gitlab_rails['auto_migrate'] = false
    ```

    (For more details about these settings, read [Configure the primary server](../replication/database.html#step-1-configure-the-primary-server).)

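3.  Save the file and reconfigure GitLab so the database listen changes and the replication slot changes are applied:

    ```shell
    gitlab-ctl reconfigure
    ```

    Restart PostgreSQL for its changes to take effect:

    ```shell
    gitlab-ctl restart postgresql
    ```

4.  Re-enable migrations now that PostgreSQL is restarted and listening on the private address.

    Edit `/etc/gitlab/gitlab.rb` and **change** the configuration to `true`:

    ```ruby
    gitlab_rails['auto_migrate'] = true
    ```

    Save the file and reconfigure GitLab:

    ```shell
    gitlab-ctl reconfigure
    ```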
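Because every **secondary** needs its own replication slot, `max_replication_slots` should track the number of secondary nodes. As a sketch, the hypothetical function below renders the two PostgreSQL lines from the primary IP followed by one IP per secondary, using single-host `/32` masks as in the example above. Its output would replace the placeholder lines in `gitlab.rb`; adjust the masks if you authenticate whole subnets instead.

```shell
#!/bin/sh
# Sketch: render the PostgreSQL settings for the new primary's gitlab.rb so
# that max_replication_slots equals the number of secondaries.
# The function name and the /32 masks are assumptions.
set -eu

# Usage: render_primary_pg_settings PRIMARY_IP SECONDARY_IP [SECONDARY_IP...]
render_primary_pg_settings() {
  if [ "$#" -lt 2 ]; then
    echo "usage: render_primary_pg_settings PRIMARY_IP SECONDARY_IP [SECONDARY_IP...]" >&2
    return 1
  fi
  local cidrs ip
  cidrs="'$1/32'"
  shift
  # Every remaining argument is a secondary node; each one gets a /32 entry.
  for ip in "$@"; do
    cidrs="$cidrs, '$ip/32'"
  done
  printf "postgresql['md5_auth_cidr_addresses'] = [%s]\n" "$cidrs"
  # One replication slot per secondary node.
  printf "postgresql['max_replication_slots'] = %d\n" "$#"
}
```

For example, `render_primary_pg_settings 198.51.100.1 198.51.100.2 198.51.100.3` lists all three addresses and sets `max_replication_slots` to `2`.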

### Step 2\. Initiate the replication process

Now we need to make each **secondary** node listen to changes on the new **primary** node. To do that, [initiate the replication process](../replication/database.html#step-3-initiate-the-replication-process) again, but this time against the new **primary** node. All the old replication settings will be overwritten.

## Troubleshooting

### I followed the disaster recovery instructions and now two-factor auth is broken

The setup instructions for Geo prior to 10.5 failed to replicate the `otp_key_base` secret, which is used to encrypt the two-factor authentication secrets stored in the database. If it differs between the **primary** and **secondary** nodes, users with two-factor authentication enabled won't be able to log in after a failover.

If you still have access to the old **primary** node, you can follow the instructions in the [Upgrading to GitLab 10.5](../replication/version_specific_updates.html#updating-to-gitlab-105) section to resolve the error. Otherwise, the secret is lost and you'll need to [reset two-factor authentication for all users](../../../security/two_factor_authentication.html#disabling-2fa-for-everyone).
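To check whether the secret differs, you can compare `otp_key_base` between the old **primary** and the promoted node. This sketch assumes the Omnibus secrets file (`/etc/gitlab/gitlab-secrets.json`) stores the value as `"otp_key_base": "<value>"` under `gitlab_rails`. It extracts the value with `sed` rather than a JSON parser, so `jq` would be more robust if it is installed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare otp_key_base between two copies of gitlab-secrets.json.
# Assumes the key appears as "otp_key_base": "<value>"; uses sed, not a JSON parser.
set -eu

# Usage: otp_key_base FILE
# Prints the otp_key_base value found in FILE (nothing if absent).
otp_key_base() {
  sed -n 's/.*"otp_key_base"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage: same_otp_key_base FILE_A FILE_B
# Succeeds only if both files contain the same non-empty otp_key_base.
same_otp_key_base() {
  local a b
  a=$(otp_key_base "$1")
  b=$(otp_key_base "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, copy the old primary's `gitlab-secrets.json` to `/tmp/old-secrets.json` and run `same_otp_key_base /tmp/old-secrets.json /etc/gitlab/gitlab-secrets.json`. A non-zero exit status means the values differ or could not be read.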