# Geo Troubleshooting

> Original: [https://docs.gitlab.com/ee/administration/geo/replication/troubleshooting.html](https://docs.gitlab.com/ee/administration/geo/replication/troubleshooting.html)

*   [Basic troubleshooting](#basic-troubleshooting)
    *   [Check the health of the **secondary** node](#check-the-health-of-the-secondary-node)
    *   [Check if PostgreSQL replication is working](#check-if-postgresql-replication-is-working)
        *   [Are nodes pointing to the correct database instance?](#are-nodes-pointing-to-the-correct-database-instance)
        *   [Can Geo detect the current node correctly?](#can-geo-detect-the-current-node-correctly)
*   [Fixing errors found when running the Geo check Rake task](#fixing-errors-found-when-running-the-geo-check-rake-task)
*   [Fixing replication errors](#fixing-replication-errors)
    *   [Message: `ERROR: replication slots can only be used if max_replication_slots > 0`?](#message-error--replication-slots-can-only-be-used-if-max_replication_slots--0)
    *   [Message: `FATAL: could not start WAL streaming: ERROR: replication slot "geo_secondary_my_domain_com" does not exist`?](#message-fatal--could-not-start-wal-streaming-error--replication-slot-geo_secondary_my_domain_com-does-not-exist)
    *   [Message: “Command exceeded allowed execution time” when setting up replication?](#message-command-exceeded-allowed-execution-time-when-setting-up-replication)
    *   [Message: “PANIC: could not write to file `pg_xlog/xlogtemp.123`: No space left on device”](#message-panic-could-not-write-to-file-pg_xlogxlogtemp123-no-space-left-on-device)
    *   [Message: “ERROR: canceling statement due to conflict with recovery”](#message-error-canceling-statement-due-to-conflict-with-recovery)
    *   [Message: `LOG: invalid CIDR mask in address`](#message-log--invalid-cidr-mask-in-address)
    *   [Message: `LOG: invalid IP mask "md5": Name or service not known`](#message-log--invalid-ip-mask-md5-name-or-service-not-known)
    *   [Very large repositories never successfully synchronize on the **secondary** node](#very-large-repositories-never-successfully-synchronize-on-the-secondary-node)
    *   [New LFS objects are never replicated](#new-lfs-objects-are-never-replicated)
    *   [Resetting Geo **secondary** node replication](#resetting-geo-secondary-node-replication)
*   [Fixing errors during a failover or when promoting a secondary to a primary node](#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node)
    *   [Message: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken](#message-activerecordrecordinvalid-validation-failed-name-has-already-been-taken)
    *   [Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass``](#message-nomethoderror-undefined-method-secondary-for-nilnilclass)
    *   [Message: `sudo: gitlab-pg-ctl: command not found`](#message-sudo-gitlab-pg-ctl-command-not-found)
*   [Fixing Foreign Data Wrapper errors](#fixing-foreign-data-wrapper-errors)
    *   [“Foreign Data Wrapper (FDW) is not configured” error](#foreign-data-wrapper-fdw-is-not-configured-error)
        *   [Checking configuration](#checking-configuration)
        *   [Manual reload of FDW schema](#manual-reload-of-fdw-schema)
    *   [“Geo database has an outdated FDW remote schema” error](#geo-database-has-an-outdated-fdw-remote-schema-error)
*   [Expired artifacts](#expired-artifacts)
*   [Fixing sign in errors](#fixing-sign-in-errors)
    *   [Message: The redirect URI included is not valid](#message-the-redirect-uri-included-is-not-valid)
*   [Fixing common errors](#fixing-common-errors)
    *   [Geo database configuration file is missing](#geo-database-configuration-file-is-missing)
    *   [An existing tracking database cannot be reused](#an-existing-tracking-database-cannot-be-reused)
    *   [Geo node has a database that is writable which is an indication it is not configured for replication with the primary node](#geo-node-has-a-database-that-is-writable-which-is-an-indication-it-is-not-configured-for-replication-with-the-primary-node)
    *   [Geo node does not appear to be replicating the database from the primary node](#geo-node-does-not-appear-to-be-replicating-the-database-from-the-primary-node)
    *   [Geo database version (…) does not match latest migration (…)](#geo-database-version--does-not-match-latest-migration-)
    *   [Geo database is not configured to use Foreign Data Wrapper](#geo-database-is-not-configured-to-use-foreign-data-wrapper)
    *   [GitLab indicates that more than 100% of repositories were synced](#gitlab-indicates-that-more-than-100-of-repositories-were-synced)
    *   [Geo Admin Area returns 404 error for a secondary node](#geo-admin-area-returns-404-error-for-a-secondary-node)

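Setting up Geo requires careful attention to detail, and it is sometimes easy to miss a step.

Here is a list of steps you should take to attempt to fix a problem:

*   Perform [basic troubleshooting](#basic-troubleshooting).
*   Fix any [replication errors](#fixing-replication-errors).
*   Fix any [Foreign Data Wrapper](#fixing-foreign-data-wrapper-errors) errors.
*   Fix any [common](#fixing-common-errors) errors.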

## Basic troubleshooting

Before attempting more advanced troubleshooting:

*   Check [the health of the **secondary** node](#check-the-health-of-the-secondary-node).
*   Check [if PostgreSQL replication is working](#check-if-postgresql-replication-is-working).

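### Check the health of the **secondary** node

Visit the **primary** node's **Admin Area > Geo** (`/admin/geo/nodes`) in your browser. We perform the following health checks on each **secondary** node to help identify if something is wrong:

*   Is the node running?
*   Is the node's secondary database configured for streaming replication?
*   Is the node's secondary tracking database configured?
*   Is the node's secondary tracking database connected?
*   Is the node's secondary tracking database up-to-date?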

[![Geo health check](img/11728974fe5385112311ab02ea56783e.png)](img/geo_node_healthcheck.png)

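For information on how to resolve common errors reported from the UI, see [Fixing common errors](#fixing-common-errors).

If the UI is not working, or you are unable to log in, you can run the Geo health check manually to get this information as well as a few more details.

This Rake task can be run on an app node in the **primary** or **secondary** Geo nodes: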

```
sudo gitlab-rake gitlab:geo:check 
```

Example output:

```
Checking Geo ...

GitLab Geo is available ... yes
GitLab Geo is enabled ... yes
This machine's Geo node name matches a database record ... yes, found a secondary node named "Shanghai"
GitLab Geo secondary database is correctly configured ... yes
Database replication enabled? ... yes
Database replication working? ... yes
GitLab Geo tracking database is configured to use Foreign Data Wrapper? ... yes
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... yes
GitLab Geo HTTP(S) connectivity ...
* Can connect to the primary node ... yes
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... yes
Git user has default SSH configuration? ... yes
OpenSSH configured to use AuthorizedKeysCommand ... yes
GitLab configured to disable writing to authorized_keys file ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes

Checking Geo ... Finished 
```
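
When the output is long, it can help to filter it down to the checks that did not pass. The following is a minimal Python sketch and not a GitLab tool. It assumes each check is printed as `<description> ... <result>`, as in the sample above, and reports every check whose result does not start with `yes`. The helper name `failing_checks` is hypothetical.

```python
import re

# A check line looks like "<description> ... <result>" (see the sample output).
CHECK_LINE = re.compile(r"^(?P<check>.+?) \.\.\. (?P<result>.*)$")
# Terminal color codes, in case the output was captured from a colored terminal.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def failing_checks(output: str) -> list[tuple[str, str]]:
    """Return (check, result) pairs whose result does not start with 'yes'."""
    failures = []
    for raw in output.splitlines():
        line = ANSI_ESCAPE.sub("", raw).strip()
        if line.startswith("Checking Geo"):
            continue  # "Checking Geo ..." banners are not checks
        match = CHECK_LINE.match(line)
        if not match:
            continue  # blank lines, section headers, "Try fixing it:" hints
        result = match.group("result").strip()
        if result and not result.startswith("yes"):
            failures.append((match.group("check"), result))
    return failures
```

For example, save the output with `sudo gitlab-rake gitlab:geo:check > geo_check.txt`, then call `failing_checks(open("geo_check.txt").read())`. Results such as `no`, `not a secondary node`, or `Exception: ...` are reported. The corresponding fixes are described in [Fixing errors found when running the Geo check Rake task](#fixing-errors-found-when-running-the-geo-check-rake-task).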

You can also find the current sync information manually by running this Rake task on any **secondary** app node:

```
sudo gitlab-rake geo:status 
```

Example output:

```
http://secondary.example.com/
-----------------------------------------------------
                        GitLab Version: 11.10.4-ee
                              Geo Role: Secondary
                         Health Status: Healthy
                          Repositories: 289/289 (100%)
                 Verified Repositories: 289/289 (100%)
                                 Wikis: 289/289 (100%)
                        Verified Wikis: 289/289 (100%)
                           LFS Objects: 8/8 (100%)
                           Attachments: 5/5 (100%)
                      CI job artifacts: 0/0 (0%)
                  Repositories Checked: 0/289 (0%)
                         Sync Settings: Full
              Database replication lag: 0 seconds
       Last event ID seen from primary: 10215 (about 2 minutes ago)
     Last event ID processed by cursor: 10215 (about 2 minutes ago)
                Last status report was: 2 minutes ago 
```
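
To spot data types that are still catching up, the `done/total (percent%)` lines can be parsed instead of read by eye. A minimal Python sketch follows. It assumes the `Label: done/total (...)` layout of the sample output above, and `incomplete_items` is a hypothetical helper rather than a GitLab tool. A line with a total of 0 (such as `CI job artifacts: 0/0 (0%)`) is treated as complete, because there is nothing to sync.

```python
import re

# Matches progress lines such as "Repositories: 289/289 (100%)".
PROGRESS_LINE = re.compile(
    r"^\s*(?P<label>[^:]+):\s+(?P<done>\d+)/(?P<total>\d+)\s+\("
)


def incomplete_items(status_output: str) -> dict[str, tuple[int, int]]:
    """Map each label whose done count is below its total to (done, total)."""
    incomplete = {}
    for line in status_output.splitlines():
        match = PROGRESS_LINE.match(line)
        if not match:
            continue  # URL, separator, version, lag, and event ID lines
        done, total = int(match.group("done")), int(match.group("total"))
        if done < total:  # 0/0 counts as complete: nothing to sync
            incomplete[match.group("label").strip()] = (done, total)
    return incomplete
```

Note that `Repositories Checked` is also reported when it is below 100%. Whether that matters depends on whether you rely on repository checks.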

### Check if PostgreSQL replication is working

To check if PostgreSQL replication is working, check if:

*   [Nodes are pointing to the correct database instance](#are-nodes-pointing-to-the-correct-database-instance).
*   [Geo can detect the current node correctly](#can-geo-detect-the-current-node-correctly).

#### Are nodes pointing to the correct database instance?

You should make sure your **primary** Geo node points to the instance with write permissions.

Any **secondary** nodes should point only to read-only instances.

#### Can Geo detect the current node correctly?

Geo finds the current machine's Geo node name in `/etc/gitlab/gitlab.rb` by:

*   Using the `gitlab_rails['geo_node_name']` setting.
*   If that is not defined, using the `external_url` setting.

This name is used to look up the node with the same **Name** in **Admin Area > Geo**.

To check if the current machine has a node name that matches a node in the database, run the check task:

```
sudo gitlab-rake gitlab:geo:check 
```

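It displays the current machine's node name and whether the matching database record is a **primary** or **secondary** node. The first output below shows a successful match. The second shows a failed lookup with suggested fixes: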

```
This machine's Geo node name matches a database record ... yes, found a secondary node named "Shanghai" 
```

```
This machine's Geo node name matches a database record ... no
  Try fixing it:
  You could add or update a Geo node database record, setting the name to "https://example.com/".
  Or you could set this machine's Geo node name to match the name of an existing database record: "London", "Shanghai"
  For more information see:
  doc/administration/geo/replication/troubleshooting.md#can-geo-detect-the-current-node-correctly 
```
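
The lookup described above can be modeled as follows. This is a simplified Python sketch for illustration, not GitLab's implementation. It assumes the name comes from `gitlab_rails['geo_node_name']` when set and falls back to `external_url` otherwise, and that the comparison against the names in **Admin Area > Geo** is an exact string match. GitLab may normalize the fallback name (the failing example above suggests a name ending in `/`), so treat `resolve_geo_node_name` and `find_matching_role` as hypothetical helpers.

```python
from typing import Optional


def resolve_geo_node_name(geo_node_name: Optional[str], external_url: str) -> str:
    """Prefer gitlab_rails['geo_node_name']; otherwise fall back to external_url."""
    if geo_node_name:  # unset (None) or empty falls through to external_url
        return geo_node_name
    return external_url


def find_matching_role(machine_name: str, nodes: dict[str, str]) -> Optional[str]:
    """Return 'primary' or 'secondary' for the record named like this machine.

    `nodes` maps each record's Name (as shown in Admin Area > Geo) to its role.
    Assumption: exact, case-sensitive comparison, with no URL normalization.
    """
    return nodes.get(machine_name)
```

A `None` result corresponds to the `... no` output above. Either add a node record with that name, or set `gitlab_rails['geo_node_name']` to the name of an existing record.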

## Fixing errors found when running the Geo check Rake task

When running this Rake task, you may see errors if the nodes are not properly configured:

```
sudo gitlab-rake gitlab:geo:check 
```

1.  Rails did not provide a password when connecting to the database.

    ```
    Checking Geo ...

    GitLab Geo is available ... Exception: fe_sendauth: no password supplied
    GitLab Geo is enabled ... Exception: fe_sendauth: no password supplied
    ...
    Checking Geo ... Finished 
    ```

    *   Ensure that `gitlab_rails['db_password']` is set to the plain-text password used when creating the hash for `postgresql['sql_user_password']`.
2.  Rails is unable to connect to the database.

    ```
    Checking Geo ...

    GitLab Geo is available ... Exception: FATAL:  no pg_hba.conf entry for host "1.1.1.1",  user "gitlab", database "gitlabhq_production", SSL on
    FATAL:  no pg_hba.conf entry for host "1.1.1.1", user "gitlab", database "gitlabhq_production", SSL off
    GitLab Geo is enabled ... Exception: FATAL:  no pg_hba.conf entry for host "1.1.1.1", user "gitlab", database "gitlabhq_production", SSL on
    FATAL:  no pg_hba.conf entry for host "1.1.1.1", user "gitlab", database "gitlabhq_production", SSL off
    ...
    Checking Geo ... Finished 
    ```

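    *   Ensure that the IP address of the Rails node is included in `postgresql['md5_auth_cidr_addresses']`.
    *   Ensure that you have included the subnet mask on the IP address: `postgresql['md5_auth_cidr_addresses'] = ['1.1.1.1/32']`.
3.  Rails has supplied the incorrect password.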

    ```
    Checking Geo ...
    GitLab Geo is available ... Exception: FATAL:  password authentication failed for user "gitlab"
    FATAL:  password authentication failed for user "gitlab"
    GitLab Geo is enabled ... Exception: FATAL:  password authentication failed for user "gitlab"
    FATAL:  password authentication failed for user "gitlab"
    ...
    Checking Geo ... Finished 
    ```

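    *   Verify that `gitlab_rails['db_password']` is set to the same password that was used when creating the hash in `postgresql['sql_user_password']`. To do this, run `gitlab-ctl pg-password-md5 gitlab` and enter the password.
4.  Check returns `not a secondary node`.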

    ```
    Checking Geo ...

    GitLab Geo is available ... yes
    GitLab Geo is enabled ... yes
    GitLab Geo secondary database is correctly configured ... not a secondary node
    Database replication enabled? ... not a secondary node
    ...
    Checking Geo ... Finished 
    ```

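    *   Ensure that you have added the secondary node in the Admin Area of the **primary** node.
    *   Ensure that you entered the `external_url` or `gitlab_rails['geo_node_name']` when adding the secondary node in the Admin Area of the **primary** node.
    *   Prior to GitLab 12.4, edit the secondary node in the Admin Area of the **primary** node and ensure that there is a trailing `/` in the `Name` field.
5.  Check returns `Exception: PG::UndefinedTable: ERROR: relation "geo_nodes" does not exist`.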

    ```
    Checking Geo ...

    GitLab Geo is available ... no
      Try fixing it:
      Upload a new license that includes the GitLab Geo feature
      For more information see:
      https://about.gitlab.com/features/gitlab-geo/
    GitLab Geo is enabled ... Exception: PG::UndefinedTable: ERROR:  relation "geo_nodes" does not exist
    LINE 8:                WHERE a.attrelid = '"geo_nodes"'::regclass
                                              ^
    :               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                         c.collname, col_description(a.attrelid, a.attnum) AS comment
                    FROM pg_attribute a
                    LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                    LEFT JOIN pg_type t ON a.atttypid = t.oid
                    LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
                   WHERE a.attrelid = '"geo_nodes"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum
    ...
    Checking Geo ... Finished 
    ```

    This is expected when performing a PostgreSQL major version upgrade (9 > 10). Follow:

    *   [Initiate the replication process](database.html#step-3-initiate-the-replication-process)
    *   [Geo database has an outdated FDW remote schema](troubleshooting.html#geo-database-has-an-outdated-fdw-remote-schema-error)
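
For item 3 above, it can help to know what `postgresql['sql_user_password']` holds. Assuming it follows PostgreSQL's classic `md5` password scheme, which hashes the password concatenated with the user name (PostgreSQL itself stores the result with an `md5` prefix), the Python sketch below reproduces the hash locally so you can compare it with the value in `/etc/gitlab/gitlab.rb`. This sketch does not confirm whether `gitlab-ctl pg-password-md5` prints the prefix, so `matches_configured_hash` accepts both forms. Both helpers are illustrative only. When in doubt, use the official command.

```python
import hashlib


def pg_md5_hash(password: str, username: str) -> str:
    """Hex MD5 of password + username (PostgreSQL md5 scheme, without 'md5' prefix)."""
    return hashlib.md5((password + username).encode("utf-8")).hexdigest()


def matches_configured_hash(password: str, username: str, configured: str) -> bool:
    """Compare with a configured hash, tolerating an optional leading 'md5'."""
    expected = pg_md5_hash(password, username)
    return configured.strip().lower() in (expected, "md5" + expected)
```

For example, `matches_configured_hash("<plain-text password>", "gitlab", "<value of sql_user_password>")` returning `False` points to the mismatch described in item 3.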

## Fixing replication errors

The following sections outline troubleshooting steps for fixing replication errors.

### Message: `ERROR: replication slots can only be used if max_replication_slots > 0`?

This means that the `max_replication_slots` PostgreSQL variable needs to be set on the **primary** database. In GitLab 9.4, we made this setting default to 1. You may need to increase this value if you have more **secondary** nodes.

Be sure to restart PostgreSQL for this to take effect. See the [PostgreSQL replication setup](database.html#postgresql-replication) guide for more details.

### Message: `FATAL: could not start WAL streaming: ERROR: replication slot "geo_secondary_my_domain_com" does not exist`?

This occurs when PostgreSQL does not have a replication slot for the **secondary** node with that name.

You may want to rerun the [replication process](database.html) on the **secondary** node.

### Message: “Command exceeded allowed execution time” when setting up replication?

This may happen while [initiating the replication process](database.html#step-3-initiate-the-replication-process) on the **secondary** node. It indicates that your initial dataset is too large to be replicated within the default timeout (30 minutes).

Re-run `gitlab-ctl replicate-geo-database`, but include a larger value for `--backup-timeout`:

```
sudo gitlab-ctl \
   replicate-geo-database \
   --host=<primary_node_hostname> \
   --slot-name=<secondary_slot_name> \
   --backup-timeout=21600 
```

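This allows the initial replication to take up to six hours (21600 seconds) to complete, rather than the default thirty minutes. Adjust as required for your installation.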
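
`--backup-timeout` is given in seconds: the 30-minute default is 1800 seconds, and `21600` is 6 × 3600. If you can estimate the size of the initial dataset and the sustained transfer rate between the nodes, you can derive a rough value with the back-of-the-envelope Python sketch below. The dataset size, the throughput, and the 2× safety factor are assumptions that you should replace with your own measurements. `backup_timeout_seconds` is a hypothetical helper.

```python
import math


def backup_timeout_seconds(dataset_gib: float, throughput_mib_s: float,
                           safety_factor: float = 2.0,
                           minimum: int = 1800) -> int:
    """Estimate a --backup-timeout value, in seconds.

    dataset_gib: approximate size of the primary's database data, in GiB.
    throughput_mib_s: sustained primary-to-secondary transfer rate, in MiB/s.
    The result never drops below `minimum`, the 30-minute default.
    """
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = dataset_gib * 1024 / throughput_mib_s
    return max(minimum, math.ceil(transfer_seconds * safety_factor))
```

For example, about 500 GiB at 50 MiB/s gives roughly 10240 seconds of transfer. With the safety factor applied, that becomes 20480 seconds, close to the six hours used above.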

### Message: “PANIC: could not write to file `pg_xlog/xlogtemp.123`: No space left on device”

Determine if you have any unused replication slots in the **primary** database. Unused slots can cause large amounts of log data to build up in `pg_xlog`. Removing them can reduce the amount of space used in `pg_xlog`.

1.  Start a PostgreSQL console session:

    ```
    sudo gitlab-psql 
    ```

    **Note:** Using `gitlab-rails dbconsole` does not work, because managing replication slots requires superuser permissions.

2.  View your replication slots with:

    ```
    SELECT * FROM pg_replication_slots; 
    ```

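Slots where `active` is `f` are not active.

*   If the slot should be active, because you have a **secondary** node configured to use it, log in to that **secondary** node and check the PostgreSQL logs to see why replication is not running.

*   If you are no longer using the slot (for example, you no longer have Geo enabled), you can remove it in the PostgreSQL console session: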

    ```
    SELECT pg_drop_replication_slot('<name_of_extra_slot>'); 
    ```
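
If there are many slots, you may want to list the inactive ones and prepare the matching drop statements for review before running any of them. The minimal Python sketch below works on `(slot_name, active)` pairs copied from the `pg_replication_slots` output. `drop_statements_for_inactive` is a hypothetical helper. It only generates SQL text and never connects to PostgreSQL. Review every statement, because dropping a slot that a **secondary** node still needs breaks that node's replication.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a PostgreSQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def drop_statements_for_inactive(slots: list[tuple[str, bool]]) -> list[str]:
    """Build pg_drop_replication_slot() calls for slots whose `active` is false.

    `slots` holds (slot_name, active) pairs, with psql's t/f already mapped to
    True/False. Nothing is executed: review the statements before running them.
    """
    return [
        f"SELECT pg_drop_replication_slot({quote_literal(name)});"
        for name, active in slots
        if not active
    ]
```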

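### Message: “ERROR: canceling statement due to conflict with recovery”

This error may occur occasionally under normal usage, and the system is resilient enough to recover.

However, under certain conditions, some database queries on secondaries may run excessively long, which increases the frequency of this error. Some of these queries may eventually never complete, because they are canceled every time.

These long-running queries are [planned to be removed in the future](https://gitlab.com/gitlab-org/gitlab/-/issues/34269). As a workaround, we recommend enabling [`hot_standby_feedback`](https://www.postgresql.org/docs/10/hot-standby.html). This increases the likelihood of bloat on the **primary** node, because it prevents `VACUUM` from removing recently dead rows. However, it has been used successfully in production on GitLab.com.

To enable `hot_standby_feedback`, add the following to `/etc/gitlab/gitlab.rb` on the **secondary** node: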

```
postgresql['hot_standby_feedback'] = 'on' 
```

Then reconfigure GitLab:

```
sudo gitlab-ctl reconfigure 
```

To help us resolve this problem, consider commenting on [the issue](https://gitlab.com/gitlab-org/gitlab/-/issues/4489).

### Message: `LOG: invalid CIDR mask in address`

This happens when `postgresql['md5_auth_cidr_addresses']` contains wrongly formatted addresses.

```
2020-03-20_23:59:57.60499 LOG:  invalid CIDR mask in address "***"
2020-03-20_23:59:57.60501 CONTEXT:  line 74 of configuration file "/var/opt/gitlab/postgresql/data/pg_hba.conf" 
```

To fix this, update the IP addresses in `postgresql['md5_auth_cidr_addresses']` in `/etc/gitlab/gitlab.rb` to respect the CIDR format (for example, `1.2.3.4/32`).

### Message: `LOG: invalid IP mask "md5": Name or service not known`

This happens when you have added IP addresses without a subnet mask to `postgresql['md5_auth_cidr_addresses']`.

```
2020-03-21_00:23:01.97353 LOG:  invalid IP mask "md5": Name or service not known
2020-03-21_00:23:01.97354 CONTEXT:  line 75 of configuration file "/var/opt/gitlab/postgresql/data/pg_hba.conf" 
```

To fix this, add the subnet mask to the IP addresses in `postgresql['md5_auth_cidr_addresses']` in `/etc/gitlab/gitlab.rb` to respect the CIDR format (for example, `1.2.3.4/32`).
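
Both messages come from entries in `postgresql['md5_auth_cidr_addresses']` that are not valid CIDR blocks. Before running `gitlab-ctl reconfigure`, you can check the list with Python's standard `ipaddress` module. In the minimal sketch below, `invalid_cidr_entries` is a hypothetical helper. It reports a bare address without a `/prefix-length`, because that is what triggers the `invalid IP mask` error above. It also reports a non-numeric prefix, or anything `ipaddress` cannot parse. As an assumption, it tolerates host bits set to the right of the mask (`strict=False`).

```python
import ipaddress


def invalid_cidr_entries(entries: list[str]) -> list[str]:
    """Return the entries that are not in `address/prefix-length` form."""
    invalid = []
    for entry in entries:
        _, sep, prefix = entry.partition("/")
        if not sep or not prefix.isdigit():
            invalid.append(entry)  # missing or non-numeric mask length
            continue
        try:
            # strict=False: do not reject addresses with host bits set.
            ipaddress.ip_network(entry, strict=False)
        except ValueError:
            invalid.append(entry)  # bad address or out-of-range prefix
    return invalid
```

For example, copy the array from `/etc/gitlab/gitlab.rb` into a Python list and expect an empty result before reconfiguring.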

### Very large repositories never successfully synchronize on the **secondary** node

GitLab places a timeout on all repository clones, including project imports and Geo synchronization operations. If a fresh `git clone` of a repository on the **primary** node takes more than a few minutes, you may be affected by this.

To increase the timeout, add the following line to `/etc/gitlab/gitlab.rb` on the **secondary** node:

```
gitlab_rails['gitlab_shell_git_timeout'] = 10800 
```

Then reconfigure GitLab:

```
sudo gitlab-ctl reconfigure 
```

This increases the timeout to three hours (10800 seconds). Choose a time long enough to accommodate a full clone of your largest repositories.

### New LFS objects are never replicated

If new LFS objects are never replicated to secondary Geo nodes, check the version of GitLab you are running. GitLab versions 11.11.x and 12.0.x are affected by [a bug that results in new LFS objects not being replicated to Geo secondary nodes](https://gitlab.com/gitlab-org/gitlab/-/issues/32696).

To resolve the issue, upgrade to GitLab 12.1 or newer.

### Resetting Geo **secondary** node replication

If a **secondary** node is in a broken state and you want to reset its replication state to start again from scratch, the following steps can help:

1.  Stop Sidekiq and the Geo LogCursor

    You can make Sidekiq stop gracefully: make it stop fetching new jobs, then wait until the current jobs finish processing.

    Send a **SIGTSTP** kill signal for the first phase, then a **SIGTERM** when all jobs have finished. Otherwise, just use the `gitlab-ctl stop` commands.

    ```
    gitlab-ctl status sidekiq
    # run: sidekiq: (pid 10180) <- this is the PID you will use
    kill -TSTP 10180 # change to the correct PID

    gitlab-ctl stop sidekiq
    gitlab-ctl stop geo-logcursor 
    ```

    You can watch the Sidekiq logs to know when Sidekiq job processing has finished:

    ```
    gitlab-ctl tail sidekiq 
    ```

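2.  Rename repository storage folders and create new ones. If you are not concerned about possible orphaned directories and files, you can skip this step.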

    ```
    mv /var/opt/gitlab/git-data/repositories /var/opt/gitlab/git-data/repositories.old
    mkdir -p /var/opt/gitlab/git-data/repositories
    chown git:git /var/opt/gitlab/git-data/repositories 
    ```

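    **Tip:** To save disk space, you may want to remove `/var/opt/gitlab/git-data/repositories.old` once you confirm that you no longer need it.
3.  *(Optional)* Rename other data folders and create new ones

    **Caution:** The **secondary** node may still hold files that were removed from the **primary** node without that removal being reflected. If you skip this step, they will never be removed from this Geo node.

    Any uploaded content (like file attachments, avatars, or LFS objects) is stored in a subfolder in one of these paths:

    *   `/var/opt/gitlab/gitlab-rails/shared`
    *   `/var/opt/gitlab/gitlab-rails/uploads`

    To rename all of them: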

    ```
    gitlab-ctl stop

    mv /var/opt/gitlab/gitlab-rails/shared /var/opt/gitlab/gitlab-rails/shared.old
    mkdir -p /var/opt/gitlab/gitlab-rails/shared

    mv /var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/uploads.old
    mkdir -p /var/opt/gitlab/gitlab-rails/uploads

    gitlab-ctl start geo-postgresql 
    ```

    Reconfigure to recreate the folders and make sure permissions and ownership are correct:

    ```
    gitlab-ctl reconfigure 
    ```

4.  Reset the tracking database

    ```
    gitlab-rake geo:db:drop  # on a secondary app node
    gitlab-ctl reconfigure   # on the tracking database node
    gitlab-rake geo:db:setup # on a secondary app node 
    ```

5.  Restart previously stopped services

    ```
    gitlab-ctl start 
    ```

6.  Refresh Foreign Data Wrapper tables

    ```
    gitlab-rake geo:db:refresh_foreign_tables 
    ```
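
Step 1 above relies on reading Sidekiq's PID from `gitlab-ctl status sidekiq` by eye. If you script this step, you can extract the PID from the status line. The Python sketch below assumes the `run: sidekiq: (pid 10180) ...` format shown in the comment in step 1, and sends `SIGTSTP` with `os.kill`. `sidekiq_pid` and `quiet_sidekiq` are hypothetical helpers. Run them as root on the **secondary** node. Once jobs have finished, you still need to run `gitlab-ctl stop sidekiq`.

```python
import os
import re
import signal
import subprocess
from typing import Optional

# Assumed status format: "run: sidekiq: (pid 10180) 1234s; run: log: ..."
STATUS_PID = re.compile(r"^run: sidekiq: \(pid (\d+)\)")


def sidekiq_pid(status_output: str) -> Optional[int]:
    """Extract Sidekiq's PID, or None if the service is not reported as running."""
    match = STATUS_PID.match(status_output.strip())
    return int(match.group(1)) if match else None


def quiet_sidekiq() -> int:
    """Send SIGTSTP so Sidekiq stops fetching new jobs; return the PID signalled."""
    output = subprocess.run(
        ["gitlab-ctl", "status", "sidekiq"],
        capture_output=True, text=True, check=True,
    ).stdout
    pid = sidekiq_pid(output)
    if pid is None:
        raise RuntimeError(f"Sidekiq does not appear to be running: {output!r}")
    os.kill(pid, signal.SIGTSTP)  # same effect as `kill -TSTP <pid>`
    return pid
```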

## Fixing errors during a failover or when promoting a secondary to a primary node

The following are errors you might encounter during a failover or when promoting a secondary to a primary node, along with their solutions.

### Message: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken

When [promoting a **secondary** node](../disaster_recovery/index.html#step-3-promoting-a-secondary-node), you might encounter the following error:

```
Running gitlab-rake geo:set_secondary_as_primary...

rake aborted!
ActiveRecord::RecordInvalid: Validation failed: Name has already been taken
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/tasks/geo.rake:236:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/tasks/geo.rake:221:in `block (2 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => geo:set_secondary_as_primary
(See full trace by running task with --trace)

You successfully promoted this node! 
```

If you encounter this message when running `gitlab-rake geo:set_secondary_as_primary` or `gitlab-ctl promote-to-primary-node`, either:

*   Enter a Rails console and run:

    ```
    Rails.application.load_tasks; nil
    Gitlab::Geo.expire_cache!
    Rake::Task['geo:set_secondary_as_primary'].invoke 
    ```

*   Upgrade to GitLab 12.6.3 or newer if it is safe to do so, for example if the failover was just a test. A [caching-related bug](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/22021) was fixed.

### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass``

When [promoting a **secondary** node](../disaster_recovery/index.html#step-3-promoting-a-secondary-node), you might encounter the following error:

```
sudo gitlab-rake geo:set_secondary_as_primary

rake aborted!
NoMethodError: undefined method `secondary?' for nil:NilClass
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/tasks/geo.rake:232:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/tasks/geo.rake:221:in `block (2 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => geo:set_secondary_as_primary
(See full trace by running task with --trace) 
```

This command is intended to be executed on a secondary node only. This error is displayed if you attempt to run it on a primary node.

### Message: `sudo: gitlab-pg-ctl: command not found`

When [promoting a **secondary** node with multiple servers](../disaster_recovery/index.html#promoting-a-secondary-node-with-multiple-servers), you need to run the `gitlab-pg-ctl` command to promote the PostgreSQL read-replica database.

In GitLab 12.8 and earlier, this command fails with the following message:

```
sudo: gitlab-pg-ctl: command not found 
```

In this case, the workaround is to use the full path to the binary, for example:

```
sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote 
```

GitLab 12.9 and later are [unaffected by this error](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5147).

## Fixing Foreign Data Wrapper errors

This section explains ways to solve potential Foreign Data Wrapper errors.

### “Foreign Data Wrapper (FDW) is not configured” error

When setting up Geo, you might see this warning in the `gitlab-rake gitlab:geo:check` output:

```
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... foreign data wrapper is not configured 
```

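There are a few key points to remember:

1.  The FDW settings are configured on the Geo **tracking** database.
2.  The configured foreign server enables a login to the Geo **secondary**, a read-only database.

By default, the Geo secondary database and the tracking database run on the same host on different ports: 5432 and 5431 respectively.

#### Checking configuration

**Note:** The following steps are for Omnibus installs only. Using Geo with source-based installs was **deprecated** in GitLab 11.5.

To check the configuration:

1.  SSH into an app node in the **secondary**: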

    ```
    sudo -i 
    ```

    **Note:** An app node is any machine running at least one of the following services:

    *   `puma`
    *   `unicorn`
    *   `sidekiq`
    *   `geo-logcursor`
2.  Enter the database console:

    If the tracking database is running on the same node:

    ```
    gitlab-geo-psql 
    ```

    Or, if the tracking database is running on a different node, you must specify the user and host when entering the database console:

    ```
    gitlab-geo-psql -U gitlab_geo -h <IP of tracking database> 
    ```

    You are prompted for the password of the `gitlab_geo` user. You can find it in plain text in `/etc/gitlab/gitlab.rb`:

    ```
    geo_secondary['db_password'] = '<geo_tracking_db_password>' 
    ```

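    You normally set this password on the tracking database during [Step 3: Configure the tracking database on the secondary node](multiple_servers.html#step-3-configure-the-tracking-database-on-the-secondary-node), and on the app node during [Step 4: Configure the frontend application servers on the secondary node](multiple_servers.html#step-4-configure-the-frontend-application-servers-on-the-secondary-node).

3.  Check whether any tables are present with the following statement: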

    ```
    SELECT * from information_schema.foreign_tables; 
    ```

    If everything is working, you should see something like this:

    ```
    gitlabhq_geo_production=# SELECT * from information_schema.foreign_tables;
      foreign_table_catalog  | foreign_table_schema |               foreign_table_name                | foreign_server_catalog  | foreign_server_name
    -------------------------+----------------------+-------------------------------------------------+-------------------------+---------------------
     gitlabhq_geo_production | gitlab_secondary     | abuse_reports                                   | gitlabhq_geo_production | gitlab_secondary
     gitlabhq_geo_production | gitlab_secondary     | appearances                                     | gitlabhq_geo_production | gitlab_secondary
     gitlabhq_geo_production | gitlab_secondary     | application_setting_terms                       | gitlabhq_geo_production | gitlab_secondary
     gitlabhq_geo_production | gitlab_secondary     | application_settings                            | gitlabhq_geo_production | gitlab_secondary
    <snip> 
    ```

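    However, if the query returns `0 rows`, continue on to the next steps.

4.  Check that the foreign server mapping is correct via `\des+`. The results should look something like this: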

    ```
    gitlabhq_geo_production=# \des+
    List of foreign servers
    -[ RECORD 1 ]--------+------------------------------------------------------------
    Name                 | gitlab_secondary
    Owner                | gitlab-psql
    Foreign-data wrapper | postgres_fdw
    Access privileges    | "gitlab-psql"=U/"gitlab-psql"                              +
                         | gitlab_geo=U/"gitlab-psql"
    Type                 |
    Version              |
    FDW Options          | (host '0.0.0.0', port '5432', dbname 'gitlabhq_production')
    Description          | 
    ```

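    **Note:** Pay particular attention to the host and port under `FDW Options`. That configuration should point to the Geo secondary database.

    If you need to experiment with changing the host or port, the following queries demonstrate how: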

    ```
    ALTER SERVER gitlab_secondary OPTIONS (SET host '<my_new_host>');
    ALTER SERVER gitlab_secondary OPTIONS (SET port 5432); 
    ```

    If you change the host and/or port, you also have to adjust the following settings in `/etc/gitlab/gitlab.rb` and run `gitlab-ctl reconfigure`:

    *   `gitlab_rails['db_host']`
    *   `gitlab_rails['db_port']`
5.  Check that the user mapping is configured properly via `\deu+`:

    ```
    gitlabhq_geo_production=# \deu+
                                                 List of user mappings
          Server      | User name  |                                  FDW Options
    ------------------+------------+--------------------------------------------------------------------------------
     gitlab_secondary | gitlab_geo | ("user" 'gitlab', password 'YOUR-PASSWORD-HERE')
    (1 row) 
    ```

    Make sure the password is correct. You can test that logins work by running `psql`:

    ```
    # Connect to the tracking database as the `gitlab_geo` user
    sudo \
       -u git /opt/gitlab/embedded/bin/psql \
       -h /var/opt/gitlab/geo-postgresql \
       -p 5431 \
       -U gitlab_geo \
       -W \
       -d gitlabhq_geo_production 
    ```

    If you need to correct the password, the following query shows how:

    ```
    ALTER USER MAPPING FOR gitlab_geo SERVER gitlab_secondary OPTIONS (SET password '<my_new_password>'); 
    ```

    If you change the user or password, you also have to adjust the following settings in `/etc/gitlab/gitlab.rb` and run `gitlab-ctl reconfigure`:

    *   `gitlab_rails['db_username']`
    *   `gitlab_rails['db_password']`

    If you are using [PgBouncer in front of the secondary database](database.html#pgbouncer-support-optional), be sure to update the following settings:

    *   `geo_postgresql['fdw_external_user']`
    *   `geo_postgresql['fdw_external_password']`
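
The `FDW Options` string shown by `\des+` in step 4 can be parsed instead of compared by eye, which makes it easier to check against `gitlab_rails['db_host']` and `gitlab_rails['db_port']`. The minimal Python sketch below assumes the `(key 'value', key 'value')` layout shown in step 4. `parse_fdw_options` and `fdw_mismatches` are hypothetical helpers. You supply the host and port values yourself, from `/etc/gitlab/gitlab.rb`.

```python
import re

# key 'value' pairs; a doubled '' inside a value is an escaped single quote.
OPTION = re.compile(r"(\w+)\s+'((?:[^']|'')*)'")


def parse_fdw_options(options: str) -> dict[str, str]:
    """Parse "(host '0.0.0.0', port '5432', dbname '...')" into a dict."""
    return {key: value.replace("''", "'") for key, value in OPTION.findall(options)}


def fdw_mismatches(options: str, db_host: str, db_port: int) -> list[str]:
    """List differences between the foreign server and the gitlab.rb settings."""
    parsed = parse_fdw_options(options)
    problems = []
    if parsed.get("host") != db_host:
        problems.append(f"host: FDW has {parsed.get('host')!r}, gitlab.rb has {db_host!r}")
    if parsed.get("port") != str(db_port):
        problems.append(f"port: FDW has {parsed.get('port')!r}, gitlab.rb has {db_port!r}")
    return problems
```

An empty list means the foreign server and `gitlab.rb` agree on host and port. Otherwise, adjust one side using the `ALTER SERVER` queries in step 4, or update the settings and reconfigure.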

#### Manual reload of FDW schema

If you're still unable to get FDW working, you may want to try a manual reload of the FDW schema. To manually reload the FDW schema:

1.  On the node running the Geo tracking database, enter the PostgreSQL console as the `gitlab_geo` user:

    ```
    sudo \
       -u git /opt/gitlab/embedded/bin/psql \
       -h /var/opt/gitlab/geo-postgresql \
       -p 5431 \
       -U gitlab_geo \
       -W \
       -d gitlabhq_geo_production 
    ```

    Be sure to adjust the port and hostname for your configuration. You may be asked to enter a password.

2.  Reload the schema via:

    ```
    DROP SCHEMA IF EXISTS gitlab_secondary CASCADE;
    CREATE SCHEMA gitlab_secondary;
    GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO gitlab_geo;
    IMPORT FOREIGN SCHEMA public FROM SERVER gitlab_secondary INTO gitlab_secondary; 
    ```

3.  Test that queries work:

    ```
    SELECT * from information_schema.foreign_tables;
    SELECT * FROM gitlab_secondary.projects limit 1; 
    ```
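
If you prefer to run step 2 non-interactively, the same statements can be fed to the `psql` command from step 1 in one session. This is a sketch, not an official GitLab script: the function name `reload_geo_fdw_schema` is hypothetical, and its defaults (socket directory, port `5431`, database `gitlabhq_geo_production`) are simply the values used above, so adjust them for your configuration.

```shell
# Hypothetical helper: run the schema reload from step 2 in a single psql session.
# $1 = PostgreSQL host or socket directory, $2 = port (defaults match the example above).
reload_geo_fdw_schema() {
  local host="${1:-/var/opt/gitlab/geo-postgresql}"
  local port="${2:-5431}"
  # -f - reads the SQL from stdin (the heredoc below).
  # ON_ERROR_STOP=1 stops at the first failing statement, and
  # --single-transaction wraps the script in one transaction, so a failure
  # does not leave the gitlab_secondary schema dropped.
  # -W still prompts for the gitlab_geo password on the terminal.
  sudo -u git /opt/gitlab/embedded/bin/psql \
    -h "$host" -p "$port" -U gitlab_geo -W -d gitlabhq_geo_production \
    -v ON_ERROR_STOP=1 --single-transaction -f - <<'SQL'
DROP SCHEMA IF EXISTS gitlab_secondary CASCADE;
CREATE SCHEMA gitlab_secondary;
GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO gitlab_geo;
IMPORT FOREIGN SCHEMA public FROM SERVER gitlab_secondary INTO gitlab_secondary;
SQL
}
```

Afterwards, run the test queries from step 3 as before.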

### “Geo database has an outdated FDW remote schema” error[](#geo-database-has-an-outdated-fdw-remote-schema-error "Permalink")

GitLab can error with a `Geo database has an outdated FDW remote schema` message.

For example:

```
Geo database has an outdated FDW remote schema. It contains 229 of 236 expected tables. Please refer to Geo Troubleshooting. 
```

To resolve this, run the following command on the **secondary** node:

```
sudo gitlab-rake geo:db:refresh_foreign_tables 
```
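
The error message states how many of the expected foreign tables the schema currently contains. As a small convenience, the hypothetical shell function below (not a GitLab tool) extracts the two numbers from the message text so you can see how many tables are missing, for example before and after running the refresh task. It assumes the message wording shown above.

```shell
# Hypothetical helper: read the error text on stdin and report missing tables.
missing_fdw_tables() {
  # Keep only "<found> <expected>" from "... contains N of M expected tables ...".
  sed -nE 's/.*contains ([0-9]+) of ([0-9]+) expected tables.*/\1 \2/p' |
    while read -r found expected; do
      echo "$((expected - found)) missing ($found of $expected)"
    done
}

# Usage:
#   echo "Geo database has an outdated FDW remote schema. It contains 229 of 236 expected tables." | missing_fdw_tables
#   # prints: 7 missing (229 of 236)
```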

## Expired artifacts[](#expired-artifacts "Permalink")

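If you notice, for whatever reason, that there are more artifacts on the Geo **secondary** node than on the Geo **primary** node, you can use the Rake task to [clean up orphan artifact files](../../../raketasks/cleanup.html#remove-orphan-artifact-files).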

On a Geo **secondary** node, this command also clears up all Geo registry records related to the orphan files on disk.

## Fixing sign in errors[](#fixing-sign-in-errors "Permalink")

### Message: The redirect URI included is not valid[](#message-the-redirect-uri-included-is-not-valid "Permalink")

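If you are able to log in to the **primary** node, but you receive this error when attempting to log in to a **secondary** node, check that the Geo node's URL matches its external URL.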

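1.  First, visit **Admin Area > Geo**.
2.  Find the affected **secondary** node and click **Edit**.
3.  Ensure the **URL** field matches the value of `external_url "https://gitlab.example.com"` found in `/etc/gitlab/gitlab.rb` on the frontend server of the **secondary** node.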
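
Step 3 can also be checked from a shell on the **secondary** node's frontend server. The function below is a hypothetical sketch (not part of GitLab): you pass in the URL shown in the Admin Area, and it compares it with the first uncommented `external_url` line in `gitlab.rb`. Ignoring a trailing slash on both sides is an assumption about how the comparison should behave.

```shell
# Hypothetical helper: compare the Admin Area URL with external_url in gitlab.rb.
# $1 = URL from the Admin Area, $2 = config file (defaults to the Omnibus path).
geo_url_matches() {
  local admin_url="$1" config="${2:-/etc/gitlab/gitlab.rb}" external
  # Take the first uncommented external_url line and strip the quotes.
  external=$(sed -nE "s/^[[:space:]]*external_url[[:space:]]+['\"]([^'\"]+)['\"].*/\1/p" "$config" | head -n 1)
  # Ignore a trailing slash on either side when comparing.
  if [ "${admin_url%/}" = "${external%/}" ]; then
    echo "match: $external"
  else
    echo "MISMATCH: Admin Area has '$admin_url', gitlab.rb has '$external'"
    return 1
  fi
}

# Usage, as root on the secondary node's frontend server:
#   geo_url_matches "https://gitlab.example.com"
```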

## Fixing common errors[](#fixing-common-errors "Permalink")

This section documents common error messages reported in the Admin Area, and how to fix them.

### Geo database configuration file is missing[](#geo-database-configuration-file-is-missing "Permalink")

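GitLab cannot find or doesn't have permission to access the `database_geo.yml` configuration file.

In an Omnibus GitLab installation, the file should be in `/var/opt/gitlab/gitlab-rails/etc`. If it doesn't exist or inadvertent changes have been made to it, run `sudo gitlab-ctl reconfigure` to restore it to its correct state.

If this path is mounted on a remote volume, check that your volume configuration has the correct permissions.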
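
A quick way to tell the two failure modes apart (file missing versus not readable) is sketched below. The function name is hypothetical. Because readability depends on the user, run it as the user GitLab runs as (`git` on Omnibus), for example in a shell opened with `sudo -u git bash`.

```shell
# Hypothetical helper: report whether database_geo.yml exists and is readable
# by the current user. Defaults to the Omnibus GitLab path.
check_geo_db_config() {
  local file="${1:-/var/opt/gitlab/gitlab-rails/etc/database_geo.yml}"
  if [ ! -e "$file" ]; then
    echo "missing: $file (try: sudo gitlab-ctl reconfigure)"
    return 1
  elif [ ! -r "$file" ]; then
    echo "not readable by $(id -un): $file (check ownership and volume permissions)"
    return 1
  fi
  echo "ok: $file"
}
```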

### An existing tracking database cannot be reused[](#an-existing-tracking-database-cannot-be-reused "Permalink")

Geo cannot reuse an existing tracking database.

The safest way to proceed is to use a brand-new **secondary** node, or to reset the whole **secondary** node by following [Resetting Geo secondary node replication](#resetting-geo-secondary-node-replication).

### Geo node has a database that is writable which is an indication it is not configured for replication with the primary node[](#geo-node-has-a-database-that-is-writable-which-is-an-indication-it-is-not-configured-for-replication-with-the-primary-node "Permalink")

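This error refers to a problem with the database replica on a **secondary** node, which Geo expects to have access to. It usually means one of the following:

*   An unsupported replication method was used (for example, logical replication).
*   The instructions to set up [Geo database replication](database.html) were not followed correctly.
*   Your database connection details are incorrect, that is, you have specified the wrong user in your `/etc/gitlab/gitlab.rb` file.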

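A common source of confusion with **secondary** nodes is that they require two separate PostgreSQL instances:

*   A read-only replica of the **primary** node.
*   A regular, writable instance that holds replication metadata, that is, the Geo tracking database.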
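
One way to confirm which instance is which is PostgreSQL's `pg_is_in_recovery()` function, which returns true only on a standby (replica). The sketch below assumes that the Omnibus `gitlab-psql` and `gitlab-geo-psql` wrappers connect to the replica and to the Geo tracking database respectively by default; the function name `check_geo_roles` is hypothetical.

```shell
# Hypothetical helper: check the roles of the two PostgreSQL instances on a secondary node.
check_geo_roles() {
  local replica tracking status=0
  # -A unaligned, -t tuples only, -c run one command: prints just "t" or "f".
  replica=$(sudo gitlab-psql -Atc 'SELECT pg_is_in_recovery();')
  tracking=$(sudo gitlab-geo-psql -Atc 'SELECT pg_is_in_recovery();')

  # The replica of the primary database must be in recovery (read-only).
  if [ "$replica" = "t" ]; then
    echo "replica: read-only standby (ok)"
  else
    echo "replica: not in recovery, so it is writable (got '$replica')"
    status=1
  fi

  # The Geo tracking database must be a normal, writable instance.
  if [ "$tracking" = "f" ]; then
    echo "tracking database: writable (ok)"
  else
    echo "tracking database: unexpected state (got '$tracking')"
    status=1
  fi
  return "$status"
}
```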

### Geo node does not appear to be replicating the database from the primary node[](#geo-node-does-not-appear-to-be-replicating-the-database-from-the-primary-node "Permalink")

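The most common problems that prevent the database from replicating correctly are:

*   **Secondary** nodes cannot reach the **primary** node. Check credentials, firewall rules, etc.
*   SSL certificate problems. Make sure you copied `/etc/gitlab/gitlab-secrets.json` from the **primary** node.
*   The database storage disk is full.
*   The database replication slot is misconfigured.
*   The database is not using a replication slot or another alternative, and cannot catch up because WAL files were purged.

Make sure you follow the [Geo database replication](database.html) instructions for a supported configuration.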
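
For the replication slot items in the list above, one check is to query PostgreSQL's `pg_replication_slots` view on the **primary** node. The hypothetical function below formats the `slot_name|active` rows that `psql -At` prints. An inactive slot has no connected consumer, and it keeps retaining WAL, which can also contribute to a full disk.

```shell
# Hypothetical helper: read "slot_name|active" rows (psql -At output) on stdin
# and flag slots that no replica is currently using.
report_replication_slots() {
  local found=0 name active
  while IFS='|' read -r name active; do
    [ -z "$name" ] && continue
    found=1
    if [ "$active" = "t" ]; then
      echo "active:   $name"
    else
      echo "INACTIVE: $name"
    fi
  done
  [ "$found" -eq 1 ] || echo "no replication slots found"
}

# Usage, on the primary node:
#   sudo gitlab-psql -Atc 'SELECT slot_name, active FROM pg_replication_slots;' | report_replication_slots
```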

### Geo database version (…) does not match latest migration (…)[](#geo-database-version--does-not-match-latest-migration- "Permalink")

If you are using an Omnibus GitLab installation, something might have failed during the upgrade. You can:

*   Run `sudo gitlab-ctl reconfigure`.
*   Manually trigger the database migration by running `sudo gitlab-rake geo:db:migrate` as root on the **secondary** node.

### Geo database is not configured to use Foreign Data Wrapper[](#geo-database-is-not-configured-to-use-foreign-data-wrapper "Permalink")

This error means that the Geo tracking database doesn't have the FDW server and credentials configured.

See [“Foreign Data Wrapper (FDW) is not configured” error?](#foreign-data-wrapper-fdw-is-not-configured-error).

### GitLab indicates that more than 100% of repositories were synced[](#gitlab-indicates-that-more-than-100-of-repositories-were-synced "Permalink")

This can be caused by orphaned records in the project registries. You can clear them [using a Rake task](../../../administration/raketasks/geo.html#remove-orphaned-project-registries).

### Geo Admin Area returns 404 error for a secondary node[](#geo-admin-area-returns-404-error-for-a-secondary-node "Permalink")

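Sometimes `sudo gitlab-rake gitlab:geo:check` indicates that the **secondary** node is healthy, but the Geo Admin Area on the **primary** node returns a 404 error for that **secondary** node.

To resolve this:

*   Try restarting the **secondary** node with `sudo gitlab-ctl restart`.
*   Check `/var/log/gitlab/gitlab-rails/geo.log` to see whether the **secondary** node is using IPv6 to send its status to the **primary** node. If it is, add an entry for the **primary** node using IPv4 in the `/etc/hosts` file. Alternatively, [enable IPv6 on the **primary** node](https://docs.gitlab.com/omnibus/settings/nginx.html).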
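
If you choose the `/etc/hosts` route, the entry maps the **primary** node's host name (from its `external_url`) to its IPv4 address. The sketch below is a hypothetical helper that appends such an entry only if the host name is not already listed; `198.51.100.7` and `primary.example.com` in the usage line are placeholders. Run it as root, because it writes to `/etc/hosts`.

```shell
# Hypothetical helper: pin the primary node's host name to an IPv4 address.
# $1 = IPv4 address, $2 = host name, $3 = hosts file (defaults to /etc/hosts).
pin_primary_ipv4() {
  local ip="$1" host="$2" hosts_file="${3:-/etc/hosts}"
  # Look for the host name in any non-comment line (fields 2..NF are names).
  if awk -v h="$host" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                       END { exit !found }' "$hosts_file"; then
    echo "already present: $host (edit $hosts_file by hand if the address is wrong)"
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$host" >> "$hosts_file"
  echo "added: $ip $host"
}

# Usage, as root on the secondary node (placeholder values):
#   pin_primary_ipv4 198.51.100.7 primary.example.com
```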