group-replication-responses-failure-partition.md 3.1 KB
Newer Older
茶陵後's avatar
茶陵後 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#### 18.7.7.2 Unreachable Majority Timeout

 By default, members that find themselves in a minority due to a network partition do not automatically leave the group. You can use the system variable [`group_replication_unreachable_majority_timeout`](group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout) to set a number of seconds for a member to wait after losing contact with the majority of group members, and then exit the group. Setting a timeout means you do not need to pro-actively monitor for servers that are in a minority group after a network partition, and you can avoid the possibility of creating a split-brain situation (with two versions of the group membership) due to inappropriate intervention.

 When the timeout specified by [`group_replication_unreachable_majority_timeout`](group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout) elapses, all pending transactions that have been processed by the member and the others in the minority group are rolled back, and the servers in that group move to the `ERROR` state. You can use the [`group_replication_autorejoin_tries`](group-replication-options.html#sysvar_group_replication_autorejoin_tries) system variable, which is available from MySQL 8.0.16, to make the member automatically try to rejoin the group at this point. From MySQL 8.0.21, this feature is activated by default and the member makes three auto-rejoin attempts. If the auto-rejoin procedure does not succeed or is not attempted, the minority member then follows the exit action specified by [`group_replication_exit_state_action`](group-replication-options.html#sysvar_group_replication_exit_state_action).

 Consider the following points when deciding whether or not to set an unreachable majority timeout:

* In a symmetric group, for example a group with two or four servers, if both partitions contain an equal number of servers, both groups consider themselves to be in a minority and enter the `ERROR` state. In this situation, the group has no functional partition.

* While a minority group exists, any transactions processed by the minority group are accepted, but blocked because the minority servers cannot reach quorum, until either [`STOP GROUP_REPLICATION`](stop-group-replication.html) is issued on those servers or the unreachable majority timeout is reached.

* If you do not set an unreachable majority timeout, the servers in the minority group never enter the `ERROR` state automatically, and you must stop them manually.

* Setting an unreachable majority timeout has no effect if it is set on the servers in the minority group after the loss of majority has been detected.

 If you do not use the [`group_replication_unreachable_majority_timeout`](group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout)system variable, the process for operator invention in the event of a network partition is described in [Section 18.7.8, “Handling a Network Partition and Loss of Quorum”](group-replication-network-partitioning.html). The process involves checking which servers are functioning and forcing a new group membership if necessary.