提交 aca57c40 编写于 作者: P Philipp Heckel

Documentation

上级 b61b174e
......@@ -10,6 +10,7 @@
"DataCenterPattern": "",
"DetectDataCenterQuery": "select substring_index(substring_index(@@hostname, '-',3), '-', -1) as dc",
"PhysicalEnvironmentPattern": "",
"DetectSemiSyncEnforcedQuery": ""
}
```
......@@ -17,7 +18,7 @@
By default, `orchestrator` uses `SHOW SLAVE STATUS` and takes a 1-second granularity value for lag. However this lag does not take into account cascading lags in the event of chained replication. Many use custom heartbeat mechanisms such as `pt-heartbeat`. This provides with "absolute" lag from master, as well as sub-second resolution.
`ReplicationLagQuery` allows you to setup your own query.
`ReplicationLagQuery` allows you to setup your own query.``~~~~``
### Cluster alias
......@@ -60,3 +61,75 @@ You will configure data center awareness in one of two methods:
### Cluster domain
To a lesser importance, and mostly for visibility, `DetectClusterDomainQuery` should return the VIP or CNAME or otherwise the address of the cluster's master
### Semi-sync topology
In some environments, it is important to control the not only the number of semi-sync replicas, but also if a replica is a semi-sync or an async replica.
`orchestrator` can detect an undesired semi-sync configuration and toggle the semi-sync flags
`rpl_semi_sync_slave_enabled` and `rpl_semi_sync_master_enabled` to correct the situation.
#### Semi-sync master (`rpl_semi_sync_master_enabled`)
`orchestrator` enables the semi-sync master flag during a master failover (e.g. `DeadMaster`) if `DetectSemiSyncEnforcedQuery` returns a value > 0
for the new master. `orchestrator` does not trigger any recoveries if the master flag is otherwise changed or incorrectly set.
A semi-sync master can enter two failure scenarios: [`LockedSemiSyncMaster`](failure-detection.md#lockedsemisyncmaster) and
[`MasterWithTooManySemiSyncReplicas`](failure-detection.md#masterwithtoomanysemisyncreplicas). `orchestrator` disables the
semi-sync master flag on semi-sync replicas during a recovery of either of these two conditions.
#### Semi-sync replicas (`rpl_semi_sync_slave_enabled`)
`orchestrator` can detect if there is an incorrect number of semi-sync replicas in the topology ([`LockedSemiSyncMaster`](failure-detection.md#lockedsemisyncmaster) and
[`MasterWithTooManySemiSyncReplicas`](failure-detection.md#masterwithtoomanysemisyncreplicas)), and can then correct the situation by enabling/disabling
the semi-sync replica flags accordingly.
This behavior can be controlled by the following options:
- `DetectSemiSyncEnforcedQuery`: query that returns the semi-sync priority (zero means async replica; higher number means higher priority)
- `EnforceExactSemiSyncReplicas`: flag that decides whether to enforce a _strict_ semi-sync replica topology. If enabled, the recovery of `LockedSemiSyncMaster`
and `MasterWithTooManyReplicas` will enable _and disable_ semi-sync on the replicas to match the desired topology exactly based on the priority order.
- `RecoverLockedSemiSyncMaster`: flag that decides whether to recover from a `LockedSemiSyncMaster` scenario. If enabled, the recovery of `LockedSemiSyncMaster`
will enable _(but never disable)_ semi-sync on the replicas in the priority order to match the master wait count. This option has no effect if
`EnforceExactSemiSyncReplicas` is set. It is only useful if you'd like to only handle a situation which which there are too few semi-sync replicas,
but not if there are too many.
- `ReasonableLockedSemiSyncMasterSeconds`: number of seconds after which the `LockedSemiSyncMaster` condition is triggered; if not set, falls back to `ReasonableReplicationLagSeconds`
The priority order is defined by `DetectSemiSyncEnforcedQuery` (zero means async replica; higher number is higher priority), the promotion rule (`DetectPromotionRuleQuery`)
and the hostname (fallback).
**Example 1**: Enforcing a strict semi-sync replica topology with two replicas and `rpl_semi_sync_master_wait_for_slave_count=1`:
```
"DetectSemiSyncEnforcedQuery": "select priority from meta.semi_sync where cluster_member = @@hostname",
"EnforceExactSemiSyncReplicas": true
```
Assuming this topology:
```
,- replica1 (priority = 10, rpl_semi_sync_slave_enabled = 1)
master
`- replica2 (priority = 20, rpl_semi_sync_slave_enabled = 1)
```
`orchestrator` would detect a [`MasterWithTooManySemiSyncReplicas`](failure-detection.md#masterwithtoomanysemisyncreplicas) scenario
and disable semi-sync on replica2.
**Example 2**: Enforcing a weak semi-sync replica toplogy with two replicas and `rpl_semi_sync_master_wait_for_slave_count=1`:
```
"DetectSemiSyncEnforcedQuery": "select 2586",
"DetectPromotionRuleQuery": "select promotion_rule from meta.promotion_rules where cluster_member = @@hostname",
"RecoverLockedSemiSyncMaster": true
```
Assuming this topology:
```
,- replica1 (priority = 2586, promotion rule = prefer, rpl_semi_sync_slave_enabled = 0)
master
`- replica2 (priority = 2586, promotion rule = neutral, rpl_semi_sync_slave_enabled = 0)
```
`orchestrator` would detect a [`LockedSemiSyncMaster`](failure-detection.md#lockedsemisyncmaster) scenario
and enable semi-sync on replica1.
......@@ -38,6 +38,7 @@ Observe the following list of potential failures:
* UnreachableMasterWithLaggingReplicas
* UnreachableMaster
* LockedSemiSyncMaster
* MasterWithTooManySemiSyncReplicas
* AllMasterReplicasNotReplicating
* AllMasterReplicasNotReplicatingOrDead
* DeadCoMaster
......@@ -96,15 +97,43 @@ This scenario can happen when the master is overloaded. Clients would see a "Too
`orchestrator` responds to this scenario by restarting replication on all of master's immediate replicas. This will close the old client connections on those replicas and attempt to initiate new ones. These may now fail to connect, leading to a complete replication failure on all replicas. This will next lead `orchestrator` to analyze a `DeadMaster`.
### LockedSemiSyncMaster
#### `LockedSemiSyncMaster`
1. Master is running with semi-sync enabled
1. Master is running with semi-sync enabled (`rpl_semi_sync_master_enabled=1`)
2. Number of connected semi-sync replicas falls short of expected `rpl_semi_sync_master_wait_for_slave_count`
3. `rpl_semi_sync_master_timeout` is high enough such that master locks writes and does not fall back to asynchronous replication
Remediation can be to disable semi-sync on the master, or to bring up (or enable) sufficient semi-sync replicas.
This condition only triggers after `ReasonableLockedSemiSyncMasterSeconds` has passed. If `ReasonableLockedSemiSyncMasterSeconds` is not set,
it trigger after `ReasonableReplicationLagSeconds`.
At this time `orchestrator` does not invoke processes for this type of analysis.
Remediation of this condition can be to disable semi-sync on the master, or to bring up (or enable) sufficient semi-sync replicas.
If `EnforceExactSemiSyncReplicas` is enabled, `orchestrator` will determine the desired semi-sync topology and enable/disable semi-sync on the replicas to match it.
The desired topology is defined by the priority order (see below) and the master wait count.
If `RecoverLockedSemiSyncMaster` is enabled, `orchestrator` will enable (but never disable) semi-sync on the replicas in priority order until
the number of semi-sync replicas matches the master wait count. Please note that `RecoverLockedSemiSyncMaster` has no effect if `EnforceExactSemiSyncReplicas` is set.
The priority order is defined by `DetectSemiSyncEnforcedQuery` (higher number is higher priority), the promotion rule (`DetectPromotionRuleQuery`) and the hostname (fallback).
If `EnforceExactSemiSyncReplicas` and `RecoverLockedSemiSyncMaster` are both disabled (default), `orchestrator` does not invoke any recovery processes for this type of analysis.
Please also consult the [semi-sync topology](configuration-discovery-classifying.md#semi-sync-topology) documentation for more details.
#### `MasterWithTooManySemiSyncReplicas`
1. Master is running with semi-sync enabled (`rpl_semi_sync_master_enabled=1`)
2. Number of connected semi-sync replicas is higher than the expected `rpl_semi_sync_master_wait_for_slave_count`
3. `EnforceExactSemiSyncReplicas` is enabled (this analysis is not triggered if this flag is not enabled)
If `EnforceExactSemiSyncReplicas` is enabled, `orchestrator` will determine the desired semi-sync topology and enable/disable semi-sync on the replicas to match it.
The desired topology is defined by the priority order and the master wait count.
The priority order is defined by `DetectSemiSyncEnforcedQuery` (higher number is higher priority), the promotion rule (`DetectPromotionRuleQuery`) and the hostname (fallback).
If `EnforceExactSemiSyncReplicas` is disabled (default), `orchestrator` does not invoke any recovery processes for this type of analysis.
Please also consult the [semi-sync topology](configuration-discovery-classifying.md#semi-sync-topology) documentation for more details.
### Failures of no interest
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册