FTS detects when primary is in recovery, avoiding config change
Previously, when the primary was in crash recovery, the FTS probe failed and hence the primary was marked down. This change provides a recovery progress metric so that FTS can detect progress: the last replayed LSN is added to the error message. This allows FTS to distinguish between recovery in progress and a recovery hang or rolling panics. FTS marks the primary down only when it detects that recovery is not making progress.

For testing, a new fault injector is added to allow simulation of a recovery hang and of recovery in progress.

FYI: this reverts the reverted commit 7b7219a4.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: David Kimura <dkimura@pivotal.io>
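
The progress check described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this commit: the function names, the error message wording, and the choice to treat the first observed LSN as progress are all assumptions. The only convention relied on is PostgreSQL's textual LSN format of two hexadecimal numbers separated by a slash.

```python
import re

# PostgreSQL prints an LSN as two hex numbers separated by a slash, e.g. 0/1A2B3C.
LSN_PATTERN = re.compile(r"([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")


def parse_lsn(message):
    """Return the first LSN found in message as a 64-bit integer, or None."""
    match = LSN_PATTERN.search(message)
    if match is None:
        return None
    high, low = match.groups()
    return (int(high, 16) << 32) | int(low, 16)


def should_mark_down(previous_lsn, probe_error):
    """Decide whether a failed probe should mark the primary down.

    Hypothetical helper: returns (mark_down, lsn_to_remember).
    """
    current_lsn = parse_lsn(probe_error)
    if current_lsn is None:
        # Assumption: no LSN in the error means this is not a recovery
        # failure, so the ordinary "mark down" path applies.
        return True, previous_lsn
    if previous_lsn is None or current_lsn > previous_lsn:
        # Recovery is replaying WAL, so leave the configuration alone.
        # Assumption: the first LSN seen counts as progress.
        return False, current_lsn
    # Same (or older) LSN as the last probe: recovery hang or rolling panics.
    return True, current_lsn
```

A caller would keep the returned LSN between probes and pass it back in on the next failed probe, so two consecutive probes reporting the same LSN result in the primary being marked down.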