    FTS detects when primary is in recovery avoiding config change · d453a4aa
    Ashwin Agrawal committed
    Previously, when the primary was in crash recovery, the FTS probe failed and
    the primary was marked down. This change provides a recovery progress metric
    so that FTS can detect progress: the last replayed LSN is now included in the
    error message. This allows FTS to distinguish between recovery that is in
    progress and recovery that is hung or stuck in rolling panics. FTS marks the
    primary down only when it detects that recovery is not making progress.
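The decision described above can be sketched as follows. This is a minimal illustration, not the code in ftsprobe.h: it assumes the error message carries a fragment of the form "last replayed record at %X/%X" (PostgreSQL's usual text form of an LSN, high and low 32 bits in hex), and the names parse_replayed_lsn, classify_probe and should_mark_down are hypothetical. The idea is to compare the LSN reported by each probe with the one seen on the previous probe. A strictly larger LSN means replay is advancing, while an equal or smaller one (a hang, or a rolling panic that restarts replay from the same checkpoint) means no progress.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t lsn_t;   /* 64-bit WAL position; 0 is treated as "none seen yet" */

typedef enum {
    PROBE_FAILED,          /* error without an LSN: an ordinary probe failure */
    RECOVERY_IN_PROGRESS,  /* replayed LSN advanced since the last probe */
    RECOVERY_STALLED       /* replayed LSN did not advance: hang or rolling panic */
} probe_verdict;

/* ASSUMED message fragment; the real wording may differ. */
static const char *const LSN_TAG = "last replayed record at ";

/* Extract "HI/LO" (hex) after LSN_TAG into a 64-bit LSN. */
static bool parse_replayed_lsn(const char *msg, lsn_t *out)
{
    const char *p = strstr(msg, LSN_TAG);
    unsigned int hi, lo;

    if (p == NULL)
        return false;
    if (sscanf(p + strlen(LSN_TAG), "%X/%X", &hi, &lo) != 2)
        return false;
    *out = ((lsn_t) hi << 32) | (lsn_t) lo;
    return true;
}

/* Classify one probe result; *prev_lsn remembers the last LSN seen. */
static probe_verdict classify_probe(const char *errmsg, lsn_t *prev_lsn)
{
    lsn_t cur;

    if (!parse_replayed_lsn(errmsg, &cur))
        return PROBE_FAILED;
    if (cur > *prev_lsn) {
        *prev_lsn = cur;   /* progress: remember the new high-water mark */
        return RECOVERY_IN_PROGRESS;
    }
    return RECOVERY_STALLED;
}

/* Only a primary whose recovery is advancing is spared from being marked down. */
static bool should_mark_down(probe_verdict v)
{
    return v != RECOVERY_IN_PROGRESS;
}
```

Tracking only a high-water mark keeps the check stateless apart from a single LSN per segment. A real implementation would likely also bound how long it tolerates stalls, which this sketch does not attempt.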
    
    For testing, a new fault injector is added to simulate both a recovery hang
    and recovery in progress.
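To show why a fault that freezes replay is enough to exercise both outcomes, here is a small self-contained simulation. It does not use Greenplum's actual fault injector: sim_primary, its hang_fault flag, sim_fts_probe and sim_run are hypothetical stand-ins, and each replay step simply advances the LSN by an arbitrary 0x100. With the fault off, every probe sees a larger LSN and FTS never votes to mark the primary down. With the fault on, only the first probe sees progress.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Hypothetical stand-in for a primary in crash recovery. */
typedef struct {
    lsn_t replayed_lsn;
    bool  hang_fault;   /* simulated fault: when set, replay stops advancing */
} sim_primary;

static void sim_replay_step(sim_primary *p)
{
    if (!p->hang_fault)
        p->replayed_lsn += 0x100;   /* pretend one batch of WAL was replayed */
}

/* Returns true if FTS would mark the primary down on this probe. */
static bool sim_fts_probe(const sim_primary *p, lsn_t *last_seen)
{
    bool progressed = p->replayed_lsn > *last_seen;

    *last_seen = p->replayed_lsn;
    return !progressed;
}

/* Run `rounds` cycles of replay + probe; count the "mark down" verdicts. */
static int sim_run(bool hang, int rounds)
{
    sim_primary p = { .replayed_lsn = 0x1000, .hang_fault = hang };
    lsn_t last_seen = 0;
    int down_votes = 0;

    for (int i = 0; i < rounds; i++) {
        sim_replay_step(&p);
        if (sim_fts_probe(&p, &last_seen))
            down_votes++;
    }
    return down_votes;
}
```

For example, sim_run(false, 5) yields no down votes, while sim_run(true, 5) yields four: the first probe records the initial LSN and every later probe sees it unchanged.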
    
    Just FYI: this reverts the reverted commit 7b7219a4.
    Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
    Co-authored-by: David Kimura <dkimura@pivotal.io>
ftsprobe.h