    ProcDie: Reply only after syncing to mirror for commit-prepared. · 29b78ef4
    Ashwin Agrawal committed
    In upstream and on Greenplum master, if ProcDie is received while waiting for
    replication, only a WARNING is issued and the transaction moves forward without
    waiting for the mirror. That causes inconsistency on the QE if failover happens
    to a mirror that is missing the commit-prepared record.
    
    If only prepare has been performed and the primary has yet to process the
    commit-prepared, the gxact is present in memory. Once commit-prepared
    processing completes on the primary, the gxact is removed from memory. If the
    gxact is found, we go through the regular commit-prepared flow: emit the xlog
    record and sync it to the mirror. But if the gxact was not found on the
    primary, we used to blindly return success to the QD. Hence, modify the code to
    always call `SyncRepWaitForLSN()` before replying to the QD in case the gxact
    is not found on the primary.
    
    It calls `SyncRepWaitForLSN()` with the `flush` LSN from
    `xlogctl->LogwrtResult`, as there is no way to find out the actual LSN of the
    commit-prepared record on the primary. Using that LSN relies on the following
    assumptions:
    	- WAL is always written serially forward
    	- A synchronous mirror that has xlog record xyz must also have all xlog
    	  records before xyz
    	- Not finding the gxact entry in memory on the primary for a commit-prepared
    	  retry from the QD means it was definitely committed (completed) on the
    	  primary
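
The retry handling and the choice of LSN described above can be sketched as a toy model. Everything here (`ToyPrimary`, `toy_sync_rep_wait`, `toy_commit_prepared`) is hypothetical: it stands in for the real shared-memory state and for `SyncRepWaitForLSN()`, which blocks until the mirror confirms rather than just recording a value. It illustrates the control flow under the stated assumptions and is not the code in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t ToyLSN;

#define TOY_MAX_GXACTS 8

/* Hypothetical, simplified model of a primary segment. */
typedef struct ToyPrimary
{
    const char *gxacts[TOY_MAX_GXACTS]; /* prepared, not yet committed */
    int         ngxacts;
    ToyLSN      flush_lsn;              /* WAL flushed locally so far */
    ToyLSN      mirror_lsn;             /* WAL the mirror has confirmed */
} ToyPrimary;

/* Stand-in for SyncRepWaitForLSN(): the real call blocks until the mirror
 * confirms; this one only records that the mirror caught up to lsn. */
static void
toy_sync_rep_wait(ToyPrimary *p, ToyLSN lsn)
{
    assert(lsn <= p->flush_lsn);        /* cannot wait for unflushed WAL */
    if (p->mirror_lsn < lsn)
        p->mirror_lsn = lsn;
}

/* Remove gid from the in-memory list; returns false if it is not there. */
static bool
toy_remove_gxact(ToyPrimary *p, const char *gid)
{
    for (int i = 0; i < p->ngxacts; i++)
    {
        if (strcmp(p->gxacts[i], gid) == 0)
        {
            p->gxacts[i] = p->gxacts[--p->ngxacts];
            return true;
        }
    }
    return false;
}

/* Handle a commit-prepared request (first attempt or retry) from the QD.
 * Returns the LSN that was synced to the mirror before replying. */
static ToyLSN
toy_commit_prepared(ToyPrimary *p, const char *gid)
{
    if (toy_remove_gxact(p, gid))
    {
        /* Regular path: emit the commit-prepared record. */
        p->flush_lsn += 1;
    }

    /* In both paths, wait for the mirror before replying. When the gxact
     * was not found, the record's own LSN is unknown, so the current flush
     * point serves as an upper bound (WAL is written serially forward). */
    toy_sync_rep_wait(p, p->flush_lsn);
    return p->flush_lsn;
}
```

In this model a retry for an already-committed gid never replies before `mirror_lsn` has reached the flush point, which is the property the commit is after.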
    
    Since a commit-prepared retry can be received when everything is done on this
    segment but the commit failed on some other segment, under concurrency we may
    call `SyncRepWaitForLSN()` with the same LSN multiple times, given that we use
    the latest flush point. Hence, in GPDB the check in
    `SyncRepQueueIsOrderedByLSN()` does not require unique entries; it only
    validates that the queue is sorted, which is what correctness requires.
    Without this change, ICW tests can hit the assertion
    "!(SyncRepQueueIsOrderedByLSN(mode))".
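
A minimal sketch of the relaxed ordering check, assuming the wait queue is flattened into an array of LSNs. The name `toy_queue_is_ordered` and the `allow_duplicates` flag are hypothetical and exist only to contrast the two behaviours; the real `SyncRepQueueIsOrderedByLSN()` walks the shared-memory queue for a given mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when lsns[0..n-1] is sorted in ascending order.
 * With allow_duplicates == false, equal neighbours are also rejected
 * (the stricter behaviour); with true, only the sort order is checked,
 * which is the relaxation described in the commit message. */
static bool
toy_queue_is_ordered(const uint64_t *lsns, size_t n, bool allow_duplicates)
{
    assert(lsns != NULL || n == 0);

    for (size_t i = 1; i < n; i++)
    {
        if (lsns[i] < lsns[i - 1])
            return false;   /* out of order: always invalid */
        if (!allow_duplicates && lsns[i] == lsns[i - 1])
            return false;   /* strict variant rejects equal LSNs */
    }
    return true;
}
```

A queue such as `{10, 20, 20, 30}`, where two backends wait on the same flush point, passes the relaxed check but fails the strict one.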
xlog.c 371.3 KB