- 27 Jun, 2018 3 commits
-
-
Committed by Ashwin Agrawal
Upstream doesn't have it and it is no longer used in Greenplum, so remove it.
-
Committed by Ashwin Agrawal
Now that WAL replication is enabled for QD and QE, the code must be enabled.
-
Committed by Ashwin Agrawal
Upstream, and for the Greenplum master, if procdie is received while waiting for replication, only a WARNING is issued and the transaction moves forward without waiting for the mirror. On a QE, however, that would cause inconsistency if failover happens to a mirror that is missing the commit-prepared record.
If only prepare has been performed and the primary is yet to process the commit-prepared, gxact is present in memory. Once commit-prepared processing is complete on the primary, gxact is removed from memory. If gxact is found, we go through the regular commit-prepared flow, emit the xlog record and sync it to the mirror. But if gxact is not found on the primary, we used to blindly return success to the QD. Hence, modified the code to always call `SyncRepWaitForLSN()` before replying to the QD in case gxact is not found on the primary.
It calls `SyncRepWaitForLSN()` with the lsn value of `flush` from `xlogctl->LogwrtResult`, as there is no way to find out the actual lsn value of the commit-prepared record on the primary. Usage of that lsn is based on the following assumptions:
- WAL is always written serially forward
- If the synchronous mirror has xlog record xyz, it must have all xlog records before xyz
- Not finding a gxact entry in memory on the primary for a commit-prepared retry from the QD means it was for sure committed (completed) on the primary
Since a commit-prepared retry can be received when everything is done on this segment but failed on some other segment, under concurrency we may call `SyncRepWaitForLSN()` with the same lsn value multiple times, given we are using the latest flush point. Hence, in GPDB the check in `SyncRepQueueIsOrderedByLSN()` doesn't validate for unique entries but just validates that the queue is sorted, which is what correctness requires. Without this change, ICW tests can hit the assertion "!(SyncRepQueueIsOrderedByLSN(mode))".
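The relaxed queue check described in this commit message can be pictured with a minimal sketch. It is not the code from `syncrep.c`: the `lsn_t` type and the plain-array queue are hypothetical stand-ins (the real code walks a shared-memory queue of waiting backends), and it only shows the difference between "sorted" and "sorted and unique".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; the real code uses XLogRecPtr. */
typedef uint64_t lsn_t;

/*
 * Return true if the wait queue is sorted by LSN in non-decreasing order.
 * Equal neighbours are accepted, because several backends may wait on the
 * same flush point. A strict check would use "<=" in the comparison below
 * and reject such duplicates.
 */
static bool
queue_is_ordered_by_lsn(const lsn_t *wait_lsns, size_t n)
{
    assert(wait_lsns != NULL || n == 0);

    for (size_t i = 1; i < n; i++)
    {
        if (wait_lsns[i] < wait_lsns[i - 1])
            return false;       /* out of order */
    }
    return true;
}
```

With this form, a queue such as 10, 20, 20, 30 passes the check, while 10, 30, 20 still fails it.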
-
- 22 Jun, 2018 2 commits
-
-
Committed by Ashwin Agrawal
This reverts commit a7842ea9. The issue is yet to be fully investigated, but it sometimes hits the assertion ("!(SyncRepQueueIsOrderedByLSN(mode))", File: "syncrep.c", Line: 214).
-
Committed by Ashwin Agrawal
Upstream, and for the Greenplum master, if procdie is received while waiting for replication, only a WARNING is issued and the transaction moves forward without waiting for the mirror. On a QE, however, that would cause inconsistency if failover happens to a mirror that is missing the commit-prepared record.
If only prepare has been performed and the primary is yet to process the commit-prepared, gxact is present in memory. Once commit-prepared processing is complete on the primary, gxact is removed from memory. If gxact is found, we go through the regular commit-prepared flow, emit the xlog record and sync it to the mirror. But if gxact is not found on the primary, we used to blindly return success to the QD. Hence, modified the code to always call `SyncRepWaitForLSN()` before replying to the QD in case gxact is not found on the primary.
It calls `SyncRepWaitForLSN()` with the lsn value of `flush` from `xlogctl->LogwrtResult`, as there is no way to find out the actual lsn value of the commit-prepared record on the primary. Usage of that lsn is based on the following assumptions:
- WAL is always written serially forward
- If the synchronous mirror has xlog record xyz, it must have all xlog records before xyz
- Not finding a gxact entry in memory on the primary for a commit-prepared retry from the QD means it was for sure committed (completed) on the primary
-
- 12 May, 2018 1 commit
-
-
Committed by Ashwin Agrawal
-
- 02 May, 2018 1 commit
-
-
Committed by Ashwin Agrawal
Previously, when the primary was in crash recovery, the FTS probe failed and hence the primary was marked down. This change provides a recovery progress metric so that FTS can detect progress. We added the last replayed LSN inside the error message to determine recovery progress. This allows FTS to distinguish between recovery in progress and a recovery hang or rolling panics. FTS marks the primary down only when it detects that recovery is not making progress. For testing, a new fault injector is added to allow simulation of recovery hang and recovery in progress.
Just FYI: this reverts the reverted commit 7b7219a4.
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: David Kimura <dkimura@pivotal.io>
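As a rough illustration of the progress check described here, the sketch below parses a replayed LSN out of an error message and compares it with the value seen on the previous probe. The message wording (`last replayed LSN X/Y`), the `lsn_t` type and both function names are assumptions made for this example; the actual FTS code and message format may differ, and a real probe would likely also apply retries and timeouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an LSN; the real code uses XLogRecPtr. */
typedef uint64_t lsn_t;

typedef enum { RECOVERY_PROGRESSING, RECOVERY_STALLED } recovery_verdict;

/*
 * Extract an LSN written as two hex numbers "hi/lo" that follows the
 * (assumed) marker text in the probe's error message.
 */
static bool
parse_replayed_lsn(const char *msg, lsn_t *out)
{
    static const char marker[] = "last replayed LSN ";
    const char *p;
    unsigned int hi, lo;

    assert(msg != NULL && out != NULL);

    p = strstr(msg, marker);
    if (p == NULL)
        return false;
    if (sscanf(p + strlen(marker), "%X/%X", &hi, &lo) != 2)
        return false;

    *out = ((lsn_t) hi << 32) | lo;
    return true;
}

/* Recovery counts as progressing only if the replayed LSN moved forward. */
static recovery_verdict
judge_recovery(lsn_t previous, lsn_t current)
{
    return current > previous ? RECOVERY_PROGRESSING : RECOVERY_STALLED;
}
```

A prober built this way would keep the LSN from the last probe per segment and mark the primary down only after `judge_recovery()` reports a stall.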
-
- 01 May, 2018 2 commits
-
-
Committed by Ashwin Agrawal
This reverts commit 1b07e77a.
-
Committed by Ashwin Agrawal
Previously, when the primary was in crash recovery, the FTS probe failed and hence the primary was marked down. This change provides a recovery progress metric so that FTS can detect progress. We added the last replayed LSN inside the error message to determine recovery progress. This allows FTS to distinguish between recovery in progress and a recovery hang or rolling panics. FTS marks the primary down only when it detects that recovery is not making progress. For testing, a new fault injector is added to allow simulation of recovery hang and recovery in progress.
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: David Kimura <dkimura@pivotal.io>
-
- 29 Mar, 2018 1 commit
-
-
Committed by Ashwin Agrawal
Without this fix, FTS went into an infinite loop on probe if the primary failed to respond to the probe request. Encountered the issue using a suspend fault for the FTS message handler.
-
- 28 Mar, 2018 1 commit
-
-
Committed by Asim R P
Remove the log message that indicated whether a QE reader is writing an XLOG record. Back in GPDB 4.3, when the lazy XID feature didn't exist, a QE reader would be assigned a valid transaction ID. That could lead to extending CLOG and generating XLOG. This case no longer applies to GPDB.
-
- 16 Mar, 2018 1 commit
-
-
Committed by Taylor Vesely
This reverts commit a21da89e. These changes are obsolete now that persistent tables have been removed.
Co-authored-by: Asim R P <apraveen@pivotal.io>
-
- 10 Mar, 2018 2 commits
-
-
Committed by Asim R P
The flag pm_launch_walreceiver helps to determine if a mirror has made at least one attempt to connect with the primary. For a small window right after a mirror is promoted, the flag was still in effect. If an FTS probe request arrived in this window, the following assertion would trip:
FailedAssertion("!(am_mirror)", File: "postmaster.c", Line: 2201)
Reset the flag before signaling promotion so that it cannot interfere with new connections after promotion.
-
Committed by Heikki Linnakangas
Autovacuum has been completely disabled so far. In the upstream, even if you set autovacuum=off, it would still run, if necessary, to prevent XID wraparound, but in GPDB we would not launch it even for that.
That is problematic for template0, and any other databases with datallowconn=false. If you cannot connect to a database, you cannot manually VACUUM it. Therefore, its datfrozenxid is never advanced. We had hacked our way through that by letting XID wraparound happen for databases with datallowconn=false. The theory was that template0 - and hopefully any other such database! - was fully frozen, so there is no harm in letting the XID counter wrap around. However, you get trouble if you create a new database, using template0 as the template, around the time that XID wraparound for template0 is about to happen. The new database will inherit the datfrozenxid value, and because it will have datallowconn=true, the system will immediately shut down because now it looks like XID wraparound happened.
To fix, re-enable autovacuum, in a very limited fashion. The autovacuum launcher is now started, but it will only perform anti-wraparound vacuums, and only on databases with datallowconn=false.
This includes fixes for some garden-variety bugs that have been introduced to autovacuum, when merging with upstream, that have gone unnoticed because the code has been unused.
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/gqordopb6Gg/-GHXSE4qBwAJ
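The limited policy described in this commit message boils down to a two-part condition, sketched below. The `db_info` struct and its field names are hypothetical, chosen only to mirror the wording of the message; the real launcher works from `pg_database` entries and XID arithmetic rather than a precomputed flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one database, for illustration only. */
typedef struct
{
    bool datallowconn;              /* can users connect (and VACUUM manually)? */
    bool needs_wraparound_vacuum;   /* is datfrozenxid old enough to force a vacuum? */
} db_info;

/*
 * Limited autovacuum policy as described in the commit message: only
 * anti-wraparound vacuums, and only on databases nobody can connect to.
 */
static bool
should_autovacuum(const db_info *db)
{
    assert(db != NULL);
    return !db->datallowconn && db->needs_wraparound_vacuum;
}
```

Under this rule a connectable database is never touched by autovacuum, even when it is close to wraparound, so it still relies on manual VACUUM.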
-
- 01 Feb, 2018 2 commits
-
-
Committed by Xin Zhang
If the control file is not read, the data_checksums GUC is set to false even when checksums are actually enabled in the control file.
Author: Xin Zhang <xzhang@pivotal.io>
Author: Asim R P <apraveen@pivotal.io>
-
Committed by Heikki Linnakangas
Revert the state machine and other logic in postmaster.c to the way it is in upstream. Remove some GUCs related to mirrored and non-mirrored mode. Remove the -M, -x and -y postmaster options, and change management scripts to not pass those options.
-
- 22 Jan, 2018 1 commit
-
-
Committed by Gang Xiong
1. Move TMGXACT to PGPROC.
2. Creating a distributed snapshot and creating a checkpoint now traverse procArray and acquire ProcArrayLock. shmControlLock is only used to serialize recoverTM().
3. Get rid of shmGxactArray and maintain an array of TMGXACT_LOG for recovery.
Author: Gang Xiong <gxiong@pivotal.io>
Author: Asim R P <apraveen@pivotal.io>
Author: Ashwin Agrawal <aagrawal@pivotal.io>
-
- 18 Jan, 2018 4 commits
-
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
This was accidentally duplicated in some earlier merge or backport. It's in an "#ifdef WIN32" block, so it's dead for GPDB, but let's be tidy.
-
Committed by Heikki Linnakangas
-
Committed by Daniel Gustafsson
With persistent tables removed there are no more consumers of this API, so remove it.
-
- 17 Jan, 2018 2 commits
-
-
Committed by Heikki Linnakangas
The change in index.c is a behavioral change. The behavior on reindexing shared catalogs now matches upstream again. The rest is just removal of dead code.
-
Committed by Heikki Linnakangas
Pointed out by Coverity.
-
- 13 Jan, 2018 17 commits
-
-
Committed by Ashwin Agrawal
This is mostly a cherry-pick of upstream commit:
commit 970a1868
Author: Robert Haas <rhaas@postgresql.org>
Date: Fri Dec 3 08:44:15 2010 -0500
Use GUC lexer for recovery.conf parsing.
This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted.
Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.
-
Committed by Xin Zhang
- detect that the primary goes down
- flip the roles to m/p/d/n and p/m/u/n (role/prefer/status/mode) in gp_segment_configuration
- send a promotion message to the mirror to promote it
Author: Xin Zhang <xzhang@pivotal.io>
Author: Jacob Champion <pchampion@pivotal.io>
Author: Asim R P <apraveen@pivotal.io>
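The role flip in the second step can be sketched as follows. The `segment_config` struct is a hypothetical in-memory picture of one gp_segment_configuration row, not the catalog structure itself, and the reading of the single-letter codes in the comments is an assumption based on the role/prefer/status/mode order given in the message.

```c
#include <assert.h>

/* Hypothetical in-memory view of one gp_segment_configuration row. */
typedef struct
{
    char role;              /* current role: 'p' (primary) or 'm' (mirror) */
    char preferred_role;    /* role assigned at initialization; never flipped here */
    char status;            /* 'u' (up) or 'd' (down) */
    char mode;              /* 'n' is assumed to mean "not in sync" */
} segment_config;

/*
 * Flip the pair after the primary is detected down:
 * failed primary becomes m/p/d/n, promoted mirror becomes p/m/u/n.
 */
static void
fail_over(segment_config *primary, segment_config *mirror)
{
    assert(primary->role == 'p' && mirror->role == 'm');

    primary->role = 'm';
    primary->status = 'd';
    primary->mode = 'n';

    mirror->role = 'p';
    mirror->status = 'u';
    mirror->mode = 'n';
}
```

Leaving `preferred_role` untouched is what lets a later rebalance tell which segment was originally the primary.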
-
Committed by Heikki Linnakangas
It was now unused.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Jacob Champion
This reverts commit 0277b584 (which was already partially reverted in 0039844) and commit 4b4643f8. Commit 2b939748 already applied the logical intent of these commits, as well as some improvements which were accidentally reverted.
-
Committed by Max Yang
* Change test cases for the gpinitstandby -F parameter.
* Fix getting the standby data directory from gp_segment_configuration instead of pg_filespace_entry.
* Fix gpinitstandby to follow the old behavior if option -d is not given.
Author: Xiaoran Wang <xiwang@pivotal.io>
Author: Max Yang <myang@pivotal.io>
-
Committed by Max Yang
A timeline history file is generated to track the timeline change, and we copy the last WAL file as the WAL file of the current timeline. The split point in the history file takes care of correct recovery.
Author: Xiaoran Wang <xiwang@pivotal.io>
Author: Max Yang <myang@pivotal.io>
-
Committed by Jacob Champion
The previous patch failed, since we could not actually access the catalog without some more initialization. Pull as much initialization from the old Pass4 as possible -- some of it is likely not needed, but this seems to get things working for now.
-
Committed by Asim R P
As a followup to the removal of multipass logic in commit 813b817c:
* Make sure that the recovery.conf file does not exist after promotion. The file is renamed to recovery.done, like in upstream.
* Update gp_segment_configuration after promotion only if we are a master's standby.
* We can't actually access the catalog without some more initialization. Pull as much initialization from the old Pass4 as possible -- some of it is likely not needed, but this seems to get things working for now.
Author: Asim R P <apraveen@pivotal.io>
Author: Jacob Champion <pchampion@pivotal.io>
-
Committed by Heikki Linnakangas
Remove the concept of filespaces, revert tablespaces to work the same as in upstream.
There are some leftovers in management tools. I don't know how to test all that, and I was afraid of touching things I can't run. Also, we may need to create replacements for some of those things on top of tablespaces, to make the management of tablespaces easier, and it might be easier to modify the existing tools than write them from scratch. (Yeah, you could always look at the git history, but still.)
Per the discussion on gpdb-dev mailing list, the plan is to cherry-pick commit 16d8e594 from PostgreSQL 9.2, to make it possible to have a different path for a tablespace in the primary and its mirror. But that's not included in this commit yet.
TODO: Make temp_tablespaces work.
TODO: Make pg_dump do something sensible, when dumping from a GPDB 5 cluster that uses filespaces. Same with pg_upgrade.
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/sON4lraPEqg/v3lkM587BAAJ
-
Committed by Jacob Champion
The DB_IN_STANDBY_NEW_TLI_SET state doesn't really seem to do anything anymore, as of commit 813b817cc. Remove it entirely to get rid of an assertion during standby tests. Also remove multipass function declarations; they're gone.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Revert the code to open/read/write regular files, to the way it's in the upstream.
-
Committed by Heikki Linnakangas
WAL replication is the name of the game on this branch.
-
Committed by Heikki Linnakangas
And clean up some comments that talked about persistent tables.
-
Committed by Heikki Linnakangas
If there is some unused space at the end of a WAL page, because we never split a WAL record header, the WAL receiver's flush and apply positions were reported a bit funnily. The flush position would report the end of the page, including the unused padding, while the apply position would only go up to the end of the last WAL record on the page, excluding the padding. If you compare flush == apply positions, it would look as if not all of the WAL had been applied yet, even though the difference between the pointers was just the unused padding space.
This will get fixed in PostgreSQL 9.3, where the padding at the end of a WAL page is eliminated, but until then, tweak the reporting of the apply position to also include any end-of-page padding. That makes the flush == apply comparison a valid way to check if all the flushed WAL has been applied, even at page boundaries.
I believe this explains the "unable to obtain start synced LSN values between primary and mirror" failures we've been seeing from the gp_replica_check test. gp_replica_check waits for apply == flush, and if the last WAL record lands at a page boundary, that condition never became true because of the padding. (Although I'm not sure why it used to work earlier, or did it?)
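The adjustment to the reported apply position can be pictured with the sketch below. The page size and header size are illustrative constants, not the values GPDB is built with, and `apply_pos_with_padding()` is a hypothetical name; the point is only that when the space left on a page is too small to hold a record header, the reported position is bumped to the page boundary so that it can equal the flush position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; the real code uses XLogRecPtr. */
typedef uint64_t lsn_t;

/* Illustrative sizes only; the real build-time values may differ. */
#define WAL_PAGE_SIZE           8192u
#define WAL_RECORD_HEADER_SIZE  32u

/*
 * Return the apply position to report. If the space left on the current
 * page cannot hold a record header, no record can start there, so the
 * remainder is padding and is counted as already applied.
 */
static lsn_t
apply_pos_with_padding(lsn_t apply)
{
    lsn_t used = apply % WAL_PAGE_SIZE;
    lsn_t free_space;

    assert(WAL_RECORD_HEADER_SIZE < WAL_PAGE_SIZE);

    if (used == 0)
        return apply;           /* already at a page boundary */

    free_space = WAL_PAGE_SIZE - used;
    if (free_space < WAL_RECORD_HEADER_SIZE)
        return apply + free_space;

    return apply;
}
```

With these sizes, a position 10 bytes short of a page boundary is reported as the boundary itself, while a position with room for another header is reported unchanged.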
-