- 13 Apr, 2018 (1 commit)

Committed by Asim R P
Commit b3f300b9 introduced the novel idea of tracking the oldest xmin among all distributed snapshots on QEs. However, the idea is not applicable to the QD, because all distributed transactions can be found in the ProcArray on the QD; the local oldest xmin is therefore already the oldest xmin among all distributed snapshots on the QD. This patch fixes the maintenance of the oldest xmin on the QD by avoiding DistributedLog_AdvanceOldestXmin() and all the heavy lifting it performs. Calling it on the QD was also occasionally hitting the "local snapshot's xmin is older than recorded distributed oldestxmin" error in CI.
-
- 10 Mar, 2018 (1 commit)

Committed by Heikki Linnakangas
Before this, in order to safely determine if a tuple can be vacuumed away, you would need an active distributed snapshot. Even if an XID was older than the locally-computed OldestXMin value, the XID might still be visible to some distributed snapshot that's active in the QD.

This commit introduces a mechanism to track the "oldest xmin" across any distributed snapshots. That makes it possible to calculate an "oldest xmin" value in a QE that covers any such distributed snapshots, even if the distributed transaction doesn't currently have an active connection to this QE. Every distributed snapshot contains such an "oldest xmin" value, but now we track the latest such value that we've seen in this QE, in shared memory. Therefore, it's not always 100% up-to-date, but it will reflect the situation as of the latest query that was dispatched from the QD to this QE.

The value returned by GetOldestXmin(), as well as RecentGlobalXmin, now includes any distributed transactions. So the value can now be used to determine which tuples are dead, like in upstream, without doing the extra check with the localXidSatisfiesAnyDistributedSnapshot() function. This allows reverting some changes in heap_tuple_freeze.

This allows utility-mode VACUUMs, launched independently in QE nodes, to reclaim space. Previously, they could not remove any dead tuples that were ever visible to anyone, because they could not determine whether the tuples might still be needed by some distributed transaction.
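The tracking described above can be sketched as follows. This is a minimal illustration, not GPDB's actual API: the slot variable, function names, and the clamping helper are all hypothetical, and real code would guard the shared value with a lock or atomic operation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical shared-memory slot holding the latest "oldest xmin"
 * seen in any distributed snapshot dispatched to this QE.
 * 0 stands for "no distributed snapshot seen yet". */
static TransactionId latestDistributedOldestXmin = 0;

/* Advance the tracked value monotonically; never move it backwards,
 * since an older snapshot arriving late must not widen the horizon. */
TransactionId AdvanceDistributedOldestXmin(TransactionId snapshotOldestXmin)
{
    if (snapshotOldestXmin > latestDistributedOldestXmin)
        latestDistributedOldestXmin = snapshotOldestXmin;
    return latestDistributedOldestXmin;
}

/* The QE-local OldestXmin must not exceed the distributed bound:
 * tuples newer than the bound may still be visible to a distributed
 * snapshot that has no active connection to this QE. */
TransactionId ClampOldestXmin(TransactionId localOldestXmin)
{
    if (latestDistributedOldestXmin != 0 &&
        latestDistributedOldestXmin < localOldestXmin)
        return latestDistributedOldestXmin;
    return localOldestXmin;
}
```

The monotonic update is why the tracked value may lag reality: it only moves forward when a new query is dispatched from the QD.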
-
- 09 Mar, 2018 (1 commit)

Committed by Heikki Linnakangas
All in-progress transactions, even those in DTX_STATE_ACTIVE_NOT_DISTRIBUTED state, must be included in a distributed snapshot. All transactions begin in DTX_STATE_ACTIVE_NOT_DISTRIBUTED state and can become distributed later on. This bug was introduced in commit ff97c70b, which added the ill-advised optimization of skipping DTX_STATE_ACTIVE_NOT_DISTRIBUTED transactions.

This showed up occasionally in the regression tests as a failure in the 'oidjoins' test, like this:

@@ -230,9 +230,17 @@
 SELECT ctid, attrelid
 FROM pg_catalog.pg_attribute fk
 WHERE attrelid != 0 AND
     NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.attrelid);
- ctid | attrelid
-------+----------
-(0 rows)
+  ctid   | attrelid
+---------+----------
+ (20,10) |    17107
+ (20,11) |    17107
+ (20,12) |    17107
+ (20,13) |    17107
+ (20,14) |    17107
+ (20,15) |    17107
+ (20,8)  |    17107
+ (20,9)  |    17107
+(8 rows)

The plan for that query is a hash anti-join, with 'pg_class' on the inner side and 'pg_attribute' on the outer side. If a table was created concurrently with that query (with oid 17107 in the above case), you could get this failure. What happens is that the concurrent CREATE TABLE transaction was assigned a distributed XID that was incorrectly not included in the distributed snapshot that the query took. Hence, the transaction became visible to the query as soon as it committed. If the CREATE TABLE transaction committed between the full scan of pg_class and the scan of pg_attribute, the query would not see the just-inserted pg_class row, but would see the pg_attribute rows.
-
- 23 Jan, 2018 (1 commit)

Committed by xiong-gang
The entry DB process shares its snapshot with the QD process, but it didn't update TransactionXmin. The assertion in SubTransGetData() would fail in some cases: Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

1. QD takes a snapshot which contains an in-progress transaction A.
2. Transaction A commits.
3. QD creates a gang of entry DBs. The entry DB process takes a snapshot in InitPostgres and updates TransactionXmin.
4. The entry DB process scans a tuple inserted by transaction A, and finds it's in the snapshot but its xid is larger than TransactionXmin.
-
- 22 Jan, 2018 (1 commit)

Committed by Gang Xiong
1. Move TMGXACT to PGPROC.
2. Creating a distributed snapshot and creating a checkpoint now traverse the procArray and acquire ProcArrayLock; shmControlLock is only used to serialize recoverTM().
3. Get rid of shmGxactArray and maintain an array of TMGXACT_LOG for recovery.

Author: Gang Xiong <gxiong@pivotal.io>
Author: Asim R P <apraveen@pivotal.io>
Author: Ashwin Agrawal <aagrawal@pivotal.io>
-
- 28 Oct, 2017 (1 commit)

Committed by Heikki Linnakangas
If the caller specifies DF_WITH_SNAPSHOT, so that the command is dispatched to the segments with a snapshot, but there is no active snapshot in the QD itself, that seems like a mistake.

In qdSerializeDtxContextInfo(), the comment talked about which snapshot to use when the transaction has already been aborted. I didn't quite understand that. I don't think the function is used to dispatch the "ABORT" statement itself, and we shouldn't be dispatching anything else in an already-aborted transaction.

This makes it clearer which snapshot is dispatched along with the command. In theory, the latest or serializable snapshot can be different from the one being used when the command is dispatched, although I'm not sure if there are any such cases in practice. In the upcoming 8.4 merge, there are more changes coming up to snapshot management, which make it more difficult to get hold of the latest acquired snapshot in the transaction, so changing this now will ease the pain of merging that.

I don't know why, but after making the change in qdSerializeDtxContextInfo, I started to get a lot of "Too many distributed transactions for snapshot (maxCount %d, count %d)" errors. Looking at the code, I don't understand how it ever worked. I don't see any guarantee that the array in TempQDDtxContextInfo or TempDtxContextInfo was pre-allocated correctly. Or maybe it got allocated big enough to hold max_prepared_xacts, which was always large enough, but it seemed rather haphazard to me. So in the spirit of "if you don't understand it, rewrite it until you do", I changed the way the allocation of the inProgressXidArray array works. In statically allocated snapshots, i.e. SerializableSnapshot and LatestSnapshot, the array is malloc'd. In a snapshot copied with CopySnapshot(), it points to part of the palloc'd space for the snapshot. Nothing new so far, but I changed CopySnapshot() to set "maxCount" to -1 to indicate that it's not malloc'd. I then modified DistributedSnapshot_Copy and DistributedSnapshot_Deserialize to not give up if the target array is not large enough, but to enlarge it as needed.

Finally, I made a little optimization in GetSnapshotData() when running in a QE: the copying of the distributed snapshot data is moved outside the section guarded by ProcArrayLock. ProcArrayLock can be heavily contended, so that's a nice little optimization anyway, but especially now that DistributedSnapshot_Copy() might need to realloc the array.
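The enlarge-as-needed copy can be sketched like this. The struct and function names are hypothetical stand-ins, not the real DistributedSnapshot_Copy; the point is the shape of the fix: a maxCount sentinel distinguishing malloc'd from borrowed storage, and growth on demand instead of a hard "Too many distributed transactions" error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int  maxCount;   /* allocated slots; -1 would mean "points into a
                      * palloc'd snapshot copy, not separately malloc'd" */
    int  count;      /* slots in use */
    int *xids;       /* stands in for inProgressXidArray */
} XidArray;

/* Copy n xids into dst, growing the malloc'd array when it is too
 * small rather than failing. A real implementation would error out
 * on the maxCount == -1 (non-malloc'd) case instead of growing. */
void xidarray_copy(XidArray *dst, const int *src, int n)
{
    if (dst->maxCount >= 0 && dst->maxCount < n)
    {
        dst->xids = realloc(dst->xids, n * sizeof(int));
        dst->maxCount = n;
    }
    memcpy(dst->xids, src, n * sizeof(int));
    dst->count = n;
}
```

Starting from an empty array, a copy of three xids allocates exactly three slots; a later, smaller copy reuses the existing allocation.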
-
- 25 Aug, 2017 (1 commit)

Committed by Heikki Linnakangas
ereport() has one subtle but important difference from elog(): it doesn't evaluate its arguments if the log level says that the message doesn't need to be printed. This makes a small but measurable difference in performance if the arguments contain more complicated expressions, like function calls.

While performance testing a workload with very short queries, I saw some CPU time being used in DtxContextToString. Those calls were coming from the arguments of elog() statements, and the result was always thrown away, because the log level was not high enough to actually log anything. Turn those elog()s into ereport()s, for speed. The problematic cases were a few elogs containing DtxContextToString calls in hot codepaths, but I changed a few surrounding ones too, for consistency.

Also simplify the mock test to not bother mocking elog(), while we're at it. The real elog/ereport work just fine in the mock environment.
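The evaluation difference can be demonstrated with two toy macros; these are simplified stand-ins, not PostgreSQL's real elog/ereport definitions. The expensive_to_string() counter plays the role of DtxContextToString(): with the elog-like macro it runs even when the message is suppressed, with the ereport-like macro it does not.

```c
#include <assert.h>

static int call_count = 0;
static int min_level = 20;   /* messages below this level are discarded */

/* Stand-in for DtxContextToString(): counts how often it is called. */
const char *expensive_to_string(void)
{
    call_count++;
    return "context";
}

/* elog-style: the argument expression is evaluated unconditionally,
 * before the level check. */
#define ELOG_LIKE(level, str)  do { const char *s_ = (str); (void) s_; \
        if ((level) >= min_level) { /* would emit s_ here */ } } while (0)

/* ereport-style: the level is checked first, so the argument
 * expression is never evaluated for suppressed messages. */
#define EREPORT_LIKE(level, str)  do { if ((level) >= min_level) { \
        const char *s_ = (str); (void) s_; /* would emit s_ here */ } } while (0)
```

With min_level at 20, a suppressed ELOG_LIKE(10, ...) still pays for the call, while a suppressed EREPORT_LIKE(10, ...) is free.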
-
- 07 Jun, 2017 (1 commit)

Committed by Pengzhou Tang
This commit restores the TCP interconnect and fixes some hang issues:

* Restore the TCP interconnect code.
* Add a GUC called gp_interconnect_tcp_listener_backlog to control the backlog parameter of the listen() call.
* Use memmove instead of memcpy because the memory areas do overlap.
* Call checkForCancelFromQD() for the TCP interconnect if there is no data for a while; this avoids the QD getting stuck.
* Revert the cancelUnfinished-related modification in 8d251945, otherwise some queries would get stuck.
* Move and rename the fault injector "cursor_qe_reader_after_snapshot" to make test cases pass under the TCP interconnect.
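The memmove-vs-memcpy point is a general C rule worth a tiny illustration (the helper below is hypothetical, not interconnect code): when shifting queued bytes toward the front of a buffer, source and destination overlap, which makes memcpy undefined behaviour while memmove is guaranteed to copy correctly.

```c
#include <assert.h>
#include <string.h>

/* Shift len bytes starting at offset start to the front of buf.
 * The regions [0, len) and [start, start + len) may overlap, so
 * memmove is required; memcpy here would be undefined behaviour. */
void shift_to_front(char *buf, int start, int len)
{
    memmove(buf, buf + start, len);
}
```

Shifting "cdef" out of "abcdef" overlaps two bytes, exactly the case the commit fixes.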
-
- 02 Jun, 2017 (1 commit)

Committed by Xin Zhang
Originally, the reader kept copies of subtransaction information in two places: first, it copied SharedLocalSnapshotSlot to share between writer and reader; second, the reader kept another copy in subxbuf for better performance. Due to lazy xid allocation, subtransaction information can change in the writer asynchronously with respect to the reader, which caused the reader's subtransaction information to go out of date.

This fix removes those copies of subtransaction information in the reader and adds a reference to the writer's PGPROC to SharedLocalSnapshotSlot. The reader now refers to subtransaction information through the writer's PGPROC and pg_subtrans. Also added is an lwlock per shared snapshot slot. The lock protects the shared snapshot information between a writer and the readers belonging to the same session.

Fixes github issues #2269 and #2284.

Signed-off-by: Asim R P <apraveen@pivotal.io>
-
- 01 Jun, 2017 (1 commit)

Committed by Ashwin Agrawal
Before this commit, the snapshot stored the distributed in-progress transactions (populated during snapshot creation) and their corresponding localXids (found later during tuple visibility checks via reverse mapping, and used as a cache) in a single tightly coupled data structure, DistributedSnapshotMapEntry. Storing the information this way posed a couple of problems:

1] Only one localXid can be cached per distributedXid. For sub-transactions the same distribXid can be associated with multiple localXids, but since only one can be cached, the other local xids associated with the distributedXid need to consult the distributed log.

2] While performing a tuple visibility check, the code must always first loop over the full size of the distributed in-progress array to check whether a cached localXid can be used to avoid the reverse mapping.

Now the distributed in-progress array is decoupled from the localXid cache. This allows storing multiple localXids per distributedXid, scanning the localXid cache only if the tuple xid is relevant to it, and scanning only as many entries as are actually cached instead of the full size of the distributed in-progress array even when nothing was cached. Along the way, the relevant code was refactored a bit to simplify it further.
-
- 28 Apr, 2017 (1 commit)

Committed by Ashwin Agrawal
For vacuum, page pruning and freezing to do their job correctly on QEs, they need to know globally the lowest distributed xid that any transaction in the full cluster can still see. Hence the QD must calculate that value and send it to the QEs. For this purpose, logic similar to calculating globalxmin for a local snapshot is used: TMGXACT for global transactions plays a role similar to PROC, and hence it is leveraged to provide the lowest gxid for its snapshot. Further, using its array, shmGxactArray, the lowest value across all global snapshots can easily be found and passed down to the QEs via the snapshot.

Adds a unit test for createDtxSnapshot along with the change.
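The minimum-over-all-snapshots computation can be sketched as a simple scan; the slot array and function name below are hypothetical stand-ins for shmGxactArray, with 0 marking an empty slot, and the fallback playing the role of the value used when no global snapshot is in progress.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DistributedTransactionId;

/* Walk the array of in-progress global transactions and take the
 * minimum snapshot xmin, skipping empty slots (0 = invalid). If no
 * slot is occupied, return the fallback bound. */
DistributedTransactionId
lowest_gxid(const DistributedTransactionId *slots, int n,
            DistributedTransactionId fallback)
{
    DistributedTransactionId lowest = fallback;

    for (int i = 0; i < n; i++)
    {
        if (slots[i] != 0 && slots[i] < lowest)
            lowest = slots[i];
    }
    return lowest;
}
```

This mirrors how GetSnapshotData computes a local globalxmin by scanning the ProcArray, just over global transaction slots instead of PGPROC entries.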
-
- 13 Apr, 2017 (1 commit)

Committed by Ashwin Agrawal
Coverity reported: either the check against null is unnecessary, or there may be a null pointer dereference. In ProcArrayEndTransaction, the pointer was checked against null but then dereferenced anyway. While it's not an actual issue (in the commit case the pointer is never null), simplify the code and stop using the pointer there.
-
- 01 Apr, 2017 (3 commits)

Committed by Ashwin Agrawal
Commit fb86c90d "Simplify management of distributed transactions." cleaned up a lot of the code for LocalDistribXactData and introduced LocalDistribXactData in PROC for debugging purposes. But it is only correctly maintained on QEs; the QD never populated LocalDistribXactData in MyProc. Instead, TMGXACT also had a LocalDistribXactData, which was set initially on the QD but never updated afterwards, and confused more than it served its purpose. Hence remove LocalDistribXactData from TMGXACT, as TMGXACT already has other fields that provide the required information. Also cleaned up the QD-related states, since even in PROC only QEs use LocalDistribXactData.
-
Committed by Ashwin Agrawal
As part of the 8.3 merge, upstream commit 295e6398 "Implement lazy XID allocation" was merged, but transaction IDs were still allocated in StartTransaction, as the code changes required to make it work for GPDB with distributed transactions were pending; the feature thus remained disabled. Some progress was made by commit a54d84a3 "Avoid assigning an XID to DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT queries." This commit addresses the pending work needed to handle deferred xid allocation correctly with distributed transactions and fully enables the feature. Important highlights of the changes:

1] Modify the xlog write and xlog replay records for DISTRIBUTED_COMMIT. Even if a transaction is read-only on the master and no xid is allocated to it, it can still be a distributed transaction and hence needs to persist itself in such a case. So, write the xlog record even if no local xid is assigned but the transaction is prepared. Similarly, during xlog replay of the XLOG_XACT_DISTRIBUTED_COMMIT type, perform distributed commit recovery ignoring the local commit. That also means not writing to the distributed log in this case, as it's only used for the reverse map from local xid to distributed xid.

2] Remove localXID from gxact, as it no longer needs to be maintained and used.

3] Refactor the code for QE reader StartTransaction. There used to be a wait-loop with sleeps, checking whether SharedLocalSnapshotSlot had the same distributed XID as the reader, to assign the reader the writer's xid for SET-type commands before the reader actually performed GetSnapshotData(). Since now a) the writer does not have a valid xid until it performs some write, so the writer's transactionId always turns out to be InvalidTransaction here, and b) read operations like SET don't need an xid any more, the need for this wait is gone.

4] Throw an error if a distributed transaction is used without a distributed xid. Earlier, AssignTransactionId() was called for this case in StartTransaction(), but such a scenario doesn't exist, so convert it to an ERROR.

5] The QD, during snapshot creation in createDtxSnapshot(), was earlier able to assign the localXid in inProgressEntryArray corresponding to the distribXid, as the localXid was known by that time. That's no longer the case, and the localXid will mostly be assigned after the snapshot is taken. Hence now even on the QD, as on QEs, the localXid is not populated at snapshot creation time but looked up later in DistributedSnapshotWithLocalMapping_CommittedTest(). There is a chance to optimize and approximate the earlier behavior by populating the gxact in AssignTransactionId() once the localXid is known, but currently that doesn't seem worth it, as the QEs have to perform the lookups anyway.
-
Committed by Ashwin Agrawal
Leverage the fact that inProgressEntryArray is sorted by distribXid when the snapshot is created in createDtxSnapshot, so that DistributedSnapshotWithLocalMapping_CommittedTest() can break out of its scan early.
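The early-break optimization on a sorted array looks like this in miniature; the function below is an illustrative sketch, not the actual CommittedTest code.

```c
#include <assert.h>

/* Membership probe over a sorted in-progress array: because the
 * entries are in ascending order, the scan can stop as soon as it
 * passes the search key instead of visiting every element. */
int in_progress(const int *sorted, int n, int key)
{
    for (int i = 0; i < n; i++)
    {
        if (sorted[i] == key)
            return 1;
        if (sorted[i] > key)
            break;          /* everything after this is larger too */
    }
    return 0;
}
```

For a key smaller than most entries, the loop terminates after one or two comparisons instead of n; a binary search would take this further, but even the linear early break saves work on every visibility check.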
-
- 07 Mar, 2017 (2 commits)

Committed by Ashwin Agrawal
`MyProc->inCommit` protects against a checkpoint running during in-commit transactions. However, `MyProc->lxid` has to be valid as well, because `GetVirtualXIDsDelayingChkpt()` and `HaveVirtualXIDsDelayingChkpt()` require `VirtualTransactionIdIsValid()` in addition to `inCommit` to block the checkpoint process. In this fix, we defer clearing `inCommit` and `lxid` to `CommitTransaction()`.
-
Committed by Ashwin Agrawal
Originally, checkpoint checked for the xid. However, the xid is used to control transaction visibility, and it's crucial to clear it once the process is done with commit and before releasing locks. Checkpoint, on the other hand, needs to wait for `AtEOXact_smgr()` to clean up persistent table information, which happens after the locks are released, by which point the xid is already cleared. Hence we use the VXID, which has no visibility impact.

NOTE: upstream PostgreSQL commit f21bb9cf is a similar fix.
-
- 10 Feb, 2017 (1 commit)

Committed by Heikki Linnakangas
Minimizes merge conflicts in the future.
-
- 25 Jan, 2017 (1 commit)

Committed by Ashwin Agrawal
As part of the 8.3 merge, via upstream commit 92c2ecc1, code to ignore lazy vacuum when calculating RecentXmin and RecentGlobalXmin was introduced. In GPDB, as part of lazy vacuum, a reindex is performed for bitmap indexes, which generates tuples in pg_class with the lazy vacuum's transaction ID. Ignoring lazy vacuum for RecentXmin and RecentGlobalXmin during GetSnapshotData caused hint bits to be incorrectly set to `HEAP_XMAX_INVALID` on tuples intended to be deleted by lazy vacuum, breaking the HOT chain. This transaction visibility issue was encountered in CI many times, with the parallel schedule `bitmap_index, analyze` failing with the error `could not find pg_class tuple for index` at commit time of the lazy vacuum. Hence this commit stops tracking lazy vacuum in MyProc and performing any special handling for it.
-
- 21 Dec, 2016 (1 commit)

Committed by Ashwin Agrawal
The QE reader leverages SharedLocalSnapshot to perform visibility checks; the QE writer is responsible for keeping SharedLocalSnapshot up to date. Before this fix, SharedLocalSnapshot was only updated by the writer while acquiring the snapshot; if a transaction id was assigned to a subtransaction after the snapshot had been taken, that was not reflected. Due to this, when the QE reader called TransactionIdIsCurrentTransactionId, it could sometimes get false, depending on timing, for subtransaction ids used by the QE writer to insert/update tuples. To fix this, SharedLocalSnapshot is now updated when a transaction id is assigned, and the entry is deregistered if the subtransaction aborts.

Also adds a fault injector to suspend the cursor QE reader, instead of the guc/sleep used in the past, moves the cursor tests from bugbuster to ICG, and adds a deterministic test to exercise the behavior.

Fixes #1276, reported by @pengzhout
-
- 16 Jul, 2016 (1 commit)

Committed by Heikki Linnakangas
We used to have a separate array of LocalDistributedXactData instances, and a reference in PGPROC to its associated LocalDistributedXact. That's unnecessarily complicated: we can store the LocalDistributedXact information directly in the PGPROC entry, and get rid of the auxiliary array and the bookkeeping needed to manage it. This doesn't affect the backend-private cache of committed Xids that also lives in cdblocaldistribxact.c. Now that the PGPROC->localDistributedXactData fields are never accessed by other backends, don't protect them with ProcArrayLock anymore. This makes the code simpler, and potentially improves performance too (ProcArrayLock can be very heavily contended on a busy system).
-
- 04 Jul, 2016 (1 commit)

Committed by Daniel Gustafsson
Callers of FaultInjector_InjectFaultIfSet() that pass neither a database name nor a table name and use DDLNotSpecified can instead use the convenient macro SIMPLE_FAULT_INJECTOR(), which cuts down on boilerplate in the code. This commit brings no change in functionality, merely readability.
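The wrapper-macro pattern can be sketched as below. Both names carry a _SKETCH suffix to stress that this is an illustration of the pattern, not GPDB's actual fault-injector signature (the real function takes more parameters, including a DDL statement type and fault name enum).

```c
#include <assert.h>
#include <stddef.h>

/* Full-form entry point: takes optional database and table names.
 * Here it just reports whether the common "no names" case was used;
 * the real function looks up and fires the named fault. */
int FaultInjector_InjectFaultIfSet_sketch(const char *fault,
                                          const char *db,
                                          const char *table)
{
    (void) fault;
    return db == NULL && table == NULL;
}

/* Convenience macro for the common case: no database, no table.
 * Call sites shrink from three arguments to one. */
#define SIMPLE_FAULT_INJECTOR_SKETCH(fault) \
    FaultInjector_InjectFaultIfSet_sketch((fault), NULL, NULL)
```

A call site that previously spelled out the NULL arguments becomes a single-argument macro invocation, which is the whole readability win the commit describes.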
-
- 28 Jun, 2016 (1 commit)

Committed by Kenan Yao
-
- 10 May, 2016 (1 commit)

Committed by Heikki Linnakangas
Not urgent to do right now, but makes merging and diffing easier.
-
- 09 Dec, 2015 (1 commit)

Committed by Heikki Linnakangas
As promised in the previous commit. Upstream patch:

commit bd0a2609
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Jun 1 19:38:07 2007 +0000

Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends will exit before failing because of conflicting DB usage. Per discussion, this seems a good idea to help mask the fact that backend exit takes nonzero time. Remove a couple of thereby-obsoleted sleeps in contrib and PL regression test sequences.
-
- 28 Oct, 2015 (1 commit)
- 07 Jul, 2010 (1 commit)

Committed by Bruce Momjian
-
- 04 Jul, 2010 (1 commit)

Committed by Tom Lane
to have different values in different processes of the primary server. Also put it into the "Streaming Replication" GUC category; it doesn't belong in "Standby Servers" because you use it on the master not the standby. In passing also correct guc.c's idea of wal_keep_segments' category.
-
- 14 May, 2010 (1 commit)

Committed by Simon Riggs
without them, related to previous commit. Report by Bruce Momjian.
-
- 13 May, 2010 (1 commit)

Committed by Simon Riggs
of requirements and documentation on LogStandbySnapshot(). Fixes two minor bugs reported by Tom Lane that would lead to an incorrect snapshot after transaction wraparound. Also fix two other problems discovered that would give incorrect snapshots in certain cases. ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().
-
- 30 Apr, 2010 (1 commit)

Committed by Tom Lane
confusion with streaming-replication settings. Also, change its default value to "off", because of concern about executing new and poorly-tested code during ordinary non-replicating operation. Per discussion. In passing do some minor editing of related documentation.
-
- 28 Apr, 2010 (1 commit)

Committed by Tom Lane
and be more tense about the locking requirements for it, to improve performance in Hot Standby mode. In passing fix a few bugs and improve a number of comments in the existing HS code. Simon Riggs, with some editorialization by Tom
-
- 22 Apr, 2010 (2 commits)

Committed by Simon Riggs
Clarify comments, downgrade a message to DEBUG and remove some debug counters. Direct from ideas by Heikki Linnakangas.
-
Committed by Simon Riggs
to handling of btree delete records mean that all snapshot conflicts on standby now have a valid, useful latestRemovedXid. Our earlier approach using LW_EXCLUSIVE was useful when we didn't always have a valid value, though it is no longer useful or necessary. Asserts added to the code path to prove and ensure this is the case. This will reduce contention and improve performance of larger Hot Standby servers.
-
- 20 Apr, 2010 (1 commit)

Committed by Simon Riggs
This prevents a rare, yet possible race condition at the exact moment of transition from recovery to normal running.
-
- 19 Apr, 2010 (1 commit)

Committed by Simon Riggs
through normal backends. Makes code clearer also, since we avoid various Assert()s. Performance of snapshots taken during recovery no longer depends upon number of read-only backends.
-
- 06 Apr, 2010 (1 commit)

Committed by Simon Riggs
-
- 11 Mar, 2010 (1 commit)

Committed by Heikki Linnakangas
assertion failure reported by Erik Rijkers, but this alone doesn't explain the failure.
-
- 26 Feb, 2010 (1 commit)

Committed by Bruce Momjian
-
- 24 Jan, 2010 (1 commit)

Committed by Simon Riggs
woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.
-