1. 30 Oct 2017 (1 commit)
  2. 28 Oct 2017 (1 commit)
    • H
      When dispatching, send ActiveSnapshot along, not some random snapshot. · 4a95afc1
      Committed by Heikki Linnakangas
      If the caller specifies DF_WITH_SNAPSHOT, so that the command is dispatched
      to the segments with a snapshot, but there is currently no active snapshot
      in the QD itself, that seems like a mistake.
      
      In qdSerializeDtxContextInfo(), the comment talked about which snapshot to
      use when the transaction has already been aborted. I didn't quite
      understand that. I don't think the function is used to dispatch the "ABORT"
      statement itself, and we shouldn't be dispatching anything else in an
      already-aborted transaction.
      
      This makes it more clear which snapshot is dispatched along with the
      command. In theory, the latest or serializable snapshot can be different
      from the one being used when the command is dispatched, although I'm not
      sure if there are any such cases in practice.
      
      In the upcoming 8.4 merge, there are more changes coming up to snapshot
      management, which make it more difficult to get hold of the latest acquired
      snapshot in the transaction, so changing this now will ease the pain of
      merging that.
      
      I don't know why, but after making the change in qdSerializeDtxContextInfo,
      I started to get a lot of "Too many distributed transactions for snapshot
      (maxCount %d, count %d)" errors. Looking at the code, I don't understand
      how it ever worked. I don't see any guarantee that the array in
      TempQDDtxContextInfo or TempDtxContextInfo was pre-allocated correctly.
      Or maybe it got allocated big enough to hold max_prepared_xacts, which
      was always large enough, but it seemed rather haphazard to me. So in
      the spirit of "if you don't understand it, rewrite it until you do", I
      changed the way the allocation of the inProgressXidArray array works.
      In statically allocated snapshots, i.e. SerializableSnapshot and
      LatestSnapshot, the array is malloc'd. In a snapshot copied with
      CopySnapshot(), it points to a part of the palloc'd space for the
      snapshot. Nothing new so far, but I changed CopySnapshot() to set
      "maxCount" to -1 to indicate that it's not malloc'd. Then I modified
      DistributedSnapshot_Copy and DistributedSnapshot_Deserialize to not give up
      if the target array is not large enough, but enlarge it as needed. Finally,
      I made a little optimization in GetSnapshotData() when running in a QE, to
      move the copying of the distributed snapshot data to outside the section
      guarded by ProcArrayLock. ProcArrayLock can be heavily contended, so that's
      a nice little optimization anyway, but especially now that
      DistributedSnapshot_Copy() might need to realloc the array.
      4a95afc1
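The allocation scheme described above can be sketched in plain C. This is a simplified toy model, not the actual GPDB code; the struct and function names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a distributed snapshot's in-progress XID array.
 * maxCount == -1 marks storage that belongs to a palloc'd snapshot copy
 * and therefore must not be free()'d or realloc()'d here. */
typedef struct DtxSnapshot
{
    int  count;                /* number of in-progress distributed XIDs */
    int  maxCount;             /* allocated slots, or -1 if not malloc'd */
    int *inProgressXidArray;
} DtxSnapshot;

/* Copy src into dst, enlarging the target array as needed instead of
 * giving up with a "Too many distributed transactions" error. */
static void
dtx_snapshot_copy(DtxSnapshot *dst, const DtxSnapshot *src)
{
    if (dst->maxCount == -1 || dst->maxCount < src->count)
    {
        if (dst->maxCount != -1)
            free(dst->inProgressXidArray);   /* malloc'd by us: safe to free */
        dst->inProgressXidArray = malloc(src->count * sizeof(int));
        dst->maxCount = src->count;
    }
    memcpy(dst->inProgressXidArray, src->inProgressXidArray,
           src->count * sizeof(int));
    dst->count = src->count;
}
```

The -1 sentinel lets the copy routine distinguish "too small but resizable" from "not ours to resize", which is the crux of the enlarge-as-needed change.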
  3. 10 Oct 2017 (1 commit)
  4. 15 Sep 2017 (1 commit)
    • H
      Rewrite the way a DTM initialization error is logged, to retain file & lineno. · c6f931fe
      Committed by Heikki Linnakangas
      While working on the 8.4 merge, I had a bug that tripped an Insist inside
      the PG_TRY-CATCH. That was very difficult to track down, because of the way
      the error is logged here: using ereport() includes the filename and line
      number where the error is re-emitted, not the original location. So all I
      got was "Unexpected internal error" in the log, with a meaningless filename
      & lineno.
      
      This rewrites the way the error is reported so that it preserves the
      original filename and line number. It will also use the original error
      level and will preserve all the other fields.
      c6f931fe
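The idea can be sketched in a toy model. ErrorData here is a stand-in for PostgreSQL's real error record, and the formatting function is illustrative, not the actual rewrite:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the fields of a caught error that matter here. */
typedef struct ErrorData
{
    int         elevel;     /* original error level */
    const char *filename;   /* file where the error was originally raised */
    int         lineno;     /* ...and the line number there */
    const char *message;
} ErrorData;

/* Re-reporting via ereport() would stamp the file/line of the re-emit
 * site.  Instead, format the saved record so that the original location
 * and level survive into the log. */
static void
report_preserving_location(const ErrorData *edata, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "level=%d %s:%d: %s",
             edata->elevel, edata->filename, edata->lineno, edata->message);
}
```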
  5. 01 Sep 2017 (1 commit)
  6. 30 Aug 2017 (1 commit)
  7. 11 Aug 2017 (4 commits)
  8. 09 Aug 2017 (1 commit)
    • P
      Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Committed by Pengzhou Tang
      The whole cdb directory was shipped to end users, and all header files
      that cdb*.h includes also need to be shipped to make checkinc.py
      pass. However, exposing gp_libpq_fe/*.h would confuse customers because
      they are almost the same as libpq/*. Per Heikki's suggestion, we should
      keep gp_libpq_fe/* unchanged. So to make the system work, we include
      gp-libpq-fe.h and gp-libpq-int.h directly in the .c files that need them.
      cf7cddf7
  9. 07 Jul 2017 (1 commit)
    • A
      Remove unused variable in checkpoint record. · f737c2d2
      Committed by Ashwin Agrawal
      The segmentCount variable in the TMGXACT_CHECKPOINT structure is unused,
      hence remove it. Also remove the union in the fspc_agg_state,
      tspc_agg_state and dbdir_agg_state structures, as there is no apparent
      reason for having it.
      f737c2d2
  10. 20 Jun 2017 (1 commit)
    • A
      Remove tmlock test and add an assert instead. · 944306d7
      Committed by Abhijit Subramanya
      The test used to validate that the tmlock is not held after completing the DTM
      recovery. The root cause for not releasing the lock was that in case of an
      error during recovery `elog_demote(WARNING)` was called which would demote the
      error to a warning. This would cause the abort processing code to not get
      executed and hence the lock would not be released. Adding a simple assert in
      the code once DTM recovery is complete is sufficient to make sure that the lock
      is released.
      944306d7
  11. 02 Jun 2017 (1 commit)
    • X
      Remove subtransaction information from SharedLocalSnapshotSlot · b52ca70f
      Committed by Xin Zhang
      Originally, the reader kept copies of subtransaction information in
      two places.  First, it copied SharedLocalSnapshotSlot to share between
      writer and reader.  Second, reader kept another copy in subxbuf for
      better performance.  Due to lazy xid, subtransaction information can
      change in the writer asynchronously with respect to the reader.  This
      caused the reader's subtransaction information to become out of date.
      
      This fix removes those copies of subtransaction information in the
      reader and adds a reference to the writer's PGPROC to
      SharedLocalSnapshotSlot.  The reader should refer to subtransaction
      information through the writer's PGPROC and pg_subtrans.
      
      Also added is a lwlock per shared snapshot slot.  The lock protects
      shared snapshot information between a writer and readers belonging to
      the same session.
      
      Fixes github issues #2269 and #2284.
      Signed-off-by: Asim R P <apraveen@pivotal.io>
      b52ca70f
  12. 01 Jun 2017 (1 commit)
    • A
      Optimize DistributedSnapshot check and refactor to simplify. · 3c21b7d8
      Committed by Ashwin Agrawal
      Before this commit, a snapshot stored the distributed in-progress
      transactions (populated during snapshot creation) and their corresponding
      localXids (found later during tuple visibility checks and used as a cache,
      via reverse mapping) in a single tightly coupled data structure,
      DistributedSnapshotMapEntry. Storing the information this way posed a
      couple of problems:
      
      1] Only one localXid can be cached per distributedXid. For
      sub-transactions, the same distribXid can be associated with multiple
      localXids, but since only one can be cached, the other local xids
      associated with the distributedXid require consulting the distributed_log.
      
      2] While performing a tuple visibility check, the code must always first
      loop over the full size of the distributed in-progress array to check
      whether a cached localXid can be used to avoid the reverse mapping.
      
      Now the distributed in-progress array is decoupled from the local xid
      cache. This allows storing multiple localXids per distributedXid. It also
      allows scanning the localXid cache only when the tuple's xid is relevant
      to it, and scanning only as many elements as are actually cached, instead
      of always the full size of the distributed in-progress array even when
      nothing was cached.
      
      Along the way, refactored relevant code a bit as well to simplify further.
      3c21b7d8
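The decoupling can be illustrated with a toy model in C. The structs and sizes below are illustrative, not the real GPDB code; the point is that the cache is scanned by its own count, not by the in-progress array's size:

```c
#include <assert.h>

#define MAX_CACHE 8

/* One cached (dxid -> localXid) pair; several entries may share a dxid,
 * which is what allows multiple local xids per distributed xid. */
typedef struct DxidCacheEntry { int dxid; int localXid; } DxidCacheEntry;

typedef struct DtxSnapshotCache
{
    int            cachedCount;       /* scan only this many entries */
    DxidCacheEntry cache[MAX_CACHE];
} DtxSnapshotCache;

/* Cache another localXid for a dxid (e.g. a subtransaction's xid). */
static void
cache_local_xid(DtxSnapshotCache *c, int dxid, int localXid)
{
    if (c->cachedCount < MAX_CACHE)
    {
        c->cache[c->cachedCount].dxid = dxid;
        c->cache[c->cachedCount].localXid = localXid;
        c->cachedCount++;
    }
}

/* Visibility-check helper: is this local xid cached under the dxid?
 * The loop bound is cachedCount, not the in-progress array size. */
static int
cached_has(const DtxSnapshotCache *c, int dxid, int localXid)
{
    for (int i = 0; i < c->cachedCount; i++)
        if (c->cache[i].dxid == dxid && c->cache[i].localXid == localXid)
            return 1;
    return 0;
}
```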
  13. 28 Apr 2017 (1 commit)
    • A
      Correct calculation of xminAllDistributedSnapshots and set it on QE's. · d887fe0c
      Committed by Ashwin Agrawal
      For vacuum, page pruning and freezing to perform their jobs correctly on
      QEs, they need to know, globally, the lowest dxid that any transaction in
      the full cluster can still see. Hence the QD must calculate that info and
      send it to the QEs. For this purpose, we use logic similar to the
      calculation of globalxmin from local snapshots. TMGXACT for global
      transactions serves a role similar to PROC, and hence it is leveraged to
      provide the lowest gxid for its snapshot. Further, using its array,
      shmGxactArray, we can easily find the lowest across all global snapshots
      and pass it down to the QEs via the snapshot.
      
      Adding unit test for createDtxSnapshot along with the change.
      d887fe0c
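The "lowest across all global snapshots" step is a simple minimum scan. A toy sketch (the function name and parameters are illustrative; in the real code the values come from the TMGXACT entries in shmGxactArray):

```c
#include <assert.h>

/* Each in-progress global transaction records the xmin of its distributed
 * snapshot; the lowest across all of them is what vacuum/pruning on QEs
 * must respect.  With no snapshots in progress, fall back to the latest
 * completed gxid (a stand-in for the "nothing older is needed" case). */
static int
compute_xmin_all_distributed_snapshots(const int *gxactXmins, int n,
                                       int latestGxid)
{
    int xmin = latestGxid;
    for (int i = 0; i < n; i++)
        if (gxactXmins[i] < xmin)
            xmin = gxactXmins[i];
    return xmin;
}
```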
  14. 01 Apr 2017 (2 commits)
    • A
      Cleanup LocalDistribXactData related code. · 8c20bc94
      Committed by Ashwin Agrawal
      Commit fb86c90d "Simplify management of
      distributed transactions." cleaned up a lot of code for
      LocalDistribXactData and introduced LocalDistribXactData in PROC for
      debugging purposes. But it is only correctly maintained on QEs; the QD
      never populated LocalDistribXactData in MyProc. Instead, TMGXACT also had
      a LocalDistribXactData which was set initially on the QD but never updated
      later, and it confused more than it served a purpose. Hence remove
      LocalDistribXactData from TMGXACT, as TMGXACT already has other fields
      which provide the required information. Also clean up the QD-related
      states, as even in PROC only QEs use LocalDistribXactData.
      8c20bc94
    • A
      Fully enable lazy XID allocation in GPDB. · 0932453d
      Committed by Ashwin Agrawal
      As part of the 8.3 merge, upstream commit 295e6398
      "Implement lazy XID allocation" was merged. But transactionIds were still
      allocated in StartTransaction, as the code changes required to make it
      work for GPDB with distributed transactions were pending, so the feature
      remained disabled. Some progress was made by commit
      a54d84a3 "Avoid assigning an XID to
      DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT queries." This commit now addresses
      the pending work needed to handle deferred xid allocation correctly with
      distributed transactions and fully enables the feature.
      
      Important highlights of changes:
      
      1] Modify the xlog write and replay records for DISTRIBUTED_COMMIT. Even
      if a transaction is read-only on the master and no xid is allocated to it,
      it can still be a distributed transaction and hence needs to persist
      itself in such a case. So, write an xlog record even if no local xid is
      assigned but the transaction is prepared. Similarly, during xlog replay of
      the XLOG_XACT_DISTRIBUTED_COMMIT type, perform distributed commit recovery
      while ignoring the local commit. That also means not committing to the
      distributed log in this case, as it is only used to perform the reverse
      map from localxid to distributed xid.
      
      2] Remove localXID from gxact, as it no longer needs to be maintained and
      used.
      
      3] Refactor the code for QE reader StartTransaction. There used to be a
      wait-loop with sleep, checking whether SharedLocalSnapshotSlot has the
      same distributed XID as the reader, in order to assign the reader the
      writer's xid for SET-type commands until the reader actually performs
      GetSnapshotData(). Since now a) the writer will not have a valid xid until
      it performs some write, the writer's transactionId here always turns out
      to be InvalidTransactionId, and b) read operations like SET don't need an
      xid any more, the need for this wait is gone.
      
      4] Throw an error if using a distributed transaction without a distributed
      xid. Earlier, AssignTransactionId() was called for this case in
      StartTransaction(), but such a scenario doesn't exist, hence convert it to
      an ERROR.
      
      5] The QD, during snapshot creation in createDtxSnapshot(), was earlier
      able to assign the localXid in inProgressEntryArray corresponding to the
      distribXid, as the localXid was known by that time. That is no longer the
      case, and the localXid will mostly get assigned after the snapshot is
      taken. Hence now, even for the QD, as for QEs, the localXid is not
      populated at snapshot creation time but found later in
      DistributedSnapshotWithLocalMapping_CommittedTest(). There is a chance to
      optimize and somewhat match the earlier behavior by populating the gxact
      in AssignTransactionId() once the localXid is known, but currently that
      seems not worth it, as QEs have to perform the lookups anyway.
      0932453d
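The core of lazy XID allocation, as described above, can be sketched in a few lines of toy C (global variables stand in for backend-local and shared state; names mirror the upstream functions but the bodies are illustrative):

```c
#include <assert.h>

/* Toy sketch of lazy XID allocation: a transaction's local xid is
 * assigned on first use (first write), not at StartTransaction. */
#define InvalidTransactionId 0

static int nextXid = 100;            /* stand-in for the shared xid counter */
static int currentTransactionId = InvalidTransactionId;

static void
StartTransaction(void)
{
    currentTransactionId = InvalidTransactionId;   /* no eager assignment */
}

static int
GetCurrentTransactionId(void)
{
    if (currentTransactionId == InvalidTransactionId)
        currentTransactionId = nextXid++;   /* AssignTransactionId() path */
    return currentTransactionId;
}
```

This is why a read-only QD transaction can still be a distributed transaction with no local xid, the case point 1] above has to persist in the xlog.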
  15. 07 Mar 2017 (1 commit)
    • A
      Fix checkpoint wait for CommitTransaction. · 787992e4
      Committed by Ashwin Agrawal
      `MyProc->inCommit` protects against a checkpoint running concurrently with
      transactions that are in commit.
      
      However, `MyProc->lxid` has to be valid because `GetVirtualXIDsDelayingChkpt()`
      and `HaveVirtualXIDsDelayingChkpt()` require `VirtualTransactionIdIsValid()` in
      addition to `inCommit` to block the checkpoint process.
      
      In this fix, we defer clearing `inCommit` and `lxid` to `CommitTransaction()`.
      787992e4
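The invariant the fix restores can be shown with a toy model (struct and function names are illustrative; in PostgreSQL the real checks live in GetVirtualXIDsDelayingChkpt()/HaveVirtualXIDsDelayingChkpt()):

```c
#include <assert.h>

/* A checkpoint waits for a backend only while BOTH inCommit is set AND
 * its virtual transaction id (lxid) is still valid.  If lxid is cleared
 * before inCommit, the checkpoint stops waiting too early, which is why
 * the fix defers clearing both to CommitTransaction(). */
typedef struct Proc
{
    int inCommit;
    int lxid;       /* 0 means invalid, mirroring VirtualTransactionIdIsValid */
} Proc;

static int
delays_checkpoint(const Proc *p)
{
    return p->inCommit && p->lxid != 0;
}
```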
  16. 29 Dec 2016 (1 commit)
    • A
      Fix interrupt count issue in DTM. · e1cac369
      Committed by Ashwin Agrawal
      A PG_TRY - PG_CATCH block was added to the distributed transaction commit
      prepared and abort prepared calls as part of commit c6320c. This call,
      though, happens to be inside a HOLD_INTERRUPTS - RESUME_INTERRUPTS block.
      Hence we need to maintain the interrupts counter correctly, as any ERROR
      sets InterruptHoldoffCount to 0 in the elog code; due to this issue we
      were hitting the PANIC "Resume interrupt holdoff count is bad (0)".
      e1cac369
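The bookkeeping pattern can be reproduced in miniature with setjmp/longjmp standing in for PG_TRY/PG_CATCH (everything here is a toy model; the variable name matches the real counter but the mechanics are simplified):

```c
#include <assert.h>
#include <setjmp.h>

/* elog(ERROR) zeroes the holdoff counter, so a catch block inside
 * HOLD_INTERRUPTS/RESUME_INTERRUPTS must restore the saved count before
 * resuming, or the counter underflows and the "Resume interrupt holdoff
 * count is bad (0)" PANIC fires. */
static int InterruptHoldoffCount = 0;
static jmp_buf catchBuf;

static void
raise_error(void)
{
    InterruptHoldoffCount = 0;    /* what elog's error path does */
    longjmp(catchBuf, 1);
}

static void
commit_prepared_with_catch(void)
{
    InterruptHoldoffCount++;                  /* HOLD_INTERRUPTS() */
    volatile int savedHoldoff = InterruptHoldoffCount;
    if (setjmp(catchBuf) == 0)                /* PG_TRY */
        raise_error();                        /* the dispatch fails */
    else
        InterruptHoldoffCount = savedHoldoff; /* PG_CATCH: restore count */
    InterruptHoldoffCount--;                  /* RESUME_INTERRUPTS() */
}
```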
  17. 13 Dec 2016 (1 commit)
    • A
      Refactor distributed transaction phase 2 retry logic. · c6320c13
      Committed by Asim R P
      Refactor the phase 2 retry logic of distributed transactions so that the
      retry happens immediately after a failure instead of happening inside
      EndCommand(). The patch also increases the number of retries in case of
      failure to 2 and introduces a GUC called dtx_phase2_retry_count to control
      the number of retries.
      c6320c13
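The retry shape described above can be sketched as follows. This is a toy model: the function names and the flaky dispatcher are illustrative, and only the GUC name and default come from the commit message:

```c
#include <assert.h>

static int dtx_phase2_retry_count = 2;   /* GUC; default per the commit */

/* Retry phase 2 immediately on failure, up to dtx_phase2_retry_count
 * additional attempts beyond the first. */
static int
do_phase2_with_retry(int (*dispatch)(void *), void *arg)
{
    for (int attempt = 0; attempt <= dtx_phase2_retry_count; attempt++)
        if (dispatch(arg))
            return 1;          /* succeeded, possibly after retries */
    return 0;                  /* exhausted all retries */
}

/* Example flaky dispatcher: fails until the counter in arg hits zero. */
static int
flaky_dispatch(void *arg)
{
    int *failuresLeft = (int *) arg;
    if (*failuresLeft > 0)
    {
        (*failuresLeft)--;
        return 0;
    }
    return 1;
}
```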
  18. 24 Nov 2016 (1 commit)
    • D
      Guard against possible NULL pointer dereferencing · 280416b7
      Committed by Daniel Gustafsson
      Improves the defensiveness of programming around pointer dereferencing to
      ensure that we don't risk a NULL pointer. Most of these are quite
      straightforward; those of note are discussed below.
      
      In doDispatchDtxProtocolCommand() we relied on the result data being
      created in zeroed-out memory in CdbDispatchDtxProtocolCommand(), which
      isn't guaranteed for every compiler. Explicitly set numResults to zero
      and also check the results for NULL.
      
      Per multiple reports by Coverity
      280416b7
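The defensive pattern for the doDispatchDtxProtocolCommand() case can be sketched in a toy model (the function names echo the commit but the signatures and types are illustrative):

```c
#include <assert.h>
#include <stddef.h>

typedef struct PGresult { int status; } PGresult;

/* Stand-in for CdbDispatchDtxProtocolCommand(): may return NULL, and on
 * failure it may never touch *numResults. */
static PGresult *
dispatch_dtx_command(int *numResults, int fail)
{
    if (fail)
        return NULL;
    static PGresult ok = {1};
    *numResults = 1;
    return &ok;
}

static int
do_dispatch(int fail)
{
    int numResults = 0;        /* explicit zero: don't trust zeroed memory */
    PGresult *results = dispatch_dtx_command(&numResults, fail);
    if (results == NULL)       /* guard the dereference */
        return 0;
    return numResults;
}
```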
  19. 04 Nov 2016 (1 commit)
  20. 26 Aug 2016 (1 commit)
    • H
      Silence compiler warning. · ed004c4b
      Committed by Heikki Linnakangas
      Gcc 6.1 complains about "tautological compare". Per the comment, the
      intention here is to unconditionally fail the assertion, so use a more
      straightforward Assert(false) to do that.
      ed004c4b
  21. 18 Aug 2016 (1 commit)
  22. 25 Jul 2016 (2 commits)
    • P
      Refactor command dispatch related function, · f7078db2
      Committed by Pengzhou Tang
      The original cdbdisp_dispatchRMCommand() and CdbDoCommand() were easily
      confused. This commit combines them into one, and meanwhile pushes error
      handling down to make coding easier.
      f7078db2
    • P
      Refactor utility statement dispatch interfaces · 01769ada
      Committed by Pengzhou Tang
      Refactor CdbDispatchUtilityStatement() to make it flexible for
      cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags like
      DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT and DF_CANCEL_ON_ERROR to make the
      function calls much clearer.
      01769ada
  23. 16 Jul 2016 (2 commits)
    • H
      Simplify management of distributed transactions. · fb86c90d
      Committed by Heikki Linnakangas
      We used to have a separate array of LocalDistributedXactData instances,
      and a reference in PGPROC to its associated LocalDistributedXact. That's
      unnecessarily complicated: we can store the LocalDistributedXact
      information directly in the PGPROC entry, and get rid of the auxiliary
      array and the bookkeeping needed to manage that array.
      
      This doesn't affect the backend-private cache of committed Xids that also
      lives in cdblocaldistribxact.c.
      
      Now that the PGPROC->localDistributedXactData fields are never accessed
      by other backends, don't protect it with ProcArrayLock anymore. This makes
      the code simpler, and potentially improves performance too (ProcArrayLock
      can be very heavily contended on a busy system).
      fb86c90d
    • H
      Remove mechanism to poll QEs for max distributed XID at QD startup. · 2914c24f
      Committed by Heikki Linnakangas
      There's no need to try to make the dXIDs unique across restarts, because
      we always carry the QD startup timestamp along with dXIDs, which
      disambiguates the same dXID before and after restart.
      
      Per Asim RP's comments.
      2914c24f
  24. 04 Jul 2016 (1 commit)
    • D
      Use SIMPLE_FAULT_INJECTOR() macro where possible · 38741b45
      Committed by Daniel Gustafsson
      Callers of FaultInjector_InjectFaultIfSet() which pass neither a
      databasename nor a tablename and that use DDLNotSpecified can instead
      use the convenience macro SIMPLE_FAULT_INJECTOR(), which cuts down on
      the boilerplate in the code. This commit does not bring any changes
      in functionality, merely readability.
      38741b45
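The boilerplate reduction can be illustrated with a toy model. The wrapped function here is a stub with an illustrative signature, not the real fault-injector API; only the macro's general shape (fill in DDLNotSpecified and empty names) follows the commit message:

```c
#include <assert.h>
#include <string.h>

typedef enum { DDLNotSpecified = 0 } DDLType;

static const char *lastFault;   /* records the last injected fault name */

/* Stub standing in for the full-argument fault injector entry point. */
static int
FaultInjector_InjectFaultIfSet(const char *faultName, DDLType ddl,
                               const char *databaseName,
                               const char *tableName)
{
    (void) ddl; (void) databaseName; (void) tableName;
    lastFault = faultName;
    return 0;
}

/* The common case needs only the fault name. */
#define SIMPLE_FAULT_INJECTOR(faultName) \
    FaultInjector_InjectFaultIfSet((faultName), DDLNotSpecified, "", "")
```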
  25. 13 Jun 2016 (1 commit)
    • K
      Dispatch exactly same text string for all slices. · 4b360942
      Committed by Kenan Yao
      Include a map from sliceIndex to gang_id in the dispatched string,
      and remove the localSlice field, so the QE now gets the localSlice
      from the map. This way, we avoid duplicating and modifying
      the dispatch text string slice by slice, and each QE of a sliced
      dispatch now gets the same contents.
      
      The extra space cost is sizeof(int) * SliceNumber bytes, and the extra
      computing cost is iterating the SliceNumber-size array. Compared with
      memcpy of text string for each slice in previous implementation, this
      way is much cheaper, because SliceNumber is much smaller than the size
      of dispatch text string. Also, since SliceNumber is so small, we just
      use an array for the map instead of a hash table.
      
      Also, clean up some dead code in dispatcher, including:
      (1) Remove primary_gang_id field of Slice struct and DispatchCommandDtxProtocolParms
      struct, since dispatch agent is deprecated now;
      (2) Remove redundant logic in cdbdisp_dispatchX;
      (3) Clean up buildGpDtxProtocolCommand;
      4b360942
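The lookup on the QE side is a linear scan over the small map array, which is why an array beats a hash table here. A toy sketch (the function name and the direction of the lookup are illustrative):

```c
#include <assert.h>

/* The dispatched string carries one array mapping sliceIndex -> gang_id;
 * a QE recovers its localSlice by finding the slice whose gang id matches
 * its own, instead of each slice receiving a modified copy of the
 * dispatch text.  numSlices is small, so a plain loop suffices. */
static int
find_local_slice(const int *sliceToGang, int numSlices, int myGangId)
{
    for (int i = 0; i < numSlices; i++)
        if (sliceToGang[i] == myGangId)
            return i;
    return -1;    /* no slice assigned to this gang */
}
```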
  26. 21 May 2016 (1 commit)
    • G
      refactor gang management code · 46dfa750
      Committed by Gang Xiong
      1) add one new type of gang: singleton reader gang.
      2) change interface of allocateGang.
      3) handling exceptions during gang creation: segment down and segment reset.
      4) cleanup some dead code.
      46dfa750
  27. 19 May 2016 (1 commit)
  28. 13 May 2016 (1 commit)
    • H
      Clean up the way the results array is allocated in cdbdisp_returnResults(). · 6a28c978
      Committed by Heikki Linnakangas
      I saw the "nresults < nslots" assertion fail, while hacking on something
      else. It happened when a Distributed Prepare command failed, and there were
      several error result sets from a segment. I'm not sure how normal it is to
      receive multiple ERROR responses to a single query, but the protocol
      certainly allows it, and I don't see any explanation for why the code used
      to assume that there can be at most 2 result sets from each segment.
      
      Remove that assumption, and make the code cope with more than two result
      sets from a segment, by calculating the required size of the array
      accurately.
      
      In passing, remove the NULL terminator from the array, and change the
      callers that depended on it to use the returned size variable instead.
      This makes the loops in the callers look less funky.
      6a28c978
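The accurate-sizing idea can be sketched in a toy model (the function and its parameters are illustrative; the real code gathers PGresult sets from segment connections):

```c
#include <assert.h>
#include <stdlib.h>

/* Instead of assuming at most two result sets per segment, count the
 * result sets first and allocate exactly; return the size via *sizeOut
 * rather than NULL-terminating the array. */
static int *
collect_results(const int *perSegmentCounts, int numSegments, int *sizeOut)
{
    int total = 0;
    for (int i = 0; i < numSegments; i++)
        total += perSegmentCounts[i];     /* may be >2, e.g. several ERRORs */

    int *results = malloc(total * sizeof(int));
    int k = 0;
    for (int i = 0; i < numSegments; i++)
        for (int j = 0; j < perSegmentCounts[i]; j++)
            results[k++] = i;             /* tag each result with its segment */

    *sizeOut = total;                     /* callers loop on this, not NULL */
    return results;
}
```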
  29. 10 May 2016 (1 commit)
  30. 22 Mar 2016 (1 commit)
  31. 12 Feb 2016 (2 commits)
    • H
      Misc header file cleanup · 442c105e
      Committed by Heikki Linnakangas
      Remove unnecessary #includes, add #includes that are actually needed by
      some headers.
      442c105e
    • H
      Replace "uint" type with uint32 or unsigned int. · ce33af22
      Committed by Heikki Linnakangas
      "uint" is not a standard C type, so it might not be available on all
      platforms. Indeed, we had a typedef for WIN32 for that. But there's no reason
      to use "uint", might as well just use the C standard "unsigned int", or the
      PostgreSQL-specific uint32. Makes the intention more clear too, IMHO.
      ce33af22
  32. 09 Feb 2016 (1 commit)
    • A
      Fix race condition while preparing transaction instead of serializing prepares. · 75b2d55d
      Committed by Ashwin Agrawal
      Removed the locking and made some more cleanups.
      Avoid looping again in FinishPreparedTransaction: the prepare_lsn needed
      to commit the transaction can be found using the gxact which we have
      locked; it seems pointless to loop around again to scan the list.
      
      This is modified patch for GPDB based on postgres patch:
      https://github.com/postgres/postgres/commit/bb38fb0d43c8d7ff54072bfd8bd63154e536b384#diff-3ed77c70e54e7f56eff48f6157aba91e
      Original Patch commit message:
      To lock a prepared transaction's shared memory entry, we used to mark it
      with the XID of the backend. When the XID was no longer active according
      to the proc array, the entry was implicitly considered as not locked
      anymore. However, when preparing a transaction, the backend's proc array
      entry was cleared before transfering the locks (and some other state) to
      the prepared transaction's dummy PGPROC entry, so there was a window where
      another backend could finish the transaction before it was in fact fully
      prepared.
      
      To fix, rewrite the locking mechanism of global transaction entries. Instead
      of an XID, just have simple locked-or-not flag in each entry (we store the
      locking backend's backend id rather than a simple boolean, but that's just
      for debugging purposes). The backend is responsible for explicitly unlocking
      the entry, and to make sure that that happens, install a callback to unlock
      it on abort or process exit.
      75b2d55d
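The reworked locking described in the upstream message can be sketched as a toy model (field and function names are illustrative; the real patch also installs an abort/exit callback to guarantee the explicit unlock):

```c
#include <assert.h>

#define InvalidBackendId (-1)

/* Each global transaction entry carries the locking backend's id
 * (InvalidBackendId when unlocked) instead of an XID whose validity is
 * inferred from the proc array.  The backend unlocks explicitly, so
 * there is no window where clearing the proc array entry implicitly
 * releases a transaction that is not yet fully prepared. */
typedef struct GlobalTransaction
{
    int lockingBackendId;    /* InvalidBackendId when not locked */
} GlobalTransaction;

static int
lock_gxact(GlobalTransaction *gxact, int myBackendId)
{
    if (gxact->lockingBackendId != InvalidBackendId)
        return 0;                       /* another backend holds it */
    gxact->lockingBackendId = myBackendId;
    return 1;
}

static void
unlock_gxact(GlobalTransaction *gxact)
{
    gxact->lockingBackendId = InvalidBackendId;
}
```

Storing the backend id rather than a plain boolean costs nothing and, as the upstream message notes, helps debugging.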
  33. 28 Oct 2015 (1 commit)