1. 15 Jul 2020 (1 commit)
    • H
      Cleanup idle reader gang after utility statements · d1ba4da5
      Committed by Hubert Zhang
      Reader gangs use a local snapshot to access the catalog; as a result, they do
      not synchronize with the sharedSnapshot of the writer gang, which leads to
      inconsistent visibility of catalog tables on an idle reader gang.
      Consider the following case:
      
      select * from t, t t1; -- create a reader gang.
      begin;
      create role r1;
      set role r1;  -- the SET command is also dispatched to the idle reader gang
      
      When the SET ROLE command is dispatched to the idle reader gang, the reader
      gang cannot see the new tuple for role r1 in the catalog table pg_authid.
      To fix this issue, drop the idle reader gangs after each utility statement
      that may modify the catalog.
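      A standalone sketch of the idea (function names like finish_utility and
      cleanup_idle_reader_gangs are illustrative, not GPDB's actual code): after a
      utility statement that may have modified the catalog, drop the idle reader
      gangs so the next query builds fresh ones that see the new catalog contents.
      
      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>
      
      /* Hypothetical stand-ins for GPDB's dispatcher state, for illustration only. */
      static int idle_reader_gangs = 2;
      
      static bool
      utility_may_modify_catalog(const char *tag)
      {
          /* Crude illustration: treat CREATE/SET/GRANT-style commands as catalog-changing. */
          return strncmp(tag, "CREATE", 6) == 0 ||
                 strncmp(tag, "SET", 3) == 0 ||
                 strncmp(tag, "GRANT", 5) == 0;
      }
      
      static void
      cleanup_idle_reader_gangs(void)
      {
          printf("dropping %d idle reader gang(s)\n", idle_reader_gangs);
          idle_reader_gangs = 0;      /* the next query builds fresh reader gangs */
      }
      
      static void
      finish_utility(const char *tag)
      {
          if (idle_reader_gangs > 0 && utility_may_modify_catalog(tag))
              cleanup_idle_reader_gangs();
      }
      
      int
      main(void)
      {
          finish_utility("CREATE ROLE");   /* catalog changed: idle reader gangs dropped */
          finish_utility("SELECT");        /* no catalog change: nothing to do */
          return 0;
      }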
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      d1ba4da5
  2. 02 Jan 2020 (1 commit)
    • G
      Avoid multiple 'abort' WAL records · 9cad3967
      Committed by Gang Xiong
      Things can go wrong between 'RecordTransactionAbort' and clearing
      'CurrentTransactionState.transactionId', which leads to multiple 'abort' WAL
      records. Putting 'rollbackDtxTransaction' in between increases the chance of
      this. This patch adds a check in 'RecordTransactionAbort': skip writing
      another 'abort' WAL record if we are already in the middle of a rollback.
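      A standalone sketch of the guard (illustrative names, not the actual xact.c
      code): remember whether the 'abort' record has already been written for the
      current transaction and skip writing a second one.
      
      #include <stdbool.h>
      #include <stdio.h>
      
      /* Illustrative stand-in for the backend's per-transaction state. */
      static bool abort_record_written = false;
      
      static void
      record_transaction_abort(void)
      {
          /* Guard analogous to the check described above: if this transaction has
           * already written its 'abort' WAL record, do not write another one. */
          if (abort_record_written)
          {
              printf("skip: abort record already written for this transaction\n");
              return;
          }
          printf("write 'abort' WAL record\n");
          abort_record_written = true;
      }
      
      int
      main(void)
      {
          record_transaction_abort();   /* first call writes the record */
          record_transaction_abort();   /* reached again (e.g. via rollbackDtxTransaction): skipped */
          return 0;
      }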
      9cad3967
  3. 13 Nov 2019 (1 commit)
    • P
      Flush error state before rethrowing error in dispatcher code. (#9027) · 143bb7c6
      Committed by Paul Guo
      This prevents a panic due to "ERRORDATA_STACK_SIZE exceeded" which was found
      during testing. The scenario happens when we test a transaction with multiple
      cursors. When a QE postmaster is dead, those cursor gangs can error out in
      SendChunkUDPIFC() -> checkExceptions(), causing the QD to loop in PostgresMain() ->
      AbortCurrentTransaction() -> ... -> mppExecutorFinishup() -> ReThrowError() ->
      PostgresMain() -> AbortTransaction(). Note that ReThrowError() increases
      errordata_stack_depth by 1, so it can lead to an error-stack overflow (finally
      the database would PANIC to prevent a real overflow).
      
      A typical stack for one item in the errordata array is shown below:
      
      	stacktracearray = {[0] = 0xadb39d <errstart+1086>, [1] = 0xb93362 <cdbdisp_get_PQerror+289>, [2] = 0xb93197 <cdbdisp_dumpDispatchResult+87>,
      	[3] = 0xb935f3 <cdbdisp_dumpDispatchResults+226>, [4] = 0xb8ffef <cdbdisp_getDispatchResults+169>, [5] = 0x7545cf <mppExecutorFinishup+235>,
      	[6] = 0x736c77 <standard_ExecutorEnd+706>, [7] = 0x7369b2 <ExecutorEnd+54>, [8] = 0x6c4e28 <PortalCleanup+345>, [9] = 0xb2168b <AtAbort_Portals+214>,
      	[10] = 0x53b996 <AbortTransaction+356>, [11] = 0x53c467 <AbortCurrentTransaction+214>, [12] = 0x9714d9 <PostgresMain+1728>, [13] = 0x8dfdf4 <ExitPostmaster>,
      	[14] = 0x8df485 <BackendStartup+371>, [15] = 0x8db324 <ServerLoop+825>, [16] = 0x8da870 <PostmasterMain+4908>, [17] = 0x7d5467 <startup_hacks>,
      	[18] = 0x7f5311c51c05 <__libc_start_main+245>, [19] = 0x48e039 <_start+41>, [20] = 0x0, [21] = 0x0, [22] = 0x0, [23] = 0x0, [24] = 0x0, [25] = 0x0, [26] = 0x0,
      	[27] = 0x0, [28] = 0x0, [29] = 0x0},
      
      In ICG testing, running tests with portals.sql while killing one QE postmaster sometimes triggers this issue.
      
      Fix this by adding FlushErrorState() before ReThrowError(). Calling
      FlushErrorState() before ReThrowError() is the usual pattern when the error
      has been copied in advance (e.g. via CopyErrorData()).
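      For reference, the usual PostgreSQL idiom that the fix relies on looks
      roughly like this (a sketch for backend code in general, not the exact
      dispatcher change):
      
      #include "postgres.h"
      
      #include "utils/elog.h"
      #include "utils/memutils.h"
      
      void run_and_rethrow(void (*work)(void));
      
      /*
       * Run some work; if it errors out, re-throw the original error after
       * flushing the error stack, so repeated re-throws cannot overflow
       * errordata[] (ERRORDATA_STACK_SIZE).
       */
      void
      run_and_rethrow(void (*work)(void))
      {
          MemoryContext oldcontext = CurrentMemoryContext;
      
          PG_TRY();
          {
              work();
          }
          PG_CATCH();
          {
              ErrorData  *edata;
      
              /* Copy the error data out of ErrorContext before flushing it. */
              MemoryContextSwitchTo(oldcontext);
              edata = CopyErrorData();
      
              /*
               * Pop the entry from the error stack; without this, every
               * re-throw would leave one more slot occupied.
               */
              FlushErrorState();
      
              /* ... any dispatcher-state cleanup would go here ... */
      
              ReThrowError(edata);    /* re-raise with the original fields intact */
          }
          PG_END_TRY();
      }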
      
      Also tweak error logging code a bit.
      
      - errfinish_and_return() did not pfree some memory, although so far no callers
      set those variables.
      
      - errfinish_and_return() should put context callback function calls in the
      memory context ErrorContext.
      
      - Some callers of errstart() did not check its return value. This is wrong;
      although the callers touched in this patch do not currently seem to suffer
      from the issue, checking is a good habit.
      
      Reviewed-by: Georgios Kokolatos
      Reviewed-by: Asim R P
      143bb7c6
  4. 28 Oct 2019 (1 commit)
    • P
      Fix various issues in Gang management (#8893) · 797065c5
      Committed by Paul Guo
      1. Do not call elog(FATAL) in cleanupQE(), since it can be called from
      cdbdisp_destroyDispatcherState() to destroy a CdbDispatcherState; that leads
      to re-entering cdbdisp_destroyDispatcherState(), which is not supported.
      Change the code to return false instead and sanity-check the re-entrance.
      Returning false should be OK since it causes the gang to be destroyed, and
      the QE resources are destroyed along with it. A typical re-entrance stack is
      shown below; a sketch of the idea follows the stack.
      
      0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
      0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
      0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
      0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
      0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
      0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
      0x000000000053b8ab in AbortTransaction () at xact.c:3319
      0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
      0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
      0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
      0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
      0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
      0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
      0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
      0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
      0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
      0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
      0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
      0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442
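      A standalone model of the two changes in point 1 (illustrative names only):
      the cleanup helper reports failure through its return value instead of
      elog(FATAL), and the destroy routine carries a simple re-entrance sanity
      check.
      
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      
      /* Illustrative stand-ins for the dispatcher structures. */
      static bool segworker_busy = true;
      static bool in_destroy = false;
      
      static bool
      cleanup_qe(void)
      {
          /* Instead of elog(FATAL), which would re-enter the destroy path via
           * proc_exit/abort callbacks, just report failure to the caller. */
          if (segworker_busy)
              return false;
          return true;
      }
      
      static void
      destroy_dispatcher_state(void)
      {
          /* Sanity check: destroying the dispatcher state must not re-enter. */
          assert(!in_destroy);
          in_destroy = true;
      
          if (!cleanup_qe())
              printf("QE still busy: destroy the gang, QE resources go with it\n");
          else
              printf("QE recycled for reuse\n");
      
          in_destroy = false;
      }
      
      int
      main(void)
      {
          destroy_dispatcher_state();
          return 0;
      }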
      
      2. Force dropping the reader gang of a named portal if a SET command was run
      previously, since that setting was not dispatched to that gang and we
      therefore should not reuse it.
      
      3. Now that we have the mechanism of destroying the DispatcherState in the
      resource owner callback when aborting a transaction, it is no longer
      necessary to destroy it in some of the dispatcher code.
      
      The added test cases and some existing test cases cover almost all of the
      code changes except the change in cdbdisp_dispatchX() (I cannot find a way to
      test this, and I'll keep it in mind to see how to test that or similar code).
      
      Reviewed-by: Pengzhou Tang
      Reviewed-by: Asim R P
      797065c5
  5. 22 Jul 2019 (1 commit)
    • N
      Keep the order of reusing idle gangs · 51a7ea27
      Committed by Ning Yu
      For example, in the same session:
      query 1 has 3 slices and creates gang 1, gang 2 and gang 3;
      query 2 has 2 slices, and we want it to reuse gang 1 and gang 2 rather than,
      say, gang 3 and gang 2.
      
      This way the two queries can use the same send-receive port pairs, which is
      useful on platforms like Azure, because Azure limits the number of distinct
      send-receive port pairs (a.k.a. flows) within a certain time period. A
      minimal sketch of the ordered reuse follows.
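      A small standalone model of the ordering property (not GPDB code): if idle
      gangs are handed out in creation order, the follow-up query with fewer slices
      gets gang 1 and gang 2 again, so the same send-receive port pairs are reused.
      
      #include <stdio.h>
      
      #define MAX_GANGS 8
      
      /* Illustrative FIFO of idle gang ids, kept in creation order. */
      static int idle[MAX_GANGS];
      static int n_idle = 0;
      
      static void recycle_gang(int gang_id) { idle[n_idle++] = gang_id; }
      
      static int
      reuse_gang(void)
      {
          /* Always reuse from the front, i.e. the earliest-created idle gang,
           * rather than from the back (which would hand out gang 3 first). */
          int gang_id = idle[0];
          for (int i = 1; i < n_idle; i++)
              idle[i - 1] = idle[i];
          n_idle--;
          return gang_id;
      }
      
      int
      main(void)
      {
          /* Query 1 finished and returned gangs 1, 2, 3 in creation order. */
          recycle_gang(1);
          recycle_gang(2);
          recycle_gang(3);
      
          /* Query 2 needs two gangs: it gets 1 and 2, matching query 1's ports. */
          int first = reuse_gang();
          int second = reuse_gang();
          printf("query 2 reuses gang %d and gang %d\n", first, second);
          return 0;
      }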
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Paul Guo <pguo@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      51a7ea27
  6. 28 Dec 2018 (1 commit)
  7. 14 Dec 2018 (1 commit)
  8. 13 Dec 2018 (1 commit)
  9. 14 Nov 2018 (1 commit)
    • H
      Fix dispatching of mixed two-phase and one-phase queries. · 9db30681
      Committed by Heikki Linnakangas
      Commit 576690f2 added tracking of which segments have been enlisted in
      a distributed transaction. However, it registered every query dispatched
      with CdbDispatch*() in the distributed transaction, even if the query was
      dispatched without the DF_NEED_TWO_PHASE flag to the segments. Without
      DF_NEED_TWO_PHASE, the QE will believe it's not part of a distributed
      transaction, and will throw an error when the QD tries to prepare it for
      commit:
      
      ERROR:  Distributed transaction 1542107193-0000000232 not found (cdbtm.c:3031)
      
      This can occur if a command updates a partially distributed table (a table
      with gp_distribution_policy.numsegments smaller than the cluster size),
      and uses one of the backend functions, like pg_relation_size(), that
      dispatches an internal query to all segments.
      
      Fix the confusion by registering in the distributed transaction only those
      commands that are dispatched with the DF_NEED_TWO_PHASE flag. A minimal
      sketch of the guard follows.
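      A tiny standalone model of the fix (the flag value and function names are
      illustrative): only dispatches that carry DF_NEED_TWO_PHASE enlist their
      target segments in the distributed transaction, so PREPARE is later sent
      only to those segments.
      
      #include <stdio.h>
      
      #define DF_NEED_TWO_PHASE 0x4   /* illustrative value, not GPDB's actual one */
      
      static int twophase_segments[64];
      static int n_twophase_segments = 0;
      
      static void
      dispatch_to_segment(int segindex, int flags)
      {
          /* The fix: enlist the segment for two-phase commit only when the
           * dispatch actually told the QE it is part of a distributed transaction. */
          if (flags & DF_NEED_TWO_PHASE)
              twophase_segments[n_twophase_segments++] = segindex;
      
          printf("dispatch to seg%d (two-phase: %s)\n",
                 segindex, (flags & DF_NEED_TWO_PHASE) ? "yes" : "no");
      }
      
      int
      main(void)
      {
          dispatch_to_segment(0, DF_NEED_TWO_PHASE);  /* the UPDATE on the partial table */
          dispatch_to_segment(1, 0);                  /* internal pg_relation_size() query */
          printf("PREPARE will be sent to %d segment(s)\n", n_twophase_segments);
          return 0;
      }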
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      9db30681
  10. 07 Nov 2018 (1 commit)
    • Z
      Adjust GANG size according to numsegments · 6dd2759a
      Committed by ZhangJackey
      Now that we have partial tables and a flexible gang API, we can allocate
      gangs according to numsegments.
      
      With commit 4eb65a53, GPDB supports tables distributed on a subset of the
      segments, and with the series of commits (a3ddac06, 576690f2) GPDB supports a
      flexible gang API. Now is a good time to combine the two features: the goal
      is to create gangs only on the necessary segments for each slice. This commit
      also improves singleQE gang scheduling and does some code cleanup. However,
      if ORCA is enabled, the behavior is unchanged.
      
      The outline of this commit is:
      
        * Modify the FillSliceGangInfo API so that gang_size is truly flexible.
        * Remove numOutputSegs and outputSegIdx fields in motion node. Add a new
           field isBroadcast to mark if the motion is a broadcast motion.
        * Remove the global variable gp_singleton_segindex and make the singleQE
           segment_id random (derived from gp_sess_id).
        * Remove the field numGangMembersToBeActive in Slice because it is now
           exactly slice->gangsize.
        * Modify the message printed if the GUC Test_print_direct_dispatch_info
           is set.
        * An explicit BEGIN now creates a full gang.
        * Reformat code and remove destSegIndex.
        * The isReshuffle flag in ModifyTable is useless, because it is only used
           when inserting a tuple into a segment that is outside the range of
           numsegments.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      6dd2759a
  11. 29 Oct 2018 (1 commit)
    • T
      Simplify direct dispatch related code (#6080) · 576690f2
      Committed by Tang Pengzhou
      * Simplify direct dispatch related code
      
      This commit includes two parts:
      * simplify the direct-dispatch dispatching code
      * simplify the direct-dispatch DTM related code
      
      Previously, cdbdisp_dispatchToGang needed a CdbDispatchDirectDesc; now a gang
      contains only the in-use segments, so the direct-dispatch info is useless.
      
      Another thing is that we need to decide within dtmPreCommand whether DTM can
      use direct dispatch. The logic is complex: you need to know whether the main
      plan is direct-dispatched and whether the init plans contain direct dispatch.
      
      One example is:
      "update foo set foo.c2 = 2
      where foo.c1 = 1 and exists (select * from bar where bar.c1=4)"
      
      The main plan can be direct-dispatched to segment 1 and the init plan to
      segment 2. With the old logic, DTM commands like PREPARE need to be
      dispatched to all segments, so dtmPreCommand needs to dispatch a DTM command
      named 'DTX_PROTOCOL_COMMAND_STAY_AT_OR_BECOME_IMPLIED' to all segments, so
      that segments like segment 3, which did not receive the plan, can be ready
      for two-phase commit.
      
      With the new gang API we can simplify this process: add a list in
      currentGxact to record which segments actually got involved in the two-phase
      commit, then dispatch the DTM commands to them directly. This is also very
      useful for queries on tables that are not fully expanded yet.
      
      * Support direct dispatch to more than one segment
      576690f2
  12. 25 Oct 2018 (1 commit)
    • T
      Unify the way to fetch/manage the number of segments (#6034) · 8eed4217
      Committed by Tang Pengzhou
      * Don't use GpIdentity.numsegments directly for the number of segments
      
      Use getgpsegmentCount() instead.
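      For illustration, the number of primary segments can be read straight from
      gp_segment_configuration; a minimal libpq client doing so might look like the
      sketch below (trivial connection handling, assumed to run against a GPDB
      coordinator). getgpsegmentCount() is the in-backend equivalent.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <libpq-fe.h>
      
      int
      main(void)
      {
          /* Connection parameters come from the usual PG* environment variables. */
          PGconn *conn = PQconnectdb("");
      
          if (PQstatus(conn) != CONNECTION_OK)
          {
              fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
              PQfinish(conn);
              return 1;
          }
      
          /* Primaries have role = 'p'; content = -1 is the coordinator, so skip it. */
          PGresult *res = PQexec(conn,
              "SELECT count(*) FROM gp_segment_configuration "
              "WHERE role = 'p' AND content >= 0");
      
          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              printf("number of primary segments: %s\n", PQgetvalue(res, 0, 0));
          else
              fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
      
          PQclear(res);
          PQfinish(conn);
          return 0;
      }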
      
      * Unify the way to fetch/manage the number of segments
      
      Commit e0b06678 lets us expand a GPDB cluster without a restart; the number
      of segments may now change during a transaction, so we need to take care
      with numsegments.
      
      We now have two ways to get the number of segments: 1) from
      GpIdentity.numsegments, and 2) from gp_segment_configuration
      (cdb_component_dbs), which the dispatcher uses to decide the range of
      segments to dispatch to. We did some hard work in e0b06678 to keep
      GpIdentity.numsegments up to date, which made segment management more
      complicated; now we want an easier way to do it:
      
      1. We only allow getting segment info (including the number of segments)
      through gp_segment_configuration, which has the newest segment info; there is
      no need to update GpIdentity.numsegments. GpIdentity.numsegments is left only
      for debugging and can be removed entirely in the future.
      
      2. Each global transaction fetches/updates the newest snapshot of
      gp_segment_configuration and never changes it until the end of the
      transaction unless a writer gang is lost, so a global transaction sees a
      consistent state of segments. We used to use gxidDispatched for the same
      purpose; it can now be removed.
      
      * Remove GpIdentity.numsegments
      
      GpIdentity.numsegments has no effect now, so remove it. This commit
      does not remove gp_num_contents_in_cluster because that would require
      modifying utilities like gpstart, gpstop, gprecoverseg, etc.; let's
      do that cleanup work in another PR.
      
      * Exchange the default UP/DOWN value in fts cache
      
      Previously, the FTS prober read gp_segment_configuration, checked the
      status of segments and then set the status of segments in the shared-memory
      struct ftsProbeInfo->fts_status[], so that other components (mainly the
      dispatcher) could detect that a segment was down.
      
      All segments were initialized as DOWN and then, in the most common case,
      updated to UP; this brings two problems:
      
      1. fts_status is invalid until FTS completes its first loop, so the QD
      needs to check ftsProbeInfo->fts_statusVersion > 0.
      2. When gpexpand adds a new segment to gp_segment_configuration, the
      newly added segment may be marked DOWN if FTS has not scanned it yet.
      
      This commit changes the default value from DOWN to UP, which resolves
      the problems mentioned above.
      
      * Fts should not be used to notify backends that a gpexpand occurs
      
      As Ashwin mentioned in PR#5679, "I don't think giving FTS responsibility to
      provide new segment count is right. FTS should only be responsible for HA
      of the segments. The dispatcher should independently figure out the count
      based on catalog. gp_segment_configuration should be the only way to get
      the segment count"; FTS should be decoupled from gpexpand.
      
      * Access gp_segment_configuration inside a transaction
      
      * Upgrade the log level from ERROR to FATAL if the expand version changed
      
      * Modify gpexpand test cases according to the new design
      8eed4217
  13. 15 Oct 2018 (1 commit)
    • N
      Retire threaded dispatcher · 87394a7b
      Committed by Ning Yu
      Now there is only the async dispatcher. The dispatcher API interface is
      kept so we may add a new backend in the future.
      
      The GUC gp_connections_per_thread, which was used to switch between the
      async and threaded backends, is also retired.
      87394a7b
  14. 27 Sep 2018 (1 commit)
    • T
      Dispatcher can create flexible size gang (#5701) · a3ddac06
      Committed by Tang Pengzhou
      * Change the type of db_descriptors to SegmentDatabaseDescriptor **
      
      A new gang definition may consist of cached segdbDescs and newly created
      segdbDescs, so there is no need to palloc every segdbDesc struct as new.
      
      * Remove an unnecessary allocate-gang unit test
      
      * Manage idle segment dbs using CdbComponentDatabases instead of available* lists.
      
      To support variable-size gangs, we now need to manage segment dbs at a lower
      granularity. Previously, idle QEs were managed by a bunch of lists like
      availablePrimaryWriterGang and availableReaderGangsN, which restricted the
      dispatcher to creating only N-size (N = number of segments) or 1-size gangs.
      
      CdbComponentDatabases is a snapshot of the segment components in the current
      cluster, and it now maintains a freelist for each segment component. When
      creating a gang, the dispatcher assembles it from the segment components
      (taking a segment db from the freelist or creating a new one). When cleaning
      up a gang, the dispatcher returns the idle segment dbs to their segment
      components.
      
      CdbComponentDatabases provides a few functions to manipulate segment dbs
      (SegmentDatabaseDescriptor *); a minimal standalone model of the
      allocate/recycle cycle follows the list:
      * cdbcomponent_getCdbComponents
      * cdbcomponent_destroyCdbComponents
      * cdbcomponent_allocateIdleSegdb
      * cdbcomponent_recycleIdleSegdb
      * cdbcomponent_cleanupIdleSegdbs
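      A standalone model of that per-component freelist (illustrative structures;
      the real ones live in the cdbutil/cdbgang code): allocate from the freelist
      when possible, otherwise create a new segment db, and recycle it back when
      the gang is cleaned up.
      
      #include <stdio.h>
      #include <stdlib.h>
      
      typedef struct IdleQE
      {
          int            qe_id;
          struct IdleQE *next;
      } IdleQE;
      
      typedef struct SegmentComponent
      {
          int     segindex;
          int     next_qe_id;     /* used when the freelist is empty */
          IdleQE *freelist;       /* idle QEs available for reuse */
      } SegmentComponent;
      
      /* Allocate a QE for this component: pop the freelist, or create a new one. */
      static int
      allocate_qe(SegmentComponent *comp)
      {
          if (comp->freelist)
          {
              IdleQE *qe = comp->freelist;
              int     id = qe->qe_id;
      
              comp->freelist = qe->next;
              free(qe);
              return id;
          }
          return comp->next_qe_id++;
      }
      
      /* Recycle a QE back onto the component's freelist when its gang is cleaned up. */
      static void
      recycle_qe(SegmentComponent *comp, int qe_id)
      {
          IdleQE *qe = malloc(sizeof(IdleQE));
      
          qe->qe_id = qe_id;
          qe->next = comp->freelist;
          comp->freelist = qe;
      }
      
      int
      main(void)
      {
          SegmentComponent seg0 = {0, 0, NULL};
      
          int a = allocate_qe(&seg0);   /* creates QE 0 */
          recycle_qe(&seg0, a);         /* gang cleaned up, QE 0 goes idle */
          int b = allocate_qe(&seg0);   /* reuses QE 0 instead of creating a new one */
      
          printf("seg0: first QE %d, reused QE %d\n", a, b);
          return 0;
      }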
      
      CdbComponentDatabases is also FTS-version sensitive: once the FTS version
      changes, CdbComponentDatabases destroys all idle segment dbs and allocates
      QEs on the newly promoted segments. This makes mirror failover transparent
      to users.
      
      Since segment dbs (SegmentDatabaseDescriptor *) are managed by
      CdbComponentDatabases now, we can simplify the memory context
      management by replacing GangContext & perGangContext with
      DispatcherContext & CdbComponentsContext.
      
      * Postpone error handling when creating a gang
      
      Now that we have AtAbort_DispatcherState, one advantage is that we can
      postpone gang error handling to this function and make the code cleaner.
      
      * Handle FTS version change correctly
      
      In some cases, when the FTS version changes, we cannot update the current
      snapshot of segment components; more specifically, we cannot destroy the
      current writer segment dbs and create new segment dbs.
      
      These cases include:
      * the session has created temp tables;
      * the query needs two-phase commit and the gxid has already been dispatched
        to the segments.
      
      * Replace <gangId, sliceId> map with <qeIdentifier, sliceId> map
      
      We used to dispatch a <gangId, sliceId> map along with the query so that
      segment dbs knew which slice they should execute.
      
      Now gangId is useless for a segment db, because a segment db can be reused
      by different gangs, so we need a new way to convey this information. To
      resolve this, CdbComponentDatabases assigns a unique identifier to each
      segment db and builds, for each slice, a bitmap set of the segment
      identifiers; a segment db can then walk the slice table and find the right
      slice to execute.
      
      * Allow the dispatcher to create variable-size gangs and refine AssignGangs()
      
      Previously, the dispatcher could only create N-size gangs for
      GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the
      dispatcher in many ways. One example is direct dispatch: it always created
      an N-size gang even when it dispatched the command to only one segment.
      Another example is that some operations could use a gang larger than N;
      for a hash join, if both the inner and outer plans are redistributed,
      the hash join node could be associated with a larger gang to execute.
      This commit changes the API of createGang() so the caller can specify a
      list of segments (partial or even duplicate segments), and
      CdbComponentDatabases guarantees that each segment has only one writer
      in a session. This also resolves another pain point of AssignGangs():
      the caller no longer needs to promote a GANGTYPE_PRIMARY_READER to
      GANGTYPE_PRIMARY_WRITER, or a GANGTYPE_SINGLETON_READER to
      GANGTYPE_PRIMARY_WRITER for replicated tables (see FinalizeSliceTree()).
      
      With this commit, AssignGangs() is very clear now.
      a3ddac06
  15. 06 Sep 2018 (1 commit)
    • T
      Integrate Gang management from portal to Dispatcher and simplify AssignGangs for init plans (#5555) · 78a4890a
      Committed by Tang Pengzhou
      * Simplify the AssignGangs() logic for init plans
      
      Previously, AssignGangs() assigned gangs for both the main plan and the
      init plans in one shot. Because init plans and the main plan are executed
      sequentially, the gangs can be reused between them; the function
      AccumSliceReq() was designed for this.
      
      This process can be simplified: since we already know the root slice
      index will be adjusted according to the init plan id, init plans only
      need to assign their own slices.
      
      * Integrate Gang management from portal to Dispatcher
      
      Previously, gangs were managed by the portal: freeGangsForPortal()
      was used to clean up gang resources, and DTM-related commands, which
      need a gang to dispatch commands outside of a portal, used
      freeGangsForPortal() too. Multiple commands/plans/utilities may be
      executed within one portal, and all of them rely on a dispatcher
      routine like CdbDispatchCommand / CdbDispatchPlan / CdbDispatchUtility...
      to dispatch. Gangs were created by each dispatcher routine, but were not
      recycled or destroyed when the routine finished, except for the primary
      writer gang. One defect of this is that gang resources could not be
      reused between dispatcher routines. GPDB already had an optimization
      for init plans: if a plan contained init plans, AssignGangs was called
      before executing any of them; it walked the whole slice tree and created
      the maximum gang that both the main plan and the init plans needed. This
      was doable because init plans and the main plan were executed
      sequentially, but it also made the AssignGangs logic complex; meanwhile,
      reusing an unclean gang was not safe.
      
      Another confusing thing was that the gangs and the dispatcher were
      managed separately, which caused context inconsistencies: when a
      dispatcher state was destroyed, its gangs were not recycled; when a
      gang was destroyed by the portal, the dispatcher state was still in
      use and might refer to the context of a destroyed gang.
      
      As described above, this commit integrates gang management into the
      dispatcher: a dispatcher state is responsible for creating and
      tracking gangs as needed and destroys them when the dispatcher
      state is destroyed.
      
      * Handle the case when the primary writer gang is gone
      
      When members of the primary writer gang are gone, the writer gang
      is destroyed immediately (primaryWriterGang is set to NULL) when a
      dispatcher routine (e.g. CdbDispatchCommand) finishes. So when
      dispatching a two-phase DTM/DTX related command, the QD doesn't
      know the writer gang is gone, and it may get unexpected errors
      like 'savepoint not exist', 'subtransaction level not match' or
      'temp file not exist'.
      
      Previously, primaryWriterGang was not reset when DTM/DTX commands
      started even if it pointed to invalid segments, so those DTM/DTX
      commands were not actually sent to the segments; the normal error
      reported on the QD looked like 'could not connect to segment:
      initialization of segworker'.
      
      So we need a way to inform the global transaction that its writer
      gang has been lost, so that when aborting the transaction the QD can:
      1. disconnect all reader gangs; this is useful for skipping the
      dispatch of "ABORT_NO_PREPARE";
      2. reset the session and drop temp files, because the temp files on
      the segments are gone;
      3. report an error when dispatching the "rollback savepoint" DTX,
      because the savepoint on the segments is gone;
      4. report an error when dispatching the "abort subtransaction" DTX,
      because the subtransaction is rolled back when the writer segment
      is down.
      78a4890a
  16. 14 Aug 2018 (2 commits)
    • P
      Remove cdbdisp_finishCommand · 957629d1
      Committed by Pengzhou Tang
      Previously, cdbdisp_finishCommand did three things:
      1. cdbdisp_checkDispatchResult
      2. cdbdisp_getDispatchResult
      3. cdbdisp_destroyDispatcherState
      
      However, cdbdisp_finishCommand didn't make the code cleaner or more
      convenient to use; on the contrary, it made error handling more
      difficult and the code more complicated and inconsistent.
      
      This commit also resets estate->dispatcherState to NULL to avoid
      re-entry of cdbdisp_* functions.
      957629d1
    • P
      Rename CdbCheckDispatchResult for name convention · 60bd3ab2
      Committed by Pengzhou Tang
      Use cdbdisp_checkDispatchResult instead of CdbCheckDispatchResult
      to be consistent with the other cdbdisp_* functions.
      60bd3ab2
  17. 09 May 2018 (1 commit)
  18. 01 Mar 2018 (1 commit)
    • H
      Give a better error message, if preparing an xact fails. · b3c50e40
      Committed by Heikki Linnakangas
      If an error happens in the prepare phase of two-phase commit, relay the
      original error back to the client, instead of the fairly opaque
      "'Abort [Prepared]' broadcast failed to one or more segments" message you
      got previously. A lot of things happen during the prepare phase that
      can legitimately fail, like checking deferred constraints, as in the
      'constraints' regression test. But even without that, there can be
      triggers, ON COMMIT actions, etc., any of which can fail.
      
      This commit consists of several parts:
      
      * Pass 'true' for the 'raiseError' argument when dispatching the prepare
        dtx command in doPrepareTransaction(), so that the error is emitted to
        the client.
      
      * Bubble up an ErrorData struct, with as many fields intact as possible,
        to the caller,  when dispatching a dtx command. (Instead of constructing
        a message in a StringInfo). So that we can re-throw the message to
        the client, with its original formatting.
      
      * Don't throw an error in performDtxProtocolCommand(), if we try to abort
        a prepared transaction that doesn't exist. That is business-as-usual,
        if a transaction throws an error before finishing the prepare phase.
      
      * Suppress the "NOTICE: Releasing segworker groups to retry broadcast."
        message, when aborting a prepared transaction.
      
      Put together, the effect is if an error happens during prepare phase, the
      client receives a message that is largely indistinguishable from the
      message you'd get if the same failure happened while running a normal
      statement.
      
      Fixes github issue #4530.
      b3c50e40
  19. 25 Jan 2018 (1 commit)
    • D
      Propagate segment errcodes to dispatcher · 58003bc7
      Committed by Daniel Gustafsson
      The errcode thrown in an ereport() on a segment was passed back to
      the dispatcher, but then dropped and replaced with a default errcode
      of ERRCODE_DATA_EXCEPTION. This works for most situations, but when
      trapping errors the exact errcode must be propagated. This extends
      the API to extract the errcode as well. The below case illustrates
      the previous issue:
      
        CREATE TABLE test1(id int primary key);
        CREATE TABLE test2(id int primary key);
        INSERT INTO test1 VALUES(1);
        INSERT INTO test2 VALUES(1);
        CREATE OR REPLACE FUNCTION merge_table() RETURNS void AS $$
        DECLARE
      	v_insert_sql varchar;
        BEGIN
      	v_insert_sql :='INSERT INTO test1 SELECT * FROM test2';
      	EXECUTE v_insert_sql;
      	EXCEPTION WHEN unique_violation THEN
      		RAISE NOTICE 'unique_violation';
      	END;
        $$ LANGUAGE plpgsql volatile;
        SELECT merge_table();
      58003bc7
  20. 02 Nov 2017 (1 commit)
    • H
      Wake up faster, if a segment returns an error. · 3bbedbe9
      Committed by Heikki Linnakangas
      Previously, if a segment reported an error after starting up the
      interconnect, it would take up to 250 ms for the main thread in the QD
      process to wake up and poll the dispatcher connections, and to see that
      there was an error. Shorten that time, by waking up immediately if the
      QD->QE libpq socket becomes readable while we're waiting for data to
      arrive in a Motion node.
      
      This isn't a complete solution, because this will only wake up if one
      arbitrarily chosen connection becomes readable, and we still rely on
      polling for the others. But this greatly speeds up many common scenarios.
      In particular, the "qp_functions_in_select" test now runs in under 5 s
      on my laptop, when it took about 60 seconds before.
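      A standalone sketch of the waiting pattern (illustrative file descriptors;
      the real code waits inside the interconnect receive path): poll both the
      interconnect fd and the QD->QE libpq socket, so an error reported by a QE
      wakes the QD immediately instead of after the next poll timeout.
      
      #include <poll.h>
      #include <stdio.h>
      #include <unistd.h>
      
      /* Wait for either interconnect data or dispatcher (libpq) traffic.
       * Returns 1 if the dispatcher socket became readable (e.g. a QE error),
       * 0 if interconnect data arrived, -1 on timeout or poll failure. */
      static int
      wait_for_data(int motion_fd, int libpq_fd, int timeout_ms)
      {
          struct pollfd fds[2];
      
          fds[0].fd = motion_fd;
          fds[0].events = POLLIN;
          fds[1].fd = libpq_fd;   /* previously not watched: errors waited out the timeout */
          fds[1].events = POLLIN;
      
          int rc = poll(fds, 2, timeout_ms);
      
          if (rc <= 0)
              return -1;              /* timeout or error: caller re-checks state */
          if (fds[1].revents & POLLIN)
              return 1;               /* QE reported something: check dispatch results now */
          return 0;                   /* normal tuple data on the interconnect */
      }
      
      int
      main(void)
      {
          /* Demo with a pipe standing in for each socket. */
          int motion_pipe[2], libpq_pipe[2];
      
          pipe(motion_pipe);
          pipe(libpq_pipe);
          write(libpq_pipe[1], "E", 1);   /* pretend the QE sent an error message */
      
          int what = wait_for_data(motion_pipe[0], libpq_pipe[0], 250);
          printf("woke up: %s\n", what == 1 ? "dispatcher socket readable" :
                                  what == 0 ? "motion data" : "timeout");
          return 0;
      }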
      3bbedbe9
  21. 30 Oct 2017 (1 commit)
  22. 10 Oct 2017 (1 commit)
  23. 01 Sep 2017 (1 commit)
  24. 09 Aug 2017 (1 commit)
    • P
      Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Committed by Pengzhou Tang
      The whole cdb directory is shipped to end users, and all header files
      that cdb*.h includes also need to be shipped to make checkinc.py
      pass. However, exposing gp_libpq_fe/*.h would confuse customers because
      those headers are almost the same as libpq/*; per Heikki's suggestion, we
      keep gp_libpq_fe/* unchanged. So, to make the system work, we include
      gp-libpq-fe.h and gp-libpq-int.h directly in the C files that need them.
      cf7cddf7
  25. 08 Aug 2017 (1 commit)
    • H
      Remove unnecessary use of PQExpBuffer. · cc38f526
      Committed by Heikki Linnakangas
      StringInfo is more appropriate in backend code. (Unless the buffer needs to
      be used in a thread.)
      
      In passing, rename the 'conn' static variable in cdbfilerepconnclient.c.
      It seemed overly generic.
      cc38f526
  26. 18 Nov 2016 (2 commits)
    • H
      Use proper error code for errors. · 0bf31cd6
      Committed by Heikki Linnakangas
      Attach a suitable error code to the many errors that were previously reported
      as "internal errors". GPDB's elog.c prints a source file name and line
      number for any internal error, which is a bit ugly for errors that are
      not in fact unexpected internal errors but user-facing errors that happen
      as a result of e.g. an invalid query.
      
      To make sure we don't accumulate more of these, adjust the regression tests
      to not ignore the source file and line number in error messages. There are
      a few exceptions, which are listed explicitly.
      0bf31cd6
    • H
      Remove errOmitLocation. · f6f5c9ef
      Committed by Heikki Linnakangas
      It was somewhat broken. The order of evaluation of function arguments is
      implementation-specific, so in a statement like:
      
      ereport(ERROR,
              (errcode(ERRCODE_INTERNAL_ERROR),
               errOmitLocation(true)));
      
      We cannot assume that errOmitLocation() is evaluated after errcode(). If
      errOmitLocation() is evaluated first, the errcode() call could overwrite
      the omit_location field, seeing that the error code was "internal error".
      
      Almost all of the errOmitLocation() calls in the codebase were superfluous
      anyway. The default logic is to omit the location for anything other than an
      ERRCODE_INTERNAL_ERROR, so it is not necessary to call errOmitLocation(true)
      if an errcode (other than ERRCODE_INTERNAL_ERROR) is given. Likewise,
      errOmitLocation(false) is not needed for internal errors.
      
      Remove the whole errOmitLocation() function. It's not really needed. The
      most notable callsite where it mattered was in cdbdisp.c, but that one
      was broken by the order-of-evaluation issue. Use a different error code
      there. What we really should do there is to pass the error code from the
      segment back to the client, but I'll leave that for another day.
      f6f5c9ef
  27. 14 Nov 2016 (1 commit)
    • X
      Use nonblocking mechanism to send data in async dispatcher. · 2516eac6
      Committed by xiong-gang
      pqFlush sends data synchronously even though the socket is set to
      O_NONBLOCK, which degrades performance. This commit uses
      pqFlushNonBlocking instead, and waits for dispatching to all gangs to
      complete before query execution.
      Signed-off-by: Kenan Yao <kyao@pivotal.io>
      2516eac6
  28. 04 Nov 2016 (1 commit)
  29. 29 Aug 2016 (1 commit)
    • P
      Fix few dispatch related bugs · eb40e073
      Committed by Pengzhou Tang
      1. Fix a primary writer gang leak: PrimaryWriterGang was accidentally set to
         NULL, which prevented disconnectAndDestroyAllGangs() from destroying the
         primary writer gang.
      2. Fix a gang leak: when creating a gang, if the retry count exceeded the
         limit, the failed gang was not destroyed.
      3. Remove a duplicate sanity check before dispatchCommand().
      4. Remove an unnecessary error-out when a broken gang is no longer needed.
      5. Fix a thread leak problem.
      6. Enhance error handling for cdbdisp_finishCommand.
      eb40e073
  30. 25 Jul 2016 (1 commit)
    • P
      Refactor utility statement dispatch interfaces · 01769ada
      Committed by Pengzhou Tang
      Refactor CdbDispatchUtilityStatement() to make it flexible enough for
      cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags like
      DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT and DF_CANCEL_ON_ERROR to make the
      function calls much clearer.
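      A minimal model of such a flag-based interface (the flag names are taken from
      the message above; the numeric values and the dispatch_utility function are
      illustrative): callers combine behavior flags instead of passing a string of
      booleans.
      
      #include <stdio.h>
      
      /* Illustrative flag values; the real definitions live in the dispatcher headers. */
      #define DF_CANCEL_ON_ERROR   0x01   /* cancel remaining QEs if one errors out */
      #define DF_WITH_SNAPSHOT     0x02   /* dispatch the statement with a snapshot */
      #define DF_NEED_TWO_SNAPSHOT 0x04   /* flag named in the commit message above */
      
      static void
      dispatch_utility(const char *stmt, int flags)
      {
          printf("dispatch \"%s\"%s%s%s\n", stmt,
                 (flags & DF_CANCEL_ON_ERROR) ? " [cancel-on-error]" : "",
                 (flags & DF_WITH_SNAPSHOT) ? " [with-snapshot]" : "",
                 (flags & DF_NEED_TWO_SNAPSHOT) ? " [two-snapshot]" : "");
      }
      
      int
      main(void)
      {
          /* e.g. what a COPY start or VACUUM dispatch might request. */
          dispatch_utility("VACUUM foo", DF_CANCEL_ON_ERROR | DF_WITH_SNAPSHOT);
          return 0;
      }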
      01769ada
  31. 17 Jul 2016 (3 commits)
  32. 28 Jun 2016 (1 commit)
  33. 22 Jun 2016 (1 commit)
  34. 20 Jun 2016 (2 commits)
  35. 19 May 2016 (1 commit)