- 15 Jul, 2020 1 commit
-
-
Committed by Hubert Zhang
Reader gangs use a local snapshot to access the catalog, so they do not synchronize with the sharedSnapshot from the writer gang. This leads to inconsistent visibility of catalog tables on idle reader gangs. Consider this case:

    select * from t, t t1; -- create a reader gang.
    begin;
    create role r1;
    set role r1; -- the set command is also dispatched to the idle reader gang

When the set role command is dispatched to the idle reader gang, the reader gang cannot see the new tuple for r1 in the catalog table pg_authid. To fix this issue, we should drop the idle reader gangs after each utility statement that may modify a catalog table.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
-
- 28 Oct, 2019 1 commit
-
-
Committed by Paul Guo
1. Do not call elog(FATAL) in cleanupQE(), since it can be called from cdbdisp_destroyDispatcherState() while destroying a CdbDispatcherState. That leads to re-entrance of cdbdisp_destroyDispatcherState(), which is not supported. Change the code to return false instead, and add a sanity check for re-entrance. Returning false should be OK, since it leads to the gang being destroyed, and thus the QE resources are destroyed as well. Here is a typical stack showing the re-entrance:

    0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
    0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
    0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
    0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
    0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
    0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
    0x000000000053b8ab in AbortTransaction () at xact.c:3319
    0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
    0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
    0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
    0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
    0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
    0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
    0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
    0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
    0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
    0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
    0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
    0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442

2. Force dropping the reader gang for a named portal if a set command happened previously, since that setting was not dispatched to that gang and thus we should not reuse it.

3. Now that we have a mechanism for destroying the DispatcherState in the resource owner callback when aborting a transaction, it is no longer necessary to destroy it in some of the dispatcher code.

The added test cases and some existing test cases cover almost all of the code change, except the change in cdbdisp_dispatchX() (I could not find a way to test this, and I'll keep it in mind to see how to test that or similar code).

Reviewed-by: Pengzhou Tang
Reviewed-by: Asim R P
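The first point of this message (report failure instead of raising a fatal error, and refuse to re-enter the destroy routine) can be illustrated with a minimal sketch. All names and types below (`DemoQE`, `DemoDispatcherState`, `demo_cleanup_qe`, `demo_destroy_dispatcher_state`) are hypothetical stand-ins, not the Greenplum definitions, and the real code does considerably more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a QE connection; not the real structure. */
typedef struct DemoQE {
    bool busy;                  /* still running a command */
} DemoQE;

/* Hypothetical stand-in for the dispatcher state. */
typedef struct DemoDispatcherState {
    bool    destroying;         /* true while a destroy call is in progress */
    DemoQE *qes;
    size_t  nqes;
    size_t  recycled;           /* QEs that could go back to the idle pool */
    size_t  dropped;            /* QEs that must be torn down instead */
} DemoDispatcherState;

/* Return false for a busy QE rather than raising a fatal error,
 * so the caller can fall back to destroying the connection. */
bool demo_cleanup_qe(const DemoQE *qe)
{
    return !qe->busy;
}

/* Return false if called re-entrantly, true once the cleanup has run. */
bool demo_destroy_dispatcher_state(DemoDispatcherState *ds)
{
    assert(ds != NULL);
    if (ds->destroying)
        return false;           /* sanity check: refuse to re-enter */

    ds->destroying = true;
    for (size_t i = 0; i < ds->nqes; i++) {
        if (demo_cleanup_qe(&ds->qes[i]))
            ds->recycled++;
        else
            ds->dropped++;
    }
    ds->destroying = false;
    return true;
}
```

Because the cleanup helper only reports failure, no error path can unwind back into the destroy routine, and the guard flag catches any remaining re-entrant call.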
-
- 14 Dec, 2018 2 commits
-
-
Committed by Heikki Linnakangas
I don't know what all of these were used for originally, but it's dead code now.
-
Committed by Heikki Linnakangas
-
- 29 Oct, 2018 1 commit
-
-
Committed by Tang Pengzhou
* Simplify direct dispatch related code

  This commit includes two parts:
  * simplify the direct-dispatch dispatching code
  * simplify the direct-dispatch DTM related code

  Previously, cdbdisp_dispatchToGang needed a CdbDispatchDirectDesc. Now a gang contains only the segments in use, so the direct-dispatch info is no longer needed.

  In addition, dtmPreCommand had to decide whether DTM is available for direct dispatch. That logic is complex: it needs to know whether the main plan is direct-dispatched and whether the init plans contain direct dispatch. One example:

      update foo set foo.c2 = 2 where foo.c1 = 1 and exists (select * from bar where bar.c1=4)

  The main plan can be direct-dispatched to segment 1 and the init plan to segment 2. With the old logic, DTM commands like PREPARE needed to be dispatched to all segments, so dtmPreCommand had to dispatch a DTM command named 'DTX_PROTOCOL_COMMAND_STAY_AT_OR_BECOME_IMPLIED' to all segments, so that segments like segment 3, which did not receive the plan, were ready for two-phase commit.

  With the new gang API we can simplify this process: we add a list to currentGxact that records which segments are actually involved in a two-phase commit, and then dispatch DTM commands to them directly. This is also very useful for queries on tables that are not fully expanded yet.

* Support direct dispatch to more than one segment
-
- 15 Oct, 2018 1 commit
-
-
Committed by Ning Yu
Now there is only the async dispatcher. The dispatcher API interface is kept so that a new backend can be added in the future. The GUC gp_connections_per_thread, which was used to switch between the async and threaded backends, is also retired.
-
- 27 Sep, 2018 1 commit
-
-
Committed by Tang Pengzhou
* Change the type of db_descriptors to SegmentDatabaseDescriptor **

  A new gang definition may consist of cached segdbDescs and newly created segdbDescs, so there is no need to palloc all segdbDesc structs as new.

* Remove the unnecessary allocate-gang unit test

* Manage idle segment dbs using CdbComponentDatabases instead of available* lists

  To support variable-size gangs, we now need to manage segment dbs at a finer granularity. Previously, idle QEs were managed by a bunch of lists like availablePrimaryWriterGang and availableReaderGangsN, which restricted the dispatcher to creating only N-size (N = number of segments) or 1-size gangs.

  CdbComponentDatabases is a snapshot of the segment components within the current cluster, and it now maintains a freelist for each segment component. When creating a gang, the dispatcher makes up a gang from each segment component (taking a segment db from the freelist or creating a new one). When cleaning up a gang, the dispatcher returns idle segment dbs to each segment component.

  CdbComponentDatabases provides a few functions to manipulate segment dbs (SegmentDatabaseDescriptor *):
  * cdbcomponent_getCdbComponents
  * cdbcomponent_destroyCdbComponents
  * cdbcomponent_allocateIdleSegdb
  * cdbcomponent_recycleIdleSegdb
  * cdbcomponent_cleanupIdleSegdbs

  CdbComponentDatabases is also FTS version sensitive: once the FTS version changes, CdbComponentDatabases destroys all idle segment dbs and allocates QEs on the newly promoted segment. This makes mirror failover transparent to users.

  Since segment dbs (SegmentDatabaseDescriptor *) are now managed by CdbComponentDatabases, we can simplify the memory context management by replacing GangContext & perGangContext with DispatcherContext & CdbComponentsContext.

* Postpone the error handling when creating a gang

  Now that we have AtAbort_DispatcherState, one advantage is that we can postpone gang error handling to this function and make the code cleaner.

* Handle FTS version changes correctly

  In some cases, when the FTS version changes, we can't update the current snapshot of segment components; more specifically, we can't destroy the current writer segment dbs and create new segment dbs. These cases include:
  * the session has created a temp table.
  * the query needs two-phase commit and the gxid has been dispatched to segments.

* Replace the <gangId, sliceId> map with a <qeIdentifier, sliceId> map

  We used to dispatch a <gangId, sliceId> map along with the query to segment dbs so that they knew which slice to execute. Now gangId is useless to a segment db, because a segment db can be reused by different gangs, so we need a new way to convey this information. To resolve this, CdbComponentDatabases assigns a unique identifier to each segment db and builds, for each slice, a bitmap set consisting of segment identifiers. Segment dbs can then go through the slice table and find the right slice to execute.

* Allow the dispatcher to create variable-size gangs and refine AssignGangs()

  Previously, the dispatcher could only create N-size gangs for GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the dispatcher in many ways. One example is direct dispatch: it always created an N-size gang even if it only dispatched the command to one segment. Another example is that some operations, like a hash join, may be able to use a gang larger than N: if both the inner and outer plans are redistributed, the hash join node can be associated with an N+ size gang.

  This commit changes the API of createGang() so that the caller can specify a list of segments (partial or even duplicate segments); CdbComponentDatabases guarantees that each segment has only one writer in a session. This also resolves another pain point of AssignGangs(): the caller no longer needs to promote a GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or promote a GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for replicated tables (see FinalizeSliceTree()).

  With this commit, AssignGangs() is much clearer.
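The <qeIdentifier, sliceId> lookup described in this message can be sketched as follows. This is only an illustration under simplifying assumptions: the bitmap set is modelled as a single 64-bit word (so identifiers must be below 64), and `DemoSlice`, `demo_qe_bit` and `demo_find_slice` are hypothetical names rather than the structures the commit actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice table entry. */
typedef struct DemoSlice {
    int      sliceId;
    uint64_t qeIdentifiers;     /* bit i set: the QE with identifier i runs this slice */
} DemoSlice;

/* Bit mask for one QE identifier (assumed to be in the range 0..63). */
uint64_t demo_qe_bit(int qeIdentifier)
{
    assert(qeIdentifier >= 0 && qeIdentifier < 64);
    return UINT64_C(1) << qeIdentifier;
}

/* Walk the slice table and return the sliceId whose bitmap contains
 * this QE's identifier, or -1 if no slice lists it. */
int demo_find_slice(const DemoSlice *slices, int nslices, int qeIdentifier)
{
    uint64_t bit = demo_qe_bit(qeIdentifier);

    for (int i = 0; i < nslices; i++) {
        if (slices[i].qeIdentifiers & bit)
            return slices[i].sliceId;
    }
    return -1;
}
```

Keying the lookup on a per-QE identifier rather than a gang id is what allows the same segment db to be reused by different gangs across queries.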
-
- 06 Sep, 2018 1 commit
-
-
Committed by Tang Pengzhou
* Simplify the AssignGangs() logic for init plans

  Previously, AssignGangs() assigned gangs for both the main plan and the init plans in one shot. Because init plans and the main plan are executed sequentially, gangs can be reused between them; the function AccumSliceReq() was designed for this. This process can be simplified: we already know that the root slice index id will be adjusted according to the init plan id, so init plans only need to assign their own slices.

* Integrate Gang management from portal to Dispatcher

  Previously, gangs were managed by the portal, and freeGangsForPortal() was used to clean up gang resources. DTM related commands also needed a gang to dispatch commands outside of a portal, and used freeGangsForPortal() too. Multiple commands/plans/utilities may be executed within one portal, and all of them relied on a dispatcher routine like CdbDispatchCommand / CdbDispatchPlan / CdbDispatchUtility... to dispatch. Gangs were created by each dispatcher routine, but were not recycled or destroyed when a routine finished, except for the primary writer gang. One defect of this is that gang resources could not be reused between dispatcher routines.

  GPDB already had an optimization for init plans: if a plan contained init plans, AssignGangs was called before any of them was executed; it went through the whole slice tree and created the maximum gang that both the main plan and the init plans needed. This was doable because init plans and the main plan were executed sequentially, but it also made the AssignGangs logic complex; meanwhile, reusing a gang that was not clean was not safe.

  Another confusing thing was that the gang and the dispatcher were managed separately, which caused inconsistent context: when a dispatcher state was destroyed, the gang was not recycled; when a gang was destroyed by the portal, the dispatcher state was still in use and might refer to the context of a destroyed gang.

  As described above, this commit integrates gang management with the dispatcher: a dispatcher state is responsible for creating and tracking gangs as needed, and destroys them when the dispatcher state is destroyed.

* Handle the case when the primary writer gang has gone

  When members of the primary writer gang are gone, the writer gang is destroyed immediately (primaryWriterGang is set to NULL) when a dispatcher routine (e.g. CdbDispatchCommand) finishes. So when dispatching two-phase DTM/DTX related commands, the QD doesn't know the writer gang has gone, and it may get unexpected errors like 'savepoint not exist', 'subtransaction level not match', 'temp file not exist'.

  Previously, primaryWriterGang was not reset when DTM/DTX commands started, even if it was pointing to invalid segments, so those DTM/DTX commands were not actually sent to segments; a normal error reported on the QD looked like 'could not connect to segment: initialization of segworker'.

  So we need a way to inform the global transaction that its writer gang has been lost, so that when aborting the transaction, the QD can:
  1. disconnect all reader gangs; this is useful to skip dispatching "ABORT_NO_PREPARE".
  2. reset the session and drop temp files, because the temp files on the segment are gone.
  3. report an error when dispatching the "rollback savepoint" DTX, because the savepoint on the segment is gone.
  4. report an error when dispatching the "abort subtransaction" DTX, because the subtransaction is rolled back when the writer segment is down.
-
- 14 Aug, 2018 2 commits
-
-
Committed by Pengzhou Tang
Previously, cdbdisp_finishCommand did three things:
1. cdbdisp_checkDispatchResult
2. cdbdisp_getDispatchResult
3. cdbdisp_destroyDispatcherState

However, cdbdisp_finishCommand didn't make the code cleaner or more convenient to use; on the contrary, it made error handling more difficult and the code more complicated and inconsistent.

This commit also resets estate->dispatcherState to NULL to avoid re-entry of cdbdisp_* functions.
-
Committed by Pengzhou Tang
Use cdbdisp_checkDispatchResult instead of CdbCheckDispatchResult, to be consistent with the other cdbdisp_* functions.
-
- 09 May, 2018 1 commit
-
-
Committed by xiong-gang
Use resource owner to do the cleanup of dispatcher and interconnect (#4761)
-
- 01 Mar, 2018 1 commit
-
-
Committed by Heikki Linnakangas
If an error happens in the prepare phase of two-phase commit, relay the original error back to the client, instead of the fairly opaque "Abort [Prepared]' broadcast failed to one or more segments" message you got previously. A lot of things happen during the prepare phase that can legitimately fail, like checking deferred constraints, like in the 'constraints' regression test. But even without that, there can be triggers, ON COMMIT actions, etc., any of which can fail.

This commit consists of several parts:

* Pass 'true' for the 'raiseError' argument when dispatching the prepare dtx command in doPrepareTransaction(), so that the error is emitted to the client.

* Bubble up an ErrorData struct, with as many fields intact as possible, to the caller, when dispatching a dtx command. (Instead of constructing a message in a StringInfo). So that we can re-throw the message to the client, with its original formatting.

* Don't throw an error in performDtxProtocolCommand(), if we try to abort a prepared transaction that doesn't exist. That is business-as-usual, if a transaction throws an error before finishing the prepare phase.

* Suppress the "NOTICE: Releasing segworker groups to retry broadcast." message, when aborting a prepared transaction.

Put together, the effect is if an error happens during prepare phase, the client receives a message that is largely indistinguishable from the message you'd get if the same failure happened while running a normal statement.

Fixes github issue #4530.
-
- 25 Jan, 2018 1 commit
-
-
Committed by Daniel Gustafsson
The errcode thrown in an ereport() on a segment was passed back to the dispatcher, but then dropped and replaced with a default errcode of ERRCODE_DATA_EXCEPTION. This works for most situations, but when trapping errors the exact errcode must be propagated. This extends the API to extract the errcode as well.

The below case illustrates the previous issue:

    CREATE TABLE test1(id int primary key);
    CREATE TABLE test2(id int primary key);
    INSERT INTO test1 VALUES(1);
    INSERT INTO test2 VALUES(1);
    CREATE OR REPLACE FUNCTION merge_table() RETURNS void AS $$
    DECLARE
        v_insert_sql varchar;
    BEGIN
        v_insert_sql :='INSERT INTO test1 SELECT * FROM test2';
        EXECUTE v_insert_sql;
    EXCEPTION WHEN unique_violation THEN
        RAISE NOTICE 'unique_violation';
    END;
    $$ LANGUAGE plpgsql volatile;
    SELECT merge_table();
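The propagation rule in this message (keep the errcode the segment reported, and fall back to the default only when none is available) can be sketched in a few lines. The struct and function names here are hypothetical and the real dispatcher API is not shown in the message; the five-character SQLSTATE strings are the standard PostgreSQL codes for data_exception (22000) and unique_violation (23505).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SQLSTATE_DATA_EXCEPTION   "22000"  /* data_exception */
#define DEMO_SQLSTATE_UNIQUE_VIOLATION "23505"  /* unique_violation */

/* Hypothetical stand-in for an error reported by a segment. */
typedef struct DemoSegmentError {
    const char *sqlstate;       /* NULL if the segment reported none */
    const char *message;
} DemoSegmentError;

/* Choose the errcode to re-throw on the dispatcher side: the segment's
 * own SQLSTATE when it is well formed, otherwise the default. */
const char *demo_errcode_for_rethrow(const DemoSegmentError *err)
{
    assert(err != NULL);
    if (err->sqlstate != NULL && strlen(err->sqlstate) == 5)
        return err->sqlstate;
    return DEMO_SQLSTATE_DATA_EXCEPTION;
}
```

With the original errcode preserved, an exception handler such as `WHEN unique_violation` in the function above can match an error that was raised on a segment.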
-
- 02 Nov, 2017 1 commit
-
-
Committed by Heikki Linnakangas
Previously, if a segment reported an error after starting up the interconnect, it would take up to 250 ms for the main thread in the QD process to wake up and poll the dispatcher connections, and to see that there was an error. Shorten that time, by waking up immediately if the QD->QE libpq socket becomes readable while we're waiting for data to arrive in a Motion node. This isn't a complete solution, because this will only wake up if one arbitrarily chosen connection becomes readable, and we still rely on polling for the others. But this greatly speeds up many common scenarios. In particular, the "qp_functions_in_select" test now runs in under 5 s on my laptop, when it took about 60 seconds before.
-
- 01 Sep, 2017 1 commit
-
-
Committed by Daniel Gustafsson
This bumps the copyright years to the appropriate years after not having been updated for some time. Also reformats existing code headers to match the upstream style to ensure consistency.
-
- 14 Nov, 2016 1 commit
-
-
Committed by xiong-gang
pqFlush sends data synchronously even though the socket is set to O_NONBLOCK, which degrades performance. This commit uses pqFlushNonBlocking instead, and synchronizes the completion of dispatching to all gangs before query execution.

Signed-off-by: Kenan Yao <kyao@pivotal.io>
-
- 04 Nov, 2016 1 commit
-
-
Committed by xiong-gang
Signed-off-by: Kenan Yao <kyao@pivotal.io>
-
- 13 Sep, 2016 1 commit
-
-
Committed by Pengzhou Tang
The QD needs to cancel QEs when 1) the QD gets an error, or 2) one or more QEs got an error and cancelOnError was set to true. We want to cancel QEs as soon as possible once either condition is met, but since cancelling QEs is expensive, we want to process as many QEs that are about to finish as possible before actually cancelling. The original interval before cancelling was 2 seconds, which is so long that users see an obvious delay before errors are reported. This commit lowers the interval to 100 ms to speed up the cancelling process.
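The decision described here can be sketched as two small predicates: one for whether a cancel is wanted at all, and one for whether the grace interval has elapsed. Only the 100 ms value and the two trigger conditions come from the message; `DemoDispatchStatus`, its fields, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CANCEL_DELAY_MS 100    /* lowered from 2000 ms by this commit */

/* Hypothetical summary of the dispatch state seen by the QD. */
typedef struct DemoDispatchStatus {
    bool qdError;           /* the QD itself hit an error */
    bool qeError;           /* at least one QE reported an error */
    bool cancelOnError;     /* the caller asked to cancel on QE errors */
    int  pendingQEs;        /* QEs that have not finished yet */
    int  msSinceError;      /* time since the error was first seen */
} DemoDispatchStatus;

/* Condition 1) or 2) from the commit message. */
bool demo_cancel_wanted(const DemoDispatchStatus *st)
{
    assert(st != NULL);
    return st->qdError || (st->qeError && st->cancelOnError);
}

/* Send the cancel only after giving pending QEs a short grace period. */
bool demo_should_send_cancel(const DemoDispatchStatus *st)
{
    if (!demo_cancel_wanted(st))
        return false;
    if (st->pendingQEs == 0)
        return false;       /* everything finished: nothing left to cancel */
    return st->msSinceError >= DEMO_CANCEL_DELAY_MS;
}
```

The grace period trades a small delay in error reporting for fewer expensive cancel requests, since QEs that finish within the interval never need to be cancelled.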
-
- 25 Jul, 2016 1 commit
-
-
Committed by Pengzhou Tang
Refactor CdbDispatchUtilityStatement() to make it flexible enough for cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags like DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT, and DF_CANCEL_ON_ERROR to make function calls much clearer.
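A flags argument of this kind is typically a bit mask that callers combine with `|`. The sketch below reuses the three flag names from the message, but the bit values are assumptions chosen for illustration, and `DemoDispatchOptions` and `demo_decode_flags` are hypothetical helpers, not part of the dispatcher API.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are taken from the commit message; the bit values
 * below are assumptions for this sketch, not the real definitions. */
enum {
    DF_CANCEL_ON_ERROR   = 1 << 0,
    DF_NEED_TWO_SNAPSHOT = 1 << 1,
    DF_WITH_SNAPSHOT     = 1 << 2
};

/* Hypothetical decoded form of the flags word. */
typedef struct DemoDispatchOptions {
    bool cancelOnError;
    bool needTwoSnapshot;
    bool withSnapshot;
} DemoDispatchOptions;

/* Decode a flags word into individual options. */
DemoDispatchOptions demo_decode_flags(int flags)
{
    DemoDispatchOptions opts;

    opts.cancelOnError   = (flags & DF_CANCEL_ON_ERROR) != 0;
    opts.needTwoSnapshot = (flags & DF_NEED_TWO_SNAPSHOT) != 0;
    opts.withSnapshot    = (flags & DF_WITH_SNAPSHOT) != 0;
    return opts;
}
```

A call site then reads as a list of named options, for example `DF_CANCEL_ON_ERROR | DF_WITH_SNAPSHOT`, instead of a row of positional boolean arguments.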
-
- 17 Jul, 2016 2 commits
-
-
Committed by Gang Xiong
-
Committed by Gang Xiong
-
- 22 Jun, 2016 1 commit
-
-
Committed by Gang Xiong
-
- 19 May, 2016 1 commit
-
-
Committed by Pengzhou Tang
dispatcher/ directory

This commit has no logic change. It only moves code across files, to make the dispatcher code clearer and easier to unit test.

Signed-off-by: Kenan Yao
-
- 06 May, 2016 1 commit
-
-
Committed by Gang Xiong
Refactor the cdbdisp_dispatchToGang interface. Refactor memory management in dispatch.
-
- 12 Feb, 2016 1 commit
-
-
Committed by Heikki Linnakangas
Remove unnecessary #includes, add #includes that are actually needed by some headers.
-
- 21 Dec, 2015 1 commit
-
-
Committed by Pengzhou Tang
The SET command is session-wide, so all existing idle gangs should receive the setting for later reuse. Busy gangs declared by cursors, however, raise errors if they receive a set command. The fix is to mark busy gangs as not reusable, so that they are destroyed after their cursors are closed.
-
- 28 Oct, 2015 1 commit
-
-