- 07 December 2018, 1 commit
Committed by Heikki Linnakangas
-
- 29 October 2018, 1 commit
Committed by Heikki Linnakangas
Most callers were passing CurrentMemoryContext, so this makes most callers slightly simpler. The few places that needed to pass a different context now switch to the correct one before calling the GpPolicy*() function.

Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
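A minimal sketch of the caller-side pattern described above, assuming PostgreSQL's standard memory-context API; the `fetch` callback stands in for whichever GpPolicy*() function is involved and is not the actual GPDB code.

```c
#include "postgres.h"
#include "utils/memutils.h"

/*
 * Illustrative only: when the result must not live in CurrentMemoryContext,
 * the caller switches to the target context around the call and back again.
 * 'fetch' stands in for a GpPolicy*() function that now allocates in the
 * current memory context.
 */
static void *
fetch_in_context(MemoryContext target, void *(*fetch) (Oid), Oid relid)
{
    MemoryContext oldcontext = MemoryContextSwitchTo(target);
    void       *result = fetch(relid);

    MemoryContextSwitchTo(oldcontext);
    return result;
}
```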
-
- 19 October 2018, 2 commits
Committed by Heikki Linnakangas
If an error occurred in the segments, in a "COPY <table> TO <file>" command, the COPY was stopped, but the error was not reported to the user. That gave the false impression that it finished successfully, but what you actually got was an incomplete file.

A test case is included. It uses a little helper output function that sometimes throws an error. Output functions are fairly unlikely to fail, but it could happen e.g. because of an out of memory error, or a disk failure. The "COPY (SELECT ...) TO <file>" variant did not suffer from this (otherwise, a query that throws an error would've been a much simpler way to test this.)

The reason for this was that the code in cdbCopyGetData() that called PQgetResult(), and extracted the error message from the result, didn't indicate to the caller in any way that the error happened. To fix, delay the call to PQgetResult(), to a later call to cdbCopyEnd(). cdbCopyEnd() already had the logic to extract the error information from the PGresult, and throw it to the user. While we're at it, refactor cdbCopyEnd a little bit, to give the callers a nicer function signature.

I also changed a few places that used 32-bit int to store rejected row counts, to use int64 instead. There was a FIXME comment about that. I didn't fix all the places that do that, though, so I moved the FIXME to one of the remaining places.

Apply to master branch only. GPDB 5 didn't handle this too well, either; with the included test case, you got an error like this:

    postgres=# copy broken_type_test to '/tmp/x';
    ERROR: missing error text

That's not very nice, but at least you get an error, even if it's not a very good one. The code looks quite different in 5X_STABLE, so I'm not going to attempt improving that.

Reviewed-by: Adam Lee <ali@pivotal.io>
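A minimal libpq sketch of the end-of-COPY handling the fix concentrates in cdbCopyEnd(): drain the PGresults from the connection and surface any error instead of silently dropping it. The function and variable names here are illustrative, not the actual GPDB code.

```c
#include <stdio.h>
#include <stdbool.h>
#include <libpq-fe.h>

/*
 * Illustrative only: once the COPY data stream has ended, fetch the
 * remaining PGresults and report an error if the segment sent one,
 * so the caller can raise it to the user instead of ignoring it.
 */
static bool
finish_copy_and_check(PGconn *conn, char *errbuf, size_t errbuflen)
{
    PGresult   *res;
    bool        ok = true;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            snprintf(errbuf, errbuflen, "%s", PQresultErrorMessage(res));
            ok = false;
        }
        PQclear(res);
    }
    return ok;
}
```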
-
Committed by Heikki Linnakangas
* 'segdb_state' and 'err_context' fields in CdbCopy were unused, remove.
* 'failedSegDBs' in processCopyEndResults was unused, remove.
* Plus some other cosmetic cleanup, for better readability.
-
- 28 September 2018, 1 commit
Committed by ZhangJackey
There was an assumption in GPDB that a table's data is always distributed on all segments. However, this is not always true: for example, when a cluster is expanded from M segments to N (N > M), all the tables are still on M segments. To work around the problem we used to have to alter all the hash-distributed tables to randomly distributed to get correct query results, at the cost of bad performance.

Now we support table data being distributed on a subset of segments. A new column `numsegments` is added to the catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DML on the M-segment tables, and joins between M-segment and N-segment tables are also supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments from gp_distribution_policy;
 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);
                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```
-
- 27 September 2018, 1 commit
Committed by Tang Pengzhou
* Change the type of db_descriptors to SegmentDatabaseDescriptor **

  A new gang definition may consist of cached segdbDesc and newly created segdbDesc, so there is no need to palloc all segdbDesc structs as new.

* Remove an unnecessary allocate-gang unit test.

* Manage idle segment dbs using CdbComponentDatabases instead of available* lists.

  To support variable-size gangs, we now need to manage segment dbs at a lower granularity. Previously, idle QEs were managed by a bunch of lists like availablePrimaryWriterGang and availableReaderGangsN; this restricted the dispatcher to only creating N-size (N = number of segments) or 1-size gangs.

  CdbComponentDatabases is a snapshot of the segment components within the current cluster; it now maintains a freelist for each segment component. When creating a gang, the dispatcher makes up the gang from each segment component (from the freelist, or by creating a new segment db). When cleaning up a gang, the dispatcher returns idle segment dbs to each segment component. CdbComponentDatabases provides a few functions to manipulate segment dbs (SegmentDatabaseDescriptor *):

  * cdbcomponent_getCdbComponents
  * cdbcomponent_destroyCdbComponents
  * cdbcomponent_allocateIdleSegdb
  * cdbcomponent_recycleIdleSegdb
  * cdbcomponent_cleanupIdleSegdbs

  CdbComponentDatabases is also FTS-version sensitive, so once the FTS version changes, CdbComponentDatabases destroys all idle segment dbs and allocates QEs on the newly promoted segment. This provides transparent mirror failover to users.

  Since segment dbs (SegmentDatabaseDescriptor *) are managed by CdbComponentDatabases now, we can simplify the memory context management by replacing GangContext & perGangContext with DispatcherContext & CdbComponentsContext.

* Postpone the error handling when creating a gang.

  Now we have AtAbort_DispatcherState; one advantage of it is that we can postpone gang error handling to this function and make the code cleaner.

* Handle FTS version changes correctly.

  In some cases, when the FTS version changes, we can't update the current snapshot of segment components; more specifically, we can't destroy the current writer segment dbs and create new segment dbs. These cases include:

  * the session has a temp table created.
  * the query needs two-phase commit and the gxid has been dispatched to segments.

* Replace the <gangId, sliceId> map with a <qeIdentifier, sliceId> map.

  We used to dispatch a <gangId, sliceId> map along with the query to segment dbs so segment dbs could know which slice they should execute. Now gangId is useless for a segment db because a segment db can be reused by different gangs, so we need a new way to pass this info to segment dbs. To resolve this, CdbComponentDatabases assigns a unique identifier to each segment db and builds, for each slice, a bitmap set consisting of segment identifiers; segment dbs can then go through the slice table and find the right slice to execute.

* Allow the dispatcher to create variable-size gangs and refine AssignGangs().

  Previously, the dispatcher could only create N-size gangs for GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the dispatcher in many ways. One example is direct dispatch: it always created an N-size gang even when it only dispatched the command to one segment. Another example is that some operations may be able to use an N+ size gang, like hash join: if both the inner and outer plan are redistributed, the hash join node can be associated with an N+ size gang to execute.

This commit changes the API of createGang() so the caller can specify a list of segments (partial or even duplicate segments); CdbComponentDatabases will guarantee each segment has only one writer in a session. With this it also resolves another pain point of AssignGangs(): the caller doesn't need to promote a GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or promote a GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for a replicated table (see FinalizeSliceTree()). With this commit, AssignGangs() is very clear now.
-
- 23 September 2018, 1 commit
Committed by Daniel Gustafsson
There is already an assertion in getgpsegmentCount() testing the count to be > 0 (and 0 can only be returned in utility mode, which still holds this assertion always true).

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
- 21 September 2018, 1 commit
Committed by Adam Lee
It happens if the copy command errors out before assigning dispatcherState. Initialize the dispatcherState as NULL to fix it, and use palloc0() to avoid issues with members added in the future. 5X has no such problem.

```
(gdb) c
Continuing.
Detaching after fork from child process 25843.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000aa04dd in getCdbCopyPrimaryGang (c=0x23d4150) at cdbcopy.c:44
44          return (Gang *)linitial(c->dispatcherState->allocatedGangs);
(gdb) bt
#0  0x0000000000aa04dd in getCdbCopyPrimaryGang (c=0x23d4150) at cdbcopy.c:44
#1  0x0000000000aa12d8 in cdbCopyEndAndFetchRejectNum (c=0x23d4150, total_rows_completed=0x0,
    abort_msg=0xd0c8f8 "aborting COPY in QE due to error in QD") at cdbcopy.c:642
#...
(gdb) p c->dispatcherState
$1 = (struct CdbDispatcherState *) 0x100000000
```
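A minimal sketch of the fix's idea, assuming PostgreSQL's palloc0(); the struct layout and function name are illustrative, not the actual cdbcopy.c definitions.

```c
#include "postgres.h"

/* Illustrative stand-in for the real CdbCopy struct in cdbcopy.c. */
typedef struct CdbCopySketch
{
    struct CdbDispatcherState *dispatcherState;    /* stays NULL until dispatch starts */
    /* ... other members ... */
} CdbCopySketch;

static CdbCopySketch *
makeCdbCopySketch(void)
{
    /*
     * palloc0() zeroes the whole allocation, so pointer members such as
     * dispatcherState start out NULL.  Callers can then test for NULL even
     * if COPY errors out before a dispatcher state was ever assigned, and
     * members added later are zero-initialized automatically.
     */
    return (CdbCopySketch *) palloc0(sizeof(CdbCopySketch));
}
```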
-
- 06 September 2018, 1 commit
Committed by Tang Pengzhou
* Simplify the AssignGangs() logic for init plans.

Previously, AssignGangs() assigned gangs for both the main plan and init plans in one shot. Because init plans and the main plan are executed sequentially, the gangs can be reused between the main plan and init plans; the function AccumSliceReq() was designed for this. This process can be simplified: since we already know the root slice index will be adjusted to the corresponding init plan id, each init plan only needs to assign its own slices.

* Integrate Gang management from the portal into the Dispatcher.

Previously, Gangs were managed by the portal; freeGangsForPortal() was used to clean up gang resources. DTM-related commands also needed a gang to dispatch commands outside of a portal, and used freeGangsForPortal() too. There might be multiple commands/plans/utilities executed within one portal; all of them relied on a dispatcher routine like CdbDispatchCommand / CdbDispatchPlan / CdbDispatchUtility... to dispatch. Gangs were created by each dispatcher routine, but not recycled or destroyed when a routine finished, except for the primary writer gang. One defect of this is that gang resources cannot be reused between dispatcher routines.

GPDB already had an optimization for init plans: if a plan contained init plans, AssignGangs() was called before execution of any of them; it went through the whole slice tree and created the maximum gang that both the main plan and init plans needed. This was doable because init plans and the main plan were executed sequentially, but it also made the AssignGangs() logic complex; meanwhile, reusing an unclean gang was not safe.

Another confusing thing was that the gang and the dispatcher were managed separately, which caused context inconsistencies, for example: when a dispatcher state was destroyed, the gang was not recycled; when a gang was destroyed by the portal, the dispatcher state was still in use and might refer to the context of a destroyed gang.

As described above, this commit integrates gang management with the dispatcher: a dispatcher state is responsible for creating and tracking gangs as needed and destroying them when the dispatcher state is destroyed.

* Handle the case when the primary writer gang has gone.

When members of the primary writer gang are gone, the writer gang is destroyed immediately (primaryWriterGang is set to NULL) when a dispatcher routine (e.g. CdbDispatchCommand) finishes. So when dispatching a two-phase-DTM/DTX related command, the QD doesn't know the writer gang has gone, and it may get unexpected errors like 'savepoint not exist', 'subtransaction level not match', 'temp file not exist'. Previously, primaryWriterGang was not reset when DTM/DTX commands started even if it was pointing to invalid segments, so those DTM/DTX commands were not actually sent to the segments; a normal error reported on the QD looked like 'could not connect to segment: initialization of segworker'.

So we need a way to inform the global transaction that its writer gang has been lost, so that when aborting the transaction, the QD can:

1. disconnect all reader gangs; this is useful to skip dispatching "ABORT_NO_PREPARE".
2. reset the session and drop temp files, because the temp files on the segment are gone.
3. report an error when dispatching the "rollback savepoint" DTX, because the savepoint on the segment is gone.
4. report an error when dispatching the "abort subtransaction" DTX, because the subtransaction is rolled back when the writer segment is down.
-
- 14 August 2018, 1 commit
Committed by Pengzhou Tang
Previously, COPY used CdbDispatchUtilityStatement directly to dispatch 'COPY' statements to all QEs and then sent/received data from primaryWriterGang. This happened to work because primaryWriterGang is not recycled when a dispatcher state is destroyed, which is nasty because the COPY command has logically finished. This commit splits the COPY dispatching logic into two parts to make it more reasonable.
-
- 03 August 2018, 1 commit
Committed by Daniel Gustafsson
The recent COPY refactoring left cdbCopyEnd() unused, so remove the function as it's now dead code. Also clean up a comment which erroneously was referring to it.

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
- 17 May 2018, 1 commit
Committed by Adam Lee
Without this, integer overflow occurs when more than 2^31 rows are copied in `COPY ON SEGMENT` mode. Errors happen when the value is cast to uint64, the type of `processed` in `CopyStateData`: a third-party Postgres driver, which reads it as an int64, fails with an out-of-range error.
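A small standalone C sketch of the failure mode described above; the variable names are illustrative and not the actual COPY code.

```c
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    int64_t  actual_rows = ((int64_t) 1 << 31) + 10;    /* > 2^31 rows copied */

    /* A 32-bit counter cannot hold the count: the conversion wraps it to a
     * negative value (implementation-defined truncation on conversion). */
    int32_t  rows32 = (int32_t) actual_rows;

    /* Widening the wrapped value to the uint64 'processed' field yields a
     * huge bogus number, which a driver reading it as int64 rejects as out
     * of range.  A 64-bit counter avoids the problem entirely. */
    uint64_t processed = (uint64_t) rows32;

    printf("actual rows:               %lld\n", (long long) actual_rows);
    printf("32-bit counter after wrap: %ld\n", (long) rows32);
    printf("as uint64 'processed':     %llu\n", (unsigned long long) processed);
    return 0;
}
```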
-
- 29 March 2018, 1 commit
Committed by Pengzhou Tang
* Support replicated tables in GPDB.

Currently, tables are distributed across all segments by hash or randomly in GPDB. There is a requirement to introduce a new table type where all segments have a full duplicate of the table's data, called a replicated table.

To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark a replicated table, and added a new locus type named CdbLocusType_SegmentGeneral to specify the distribution of tuples of a replicated table. CdbLocusType_SegmentGeneral implies data is generally available on all segments but not available on qDisp, so a plan node with this locus type can be flexibly planned to execute on either a single QE or all QEs. It is similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral node can't be executed on qDisp. To guarantee this, we try our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning motion for a join, even if the other rel has a bottleneck locus type. A problem is that such a motion may be redundant if the single QE is not promoted to execute on qDisp in the end, so we need to detect such cases and omit the redundant motion at the end of apply_motion().

We don't reuse CdbLocusType_Replicated since it always implies a broadcast motion below, and it's not easy to plan such a node as direct dispatch to avoid getting duplicate data.

We don't support replicated tables with inherit/partition-by clauses for now; the main problem is that update/delete on multiple result relations can't work correctly yet. We can fix this later.

* Allow spi_* to access replicated tables on the QE.

Previously, GPDB didn't allow a QE to access non-catalog tables because the data is incomplete; we can remove this limitation now if it only accesses replicated tables. One problem is that the QE needs to know whether a table is a replicated table; previously the QE didn't maintain the gp_distribution_policy catalog, so we need to pass policy info to the QE for replicated tables.

* Change the schema of gp_distribution_policy to identify replicated tables.

Previously, we used a magic number -128 in the gp_distribution_policy table to identify replicated tables, which was quite a hack, so we add a new column in gp_distribution_policy to identify replicated tables and partitioned tables. This commit also abandons the old way that used a 1-length-NULL list and a 2-length-NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED FULLY clauses. Besides, this commit refactors the code to make the decision-making of distribution policy clearer.

* Support COPY for replicated tables.

* Disable the row ctid unique path for replicated tables.

Previously, GPDB used a special Unique path on rowid to address queries like "x IN (subquery)". For example, for select * from t1 where t1.c2 in (select c2 from t3), the plan looks like:

    ->  HashAggregate
          Group By: t1.ctid, t1.gp_segment_id
          ->  Hash Join
                Hash Cond: t2.c2 = t1.c2
                ->  Seq Scan on t2
                ->  Hash
                      ->  Seq Scan on t1

Obviously, the plan is wrong if t1 is a replicated table because ctid + gp_segment_id can't identify a tuple; in a replicated table, a logical row may have different ctid and gp_segment_id values. So we disable such plans for replicated tables temporarily. It's not the best way, because the rowid unique path may be cheaper than a normal hash semi join, so we left a FIXME for later optimization.

* ORCA related fix

Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Fall back to the legacy query optimizer for queries over replicated tables.

* Adapt pg_dump/gpcheckcat to replicated tables: gp_distribution_policy is no longer a master-only catalog, so do the same check as for other catalogs.

* Support gpexpand on replicated tables && altering the distribution policy of replicated tables.
-
- 28 March 2018, 1 commit
Committed by Asim R P
The command "COPY enumtest FROM stdin;" hit an infinite loop on merge branch. Code indicates that the issue can happen on master as well. QD backend went into infinite loop when the connection was already closed from QE end. The TCP connection was in CLOSE_WAIT state. Libpq connection status was CONNECTION_BAD and asyncStatus was PGASYNC_BUSY. Fix the infinite loop by checking libpq connection status in each iteration.
-
- 23 March 2018, 1 commit
Committed by Ashwin Agrawal
-
- 28 December 2017, 1 commit
Committed by Adam Lee
There are two places where the QD keeps trying to get data, ignores SIGINT, and does not send a signal to the QEs. If the program on the segment has no input/output, the copy command hangs.

To fix it, this commit:

1. lets the QD wait for connections to become readable before PQgetResult(), and cancels the queries if it gets interrupt signals while waiting
2. sets DF_CANCEL_ON_ERROR when dispatching in cdbcopy.c
3. completes the copy error handling

```
-- prepare
create table test(t text);
copy test from program 'yes|head -n 655360';

-- could be canceled
copy test from program 'sleep 100 && yes test';
copy test from program 'sleep 100 && yes test<SEGID>' on segment;
copy test from program 'yes test';
copy test to '/dev/null';
copy test to program 'sleep 100 && yes test';
copy test to program 'sleep 100 && yes test<SEGID>' on segment;

-- should fail
copy test from program 'yes test<SEGID>' on segment;
copy test to program 'sleep 0.1 && cat > /dev/nulls';
copy test to program 'sleep 0.1<SEGID> && cat > /dev/nulls' on segment;
```
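A minimal libpq sketch of point 1, assuming a plain select()-based wait and a SIGINT handler that sets a flag; the function is illustrative and not the actual dispatcher code.

```c
#include <signal.h>
#include <sys/select.h>
#include <libpq-fe.h>

static volatile sig_atomic_t got_interrupt = 0;     /* set by a SIGINT handler */

/*
 * Illustrative only: wait until data is readable before PQgetResult(), and
 * cancel the remote query if an interrupt arrives while waiting, so the QD
 * does not hang when the program on the segment produces no input/output.
 */
static PGresult *
get_result_cancellable(PGconn *conn)
{
    while (PQisBusy(conn))
    {
        fd_set          rfds;
        struct timeval  tv = {1, 0};                /* wake up periodically */
        int             sock = PQsocket(conn);

        if (sock < 0)
            return NULL;

        if (got_interrupt)
        {
            char        errbuf[256];
            PGcancel   *cancel = PQgetCancel(conn);

            if (cancel != NULL)
            {
                PQcancel(cancel, errbuf, sizeof(errbuf));
                PQfreeCancel(cancel);
            }
            got_interrupt = 0;      /* the cancelled query still returns an error result */
        }

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        if (select(sock + 1, &rfds, NULL, NULL, &tv) < 0)
            return NULL;
        if (FD_ISSET(sock, &rfds) && !PQconsumeInput(conn))
            return NULL;
    }
    return PQgetResult(conn);
}
```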
-
- 30 October 2017, 1 commit
Committed by Adam Lee
Signed-off-by: Adam Lee <ali@pivotal.io>
-
- 10 October 2017, 1 commit
Committed by Ashwin Agrawal
-
- 25 September 2017, 1 commit
Committed by Adam Lee
Replace popen() with popen_with_stderr(), which is also used in external web tables to collect the stderr output of the program. Since popen_with_stderr() forks a `sh` process, it is almost always successful; this commit catches errors that happen in fwrite(). Also pass variables the same way the external web table does.

Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
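A small standalone sketch of why checking the writes matters: launching `sh` via popen() rarely fails, so errors surface later, when writing to the pipe or when the child exits. Names here are illustrative, not the GPDB implementation.

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Illustrative only: report short writes instead of silently losing data. */
static bool
write_row_to_program(FILE *pipe, const char *row)
{
    size_t len = strlen(row);

    return fwrite(row, 1, len, pipe) == len;
}

int
main(void)
{
    FILE *pipe = popen("cat > /dev/null", "w");     /* starting `sh` almost always works */

    if (pipe == NULL)
        return 1;
    if (!write_row_to_program(pipe, "hello world\n"))
        fprintf(stderr, "write to program failed\n");
    if (pclose(pipe) != 0)
        fprintf(stderr, "program exited with a failure status\n");
    return 0;
}
```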
-
- 01 September 2017, 1 commit
Committed by Daniel Gustafsson
This bumps the copyright years to the appropriate years after not having been updated for some time. Also reformats existing code headers to match the upstream style to ensure consistency.
-
- 28 August 2017, 2 commits
Committed by Adam Lee
Don't send nonsense '\n' characters just for counting, let segments report how many rows are processed instead.

Signed-off-by: Ming LI <mli@apache.org>
-
Committed by Xiaoran Wang
When using the `COPY FROM ON SEGMENT` command, we copy data from a local file to the table on the segment directly. When copying data, we need to apply the distribution policy to each record to compute the target segment. If the target segment ID isn't equal to the current segment ID, we report an error to keep the distribution key restriction.

Because the segment has no metadata about the table's distribution policy and partition policy, we copy the distribution policy of the main table from the master to the segment in the query plan. When the parent table and a partitioned sub-table have different distribution policies, it is difficult to check all of the distribution key restrictions in all sub-tables; in this case, we report an error. In case the partitioned table's distribution policy is RANDOMLY and differs from the parent table, the user can use the GUC `gp_enable_segment_copy_checking` to disable this check.

Check the distribution key restriction as follows:

1) Table isn't partitioned: compute the data's target segment. If the data doesn't belong to the segment, report an error.
2) Table is partitioned and the distribution policy of the partitioned table is the same as the main table: compute the data's target segment. If the data doesn't belong to the segment, report an error.
3) Table is partitioned and the distribution policy of the partitioned table is different from the main table: checking is not supported, report an error.

Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
Signed-off-by: Ming LI <mli@apache.org>
Signed-off-by: Adam Lee <ali@pivotal.io>
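A toy standalone sketch of the check in case 1: hash the distribution key and compare the target segment with the local segment ID. The real code uses GPDB's cdbhash machinery over the distribution key; the plain hash and names here are assumptions for illustration.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/*
 * Illustrative only: decide whether a row read from the local file actually
 * belongs to this segment.  'key_hash' stands in for the cdbhash value of
 * the row's distribution key.
 */
static bool
row_belongs_to_segment(uint32_t key_hash, int numsegments, int my_segid)
{
    int target = (int) (key_hash % (uint32_t) numsegments);

    return target == my_segid;
}

int
main(void)
{
    /* pretend the distribution key of a row hashed to 0x2a on a 4-segment cluster */
    if (!row_belongs_to_segment(0x2a, 4, 1))
        fprintf(stderr, "ERROR: row does not belong to this segment\n");
    return 0;
}
```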
-
- 11 August 2017, 1 commit
Committed by Heikki Linnakangas
-
- 09 August 2017, 1 commit
Committed by Pengzhou Tang
cf7cddf7 has a conflict with cc38f526: struct PQExpBufferData is needed by the structure SegmentDatabaseDescriptor, so bring gp-libpq-int.h back.
-
- 08 August 2017, 1 commit
Committed by Heikki Linnakangas
StringInfo is more appropriate in backend code. (Unless the buffer needs to be used in a thread.) In passing, rename the 'conn' static variable in cdbfilerepconnclient.c. It seemed overly generic.
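A minimal sketch of the StringInfo idiom referred to above, using PostgreSQL's stringinfo API; the function and message are made up for illustration.

```c
#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Illustrative only: unlike PQExpBuffer, a StringInfo buffer is palloc'd in
 * the current memory context, which is what backend code normally wants
 * (but it is not safe to use from a separate thread).
 */
static char *
build_message(const char *relname, int64 rows)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf, "copied " INT64_FORMAT " rows from %s", rows, relname);
    return buf.data;            /* palloc'd; freed with the containing context */
}
```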
-
- 31 July 2017, 1 commit
Committed by Ming LI
Support a COPY statement that imports data files on the segments directly, in parallel. It can be used to import data files generated by "COPY ... TO ... ON SEGMENT". This commit also supports all of the data file formats that "COPY ... TO" supports, processes reject limit numbers, and logs errors accordingly.

Key workflow:

a) For COPY FROM, nothing is changed by this commit: dispatch the modified COPY command to the segments first, then read the data file on the master and dispatch the data to the relevant segment to process.
b) For COPY FROM ON SEGMENT: on the QD, read a dummy data file, with the other parts unchanged; on the QE, first process the (empty) data stream dispatched from the QD, then re-do the same workflow to read and process the local segment data file.

Signed-off-by: Ming LI <mli@pivotal.io>
Signed-off-by: Adam Lee <ali@pivotal.io>
Signed-off-by: Haozhou Wang <hawang@pivotal.io>
Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
-
- 20 January 2017, 1 commit
Committed by alldefector
Binary COPY was previously disabled in Greenplum; this commit re-enables the binary mode by incorporating the upstream code from PostgreSQL.

Patch by GitHub user alldefector with additional hacking by Daniel Gustafsson
-
- 24 November 2016, 1 commit
Committed by Daniel Gustafsson
The COPY process isn't finished until PQgetCopyData() returns -1, so we must consume all rows sent until we get -1. A single call should in most cases do the trick, but there are no guarantees, so loop around the consumption until end-of-COPY is signalled. In case of error, break out and continue operation to keep the current flow; investigating the error is probably warranted, but handling the cases that might fall out is a bigger question than this isolated patch. Also use PQfreemem() for the buffer, per the manual. This is the same as just free() on Linux/UNIX (while critically different on Windows), but we might as well follow the set API to reduce confusion.

Original report by Coverity
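A minimal libpq sketch of the loop described above, not the GPDB code itself: keep consuming rows until PQgetCopyData() returns -1 (or -2 on error) and free each buffer with PQfreemem().

```c
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Illustrative only: consume COPY OUT data until PQgetCopyData() signals
 * end-of-COPY (-1) or an error (-2), releasing each buffer with PQfreemem()
 * as the libpq manual requires.
 */
static int
drain_copy_out(PGconn *conn, FILE *out)
{
    char   *buf;
    int     len;

    while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
    {
        fwrite(buf, 1, (size_t) len, out);
        PQfreemem(buf);
    }
    if (len == -2)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        return -1;
    }
    /* len == -1: COPY done; the final status comes from PQgetResult() */
    return 0;
}
```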
-
- 07 November 2016, 1 commit
Committed by Heikki Linnakangas
Instead of carrying a "new OID" field in all the structs that represent CREATE statements, introduce a generic mechanism for capturing the OIDs of all created objects, dispatching them to the QEs, and using those same OIDs when the corresponding objects are created in the QEs. This allows removing a lot of scattered changes in DDL command handling that were previously needed to ensure that objects are assigned the same OIDs in all the nodes.

This also provides the groundwork for pg_upgrade to dictate the OIDs to use for upgraded objects. The upstream has mechanisms for pg_upgrade to dictate the OIDs for a few objects (relations and types, at least), but in GPDB, we need to preserve the OIDs of almost all object types.
-
- 04 November 2016, 1 commit
Committed by xiong-gang
Signed-off-by: Kenan Yao <kyao@pivotal.io>
-
- 18 August 2016, 1 commit
Committed by Heikki Linnakangas
I found these with "callcatcher", written by Caolán McNamara. Many thanks for the tool! See https://www.skynet.ie/~caolan/Packages/callcatcher.html
-
- 25 July 2016, 1 commit
Committed by Pengzhou Tang
Refactor CdbDispatchUtilityStatement() to make it flexible for cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags like DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT, and DF_CANCEL_ON_ERROR to make the function calls much clearer.
-
- 28 June 2016, 1 commit
Committed by Kenan Yao
-
- 19 May 2016, 1 commit
Committed by Pengzhou Tang
Move dispatcher code into the dispatcher/ directory. This commit has no logic change; it just moves code across files, to make the dispatcher code clearer and easier to unit test.

Signed-off-by: Kenan Yao
-
- 28 October 2015, 1 commit