- Sep 28, 2018 (1 commit)
-
-
Committed by ZhangJackey
There was an assumption in GPDB that a table's data is always distributed on all segments. However, this is not always true: for example, when a cluster is expanded from M segments to N (N > M), all the tables are still on M segments. To work around the problem we used to have to alter all the hash-distributed tables to randomly distributed to get correct query results, at the cost of bad performance. Now we support table data distributed on a subset of segments.

A new column `numsegments` is added to the catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DML on M-segment tables, and joins between M-segment and N-segment tables are also supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
  from gp_distribution_policy;
 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);
                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```
-
- Sep 06, 2018 (1 commit)
-
-
Committed by Heikki Linnakangas
It used to always say "COPY 0", instead of the number of rows copied. This source line was added in PostgreSQL 9.0 (commit 8ddc05fb), but it was missed in the merge. Add a test case to check the command tags of different variants of COPY, including this one.
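For context, the command tag is just the propagated row count formatted into a string; a tiny standalone illustration (not the GPDB code) of how that count becomes a "COPY n" tag, so a count that is never carried back always reads "COPY 0":

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Standalone illustration (not the GPDB code): the tag shown to the client
 * is the propagated row count formatted into a string. */
int main(void)
{
    uint64_t processed = 42;     /* rows actually copied */
    char     tag[64];

    snprintf(tag, sizeof(tag), "COPY %" PRIu64, processed);
    puts(tag);                   /* prints: COPY 42 */
    return 0;
}
```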
-
- Sep 05, 2018 (1 commit)
-
-
Committed by Jim Doty
The "unknown" type has an attlen of -2, which signifies that the actual length is determined by strlen(). We weren't handling this case, so handle it now.

Co-authored-by: Jacob Champion <pchampion@pivotal.io>
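A minimal sketch (not the actual code touched here) of how an attlen value is conventionally interpreted, with the -2 case measured by strlen():

```c
#include <stddef.h>
#include <string.h>

/* Illustration only: positive attlen means a fixed-width type, -1 means a
 * varlena whose size comes from its length header, and -2 (used by
 * "unknown" and cstring) means a NUL-terminated string measured with
 * strlen(). */
static size_t
attribute_value_size(int attlen, const void *value, size_t varlena_size)
{
    if (attlen > 0)
        return (size_t) attlen;                  /* fixed width */
    if (attlen == -1)
        return varlena_size;                     /* varlena header length */
    return strlen((const char *) value) + 1;     /* attlen == -2 */
}
```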
-
- Aug 15, 2018 (1 commit)
-
-
Committed by xiong-gang
* Remove ERRCODE_GP_FEATURE_NOT_SUPPORTED and use ERRCODE_FEATURE_NOT_SUPPORTED instead
* Remove ERROR_INVALID_WINDOW_FRAME_PARAMETER and use ERRCODE_WINDOWING_ERROR instead

Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
-
- Aug 14, 2018 (1 commit)
-
-
Committed by Pengzhou Tang
Previously, COPY used CdbDispatchUtilityStatement directly to dispatch 'COPY' statements to all QEs and then sent/received data through primaryWriterGang. This happened to work because primaryWriterGang is not recycled when a dispatcher state is destroyed, but it is nasty because the COPY command has logically finished at that point. This commit splits the COPY dispatching logic into two parts to make it more reasonable.
-
- Aug 03, 2018 (2 commits)
-
-
Committed by Daniel Gustafsson
The definitions of PQArgBlock in libpq.h and libpq-fe.h were in conflict with each other when including both. The definition in libpq.h was superfluous and removed in 23c7b583, so remove the redefines to clean up the code.

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
Committed by Karen Huddleston
This reverts commit 4750e1b6.
-
- Aug 02, 2018 (1 commit)
-
-
Committed by Richard Guo
This is the final batch of commits from PostgreSQL 9.2 development, up to the point where the REL9_2_STABLE branch was created and 9.3 development started on the PostgreSQL master branch.

Notable upstream changes:

* Index-only scans were included in this batch of upstream commits. They allow queries to retrieve data only from indexes, avoiding heap access.
* Group commit was added to work effectively under heavy load. Previously, batching of commits became ineffective as the write workload increased, because of internal lock contention.
* A new fast-path lock mechanism was added to reduce the overhead of taking and releasing certain types of locks which are taken and released very frequently but rarely conflict.
* The new "parameterized path" mechanism was added. It allows inner index scans to use values from relations that are more than one join level up from the scan. This can greatly improve performance in situations where semantic restrictions (such as outer joins) limit the allowed join orderings.
* The SP-GiST (Space-Partitioned GiST) index access method was added to support unbalanced partitioned search structures. For suitable problems, SP-GiST can be faster than GiST in both index build time and search time.
* Checkpoints are now performed by a dedicated background process. Formerly the background writer did both dirty-page writing and checkpointing. Separating this into two processes allows each goal to be accomplished more predictably.
* Custom plans are supported for specific parameter values even when using prepared statements.
* The FDW API was improved so that foreign data wrappers can provide multiple access "paths" for their tables, allowing more flexibility in join planning.
* The security_barrier option was added for views to prevent optimizations that might allow view-protected data to be exposed to users.
* Range data types were added to store a lower and upper bound belonging to their base data type.
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility statement. The SELECT query is planned during the execution of the utility. To conform to this change, GPDB executes the utility statement only on the QD and dispatches the plan of the SELECT query to the QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
-
- Jul 11, 2018 (1 commit)
-
-
Committed by Ashwin Agrawal
Pointers from a Relation object need to be handled with special care, as holding a refcount on the object doesn't mean the object is not modified. When a cache invalidation message is handled, the Relation object gets *rebuilt*. The only guarantee maintained during a rebuild is that the Relation object's address will not change; the memory addresses inside the Relation object are freed, freshly allocated, and repopulated with the latest data from the catalog.

For example, the following code sequence is dangerous:

    rel->rd_cdbpolicy = original_policy;
    GpPolicyReplace(RelationGetRelid(rel), original_policy);

If a relcache invalidation message is served after assigning the value to rd_cdbpolicy, the rebuild frees the memory for rd_cdbpolicy (which means original_policy) and replaces it with the current contents of gp_distribution_policy. So GpPolicyReplace(), called with original_policy, would access freed memory. Plus, rd_cdbpolicy would hold a stale value in the cache rather than the intended refreshed value.

This issue was hit in CI a few times and reproduces with higher frequency with `-DRELCACHE_FORCE_RELEASE`. Hence this patch fixes all uses of rd_cdbpolicy to use the rd_cdbpolicy pointer directly from the Relation object, and to update the catalog first before assigning the value to rd_cdbpolicy.
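A minimal sketch of the safer ordering described above, assuming GPDB's relcache and policy declarations; the helper function is illustrative, not the exact patch:

```c
/*
 * Sketch only: write the catalog first, and only then update the relcache
 * entry, always through rel->rd_cdbpolicy.
 */
static void
set_distribution_policy_safely(Relation rel, GpPolicy *new_policy)
{
    /* 1. Persist the policy to gp_distribution_policy first. */
    GpPolicyReplace(RelationGetRelid(rel), new_policy);

    /* 2. Only now install the pointer in the relcache entry, so a relcache
     *    rebuild between the two steps can no longer leave us calling the
     *    catalog routine with memory it has already freed. */
    rel->rd_cdbpolicy = new_policy;
}
```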
-
- Jul 10, 2018 (1 commit)
-
-
Committed by Daniel Gustafsson
Make sure to include all required header files to silence compilers that are picky about that.
-
- Jul 04, 2018 (1 commit)
-
-
Committed by Adam Lee
The map was missed by mistake; all AO loading actions need it.
-
- Jun 27, 2018 (1 commit)
-
-
Committed by Adam Lee
Unloading doesn't need it, and neither does checking the distribution policy.
-
- Jun 19, 2018 (2 commits)
- Jun 11, 2018 (1 commit)
-
-
Committed by Adam Lee
1. Pass the external table encoding to COPY's options, then set cstate->file_encoding to it, for both reading and writing.

2. After the merge, the COPY state no longer has a client-encoding member, which used to be set to the target encoding so that the converted data was obtained as a client would; now the file encoding (from the COPY options) is passed to convert directly.
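A minimal sketch, assuming PostgreSQL's encoding-conversion API from mb/pg_wchar.h (not the exact change), of converting a data line from the file encoding carried in the COPY options into the server encoding:

```c
/*
 * Sketch only: convert a line read from the external source, using the
 * file encoding carried in the COPY options, into the server encoding
 * before further processing.
 */
static char *
convert_line_to_server_encoding(char *line, int len, int file_encoding)
{
    return (char *) pg_do_encoding_conversion((unsigned char *) line,
                                              len,
                                              file_encoding,
                                              GetDatabaseEncoding());
}
```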
-
- May 25, 2018 (1 commit)
-
-
Committed by Jimmy Yih
The Postgres 9.1 merge introduced a problem where issuing a COPY FROM to a partition table could result in an unexpected error, "ERROR: extra data after last expected column", even though the input file was correct. This would happen if the partition table had partitions where the relnatts were not all the same (e.g. ALTER TABLE DROP COLUMN, ALTER TABLE ADD COLUMN, and then ALTER TABLE EXCHANGE PARTITION). The internal COPY logic would always use the COPY state's relation, the partition root, instead of the actual partition's relation to obtain the relnatts value. In fact, the only reason this is intermittently seen is because the COPY logic, when working on the leaf partition's relation that has a different relnatts value, was looking beyond a boolean array's allocated memory and got a phony value that would evaluate to TRUE.

Co-authored-by: Jimmy Yih <jyih@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
- May 17, 2018 (1 commit)
-
-
Committed by Adam Lee
Without this change, integer overflow occurs when more than 2^31 rows are copied under `COPY ON SEGMENT` mode. Errors happen when the overflowed value is cast to uint64, the type of `processed` in `CopyStateData`: a third-party Postgres driver, which reads it as an int64, then fails with an out-of-range error.
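A standalone illustration (not the GPDB code) of why a wrapped 32-bit row counter breaks an int64-based client once it is widened to the uint64 `processed` field:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Illustration only: a signed 32-bit counter that has wrapped negative,
 * when converted to uint64, becomes a huge value that an int64-based
 * driver rejects as out of range.  Keeping the counter 64-bit end to end
 * avoids the wrap. */
int main(void)
{
    int32_t  wrapped = INT32_MIN;              /* what 2^31 increments leave behind */
    uint64_t processed = (uint64_t) wrapped;   /* becomes 18446744071562067968 */

    printf("wrapped=%" PRId32 " processed=%" PRIu64 "\n", wrapped, processed);
    return 0;
}
```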
-
- May 08, 2018 (1 commit)
-
-
Committed by Adam Lee
ExecBRInsertTriggers() uses the per-tuple memory context, which might be reset; a later pfree() of memory allocated there then SEGVs.
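A hazard sketch, assuming PostgreSQL's executor and memory-context APIs (illustrative of the failure mode, not the actual fix):

```c
/*
 * Sketch only: a pointer palloc'd in the per-tuple memory context is valid
 * only until that context is reset, so it must not be pfree'd (or even
 * dereferenced) afterwards.
 */
static void
per_tuple_context_hazard(EState *estate, HeapTuple origtuple)
{
    MemoryContext oldcxt =
        MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
    HeapTuple tuple = heap_copytuple(origtuple);  /* allocated per tuple */

    MemoryContextSwitchTo(oldcxt);
    ResetPerTupleExprContext(estate);  /* frees the per-tuple allocations */

    /* pfree(tuple);   <-- would crash: tuple now points at freed memory */
}
```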
-
- Mar 29, 2018 (1 commit)
-
-
Committed by Pengzhou Tang
* Support replicated table in GPDB

  Currently, tables are distributed across all segments by hash or randomly in GPDB. There is a requirement for a new table type where all segments hold a full duplicate of the table data, called a replicated table.

  To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark a replicated table and a new locus type named CdbLocusType_SegmentGeneral to specify the distribution of tuples of a replicated table. CdbLocusType_SegmentGeneral implies that data is generally available on all segments but not available on the qDisp, so a plan node with this locus type can be flexibly planned to execute on either a single QE or all QEs. It is similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral node can't be executed on the qDisp. To guarantee this, we try our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning motion for a join, even if the other rel has a bottleneck locus type. One problem is that such a motion may be redundant if the single QE is ultimately not promoted to execute on the qDisp, so we need to detect that case and omit the redundant motion at the end of apply_motion(). We don't reuse CdbLocusType_Replicated since it always implies a broadcast motion below it, and it's not easy to plan such a node as direct dispatch to avoid getting duplicate data.

  We don't support replicated tables with an inherit/partition-by clause yet; the main problem is that update/delete on multiple result relations can't work correctly now. We can fix this later.

* Allow spi_* to access replicated tables on QEs

  Previously, GPDB didn't allow a QE to access a non-catalog table because the data is incomplete; we can remove this limitation now if it only accesses replicated tables. One problem is that the QE needs to know whether a table is replicated; previously, QEs didn't maintain the gp_distribution_policy catalog, so we need to pass the policy info to QEs for replicated tables.

* Change the schema of gp_distribution_policy to identify replicated tables

  Previously, we used a magic number -128 in the gp_distribution_policy table to identify replicated tables, which is quite a hack, so we add a new column in gp_distribution_policy to identify replicated tables and partitioned tables. This commit also abandons the old way of using a 1-length NULL list and a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED FULLY clauses. Besides, this commit refactors the code to make the decision-making around distribution policy clearer.

* Support COPY for replicated tables

* Disable the row-ctid unique path for replicated tables

  Previously, GPDB used a special Unique path on rowid to address queries like "x IN (subquery)". For example: select * from t1 where t1.c2 in (select c2 from t3), the plan looks like:

      -> HashAggregate
           Group By: t1.ctid, t1.gp_segment_id
           -> Hash Join
                Hash Cond: t2.c2 = t1.c2
                -> Seq Scan on t2
                -> Hash
                     -> Seq Scan on t1

  Obviously, the plan is wrong if t1 is a replicated table because ctid + gp_segment_id can't identify a tuple: in a replicated table, a logical row may have different ctid and gp_segment_id values. So we disable such plans for replicated tables temporarily. This is not the best answer, because the rowid-unique path may be cheaper than a normal hash semi join, so we left a FIXME for later optimization.

* ORCA related fix

  Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>. Fall back to the legacy query optimizer for queries over replicated tables.

* Adapt pg_dump/gpcheckcat to replicated tables

  gp_distribution_policy is no longer a master-only catalog; do the same checks as for other catalogs.

* Support gpexpand on replicated tables && altering the dist policy of replicated tables
-
- Mar 02, 2018 (1 commit)
-
-
Committed by Ming LI
We need to close the pipe files even if the COPY PROGRAM has already exited before the parent process asks the child process to exit; otherwise this leads to a file descriptor leak in a long-running session.
-
- Feb 01, 2018 (1 commit)
-
-
Committed by Adam Lee
1. The pipes might not exist in close_program_pipes(); check for that. For instance, if the relation doesn't exist, the COPY workflow fails before executing the program, and "cstate->program_pipes->pid" dereferences NULL.

2. The program might still be running, or hung, when COPY exits; kill it. This covers cases where the program hangs, doesn't take signals, and the user is trying to cancel. Since it's already the end of COPY, and the program was started by COPY, it should be safe to kill it to clean up.
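A generic POSIX sketch (not the exact close_program_pipes() change) of tolerating a missing child and terminating one that is still running:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Sketch only: tolerate a child that was never started, then terminate a
 * still-running (or hung) one and reap it, so the backend does not leak the
 * process or its pipe file descriptors.
 */
static void
cleanup_program(pid_t pid)
{
    int status;

    if (pid <= 0)
        return;                                  /* program was never started */

    if (waitpid(pid, &status, WNOHANG) == 0)     /* still running or hung */
    {
        kill(pid, SIGTERM);                      /* ask it to stop ... */
        waitpid(pid, &status, 0);                /* ... and reap it */
    }
}
```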
-
- Jan 30, 2018 (2 commits)
-
-
Committed by Wang Hao
The hook is called for:
- each query Submit/Start/Finish/Abort/Error
- each plan node, on executor Init/Start/Finish

Author: Wang Hao <haowang@pivotal.io>
Author: Zhang Teng <tezhang@pivotal.io>
-
Committed by Wang Hao
On postmaster start, additional space in shmem is allocated for Instrumentation slots and a header. The number of slots is controlled by a cluster-level GUC; the default is 5MB (approximately 30K slots), estimated from 250 concurrent queries * 120 nodes per query. If the slots are exhausted, instruments are allocated in local memory as a fallback.

These slots are organized as a free list:
- The header points to the first free slot.
- Each free slot points to the next free slot.
- The last free slot's next pointer is NULL.

ExecInitNode calls GpInstrAlloc to pick an empty slot from the free list:
- The free slot pointed to by the header is picked.
- The picked slot's next pointer is assigned to the header.
- A spin lock on the header prevents concurrent writing.
- When the GUC gp_enable_query_metrics is off, Instrumentation is allocated in local memory.

Slots are recycled by a resource owner callback function. Benchmark results with TPC-DS show the performance impact of this commit is less than 0.1%.

To improve the performance of instrumenting, the following optimizations are added:
- Introduce instrument_option to skip CDB info collection
- Optimize tuplecount in Instrumentation from double to uint64
- Replace the instrument tuple entry/exit functions with macros
- Add need_timer to Instrumentation, to allow eliminating of timing overhead

This ports part of the upstream commit:
------------------------------------------------------------------------
commit af7914c6
Author: Robert Haas <rhaas@postgresql.org>
Date:   Tue Feb 7 11:23:04 2012 -0500

    Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
------------------------------------------------------------------------

Author: Wang Hao <haowang@pivotal.io>
Author: Zhang Teng <tezhang@pivotal.io>
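A minimal sketch of the slot allocation described above, assuming PostgreSQL's spinlock and palloc APIs; the struct and function names are illustrative, not the actual GPDB ones:

```c
/* Sketch only: pop a slot from a shared free list under a spinlock, and
 * fall back to local memory when the shared slots are exhausted. */
typedef struct InstrSlot
{
    struct InstrSlot *next;      /* next free slot, NULL at the end */
    /* ... instrumentation payload ... */
} InstrSlot;

typedef struct InstrHeader
{
    slock_t    lock;             /* protects the free list head */
    InstrSlot *free_head;        /* first free slot in shared memory */
} InstrHeader;

static InstrSlot *
instr_alloc(InstrHeader *hdr)
{
    InstrSlot *slot;

    SpinLockAcquire(&hdr->lock);
    slot = hdr->free_head;
    if (slot != NULL)
        hdr->free_head = slot->next;         /* pop the first free slot */
    SpinLockRelease(&hdr->lock);

    if (slot == NULL)
        slot = palloc0(sizeof(InstrSlot));   /* exhausted: local memory fallback */

    return slot;
}
```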
-
- Dec 28, 2017 (1 commit)
-
-
Committed by Adam Lee
There are two places where the QD keeps trying to get data, ignores SIGINT, and does not send a signal to the QEs. If the program on a segment has no input/output, the COPY command hangs. To fix it, this commit:

1. lets the QD wait until connections are readable before PQgetResult(), and cancels the queries if it gets an interrupt signal while waiting
2. sets DF_CANCEL_ON_ERROR when dispatching in cdbcopy.c
3. completes COPY error handling

    -- prepare
    create table test(t text);
    copy test from program 'yes|head -n 655360';

    -- could be canceled
    copy test from program 'sleep 100 && yes test';
    copy test from program 'sleep 100 && yes test<SEGID>' on segment;
    copy test from program 'yes test';
    copy test to '/dev/null';
    copy test to program 'sleep 100 && yes test';
    copy test to program 'sleep 100 && yes test<SEGID>' on segment;

    -- should fail
    copy test from program 'yes test<SEGID>' on segment;
    copy test to program 'sleep 0.1 && cat > /dev/nulls';
    copy test to program 'sleep 0.1<SEGID> && cat > /dev/nulls' on segment;
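A minimal sketch using plain libpq and select() (not the actual cdbcopy.c change; the `interrupted` flag is a stand-in for the backend's interrupt handling):

```c
#include <sys/select.h>
#include <libpq-fe.h>

/*
 * Sketch only: wait until the connection is readable before PQgetResult(),
 * and cancel the query if an interrupt arrives while waiting, instead of
 * blocking forever on a program that never produces output.
 */
static PGresult *
get_result_interruptibly(PGconn *conn, volatile int *interrupted)
{
    int sock = PQsocket(conn);
    int sent_cancel = 0;

    while (PQisBusy(conn))
    {
        fd_set readmask;

        if (*interrupted && !sent_cancel)
        {
            char      errbuf[256];
            PGcancel *cancel = PQgetCancel(conn);

            if (cancel)
            {
                PQcancel(cancel, errbuf, sizeof(errbuf));  /* ask the QE to stop */
                PQfreeCancel(cancel);
            }
            sent_cancel = 1;
        }

        FD_ZERO(&readmask);
        FD_SET(sock, &readmask);
        if (select(sock + 1, &readmask, NULL, NULL, NULL) < 0)
            continue;                      /* e.g. EINTR: re-check the flag */

        if (!PQconsumeInput(conn))         /* connection broken */
            break;
    }
    return PQgetResult(conn);              /* data is buffered; won't block */
}
```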
-
- Dec 12, 2017 (2 commits)
-
-
Committed by Daniel Gustafsson
These error codes were marked as deprecated in September 2007 but the code didn't get the memo. Extend the deprecation into the code and actually replace the usage. Ten years seems long enough notice so also remove the renames, the odds of anyone using these in code which compiles against a 6X tree should be low (and easily fixed).
-
Committed by Xiaoran Wang
Upstream commit:

commit 95c238d9
Author: Andrew Dunstan <andrew@dunslane.net>
Date:   Sat Mar 8 01:16:26 2008 +0000

    Improve efficiency of attribute scanning in CopyReadAttributesCSV.
    The loop is split into two parts, inside quotes, and outside quotes,
    saving some instructions in both parts.

Author: Max Yang <myang@pivotal.io>
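A minimal sketch of the split-loop idea (illustrative only; it ignores escaped quotes and is not the CopyReadAttributesCSV code):

```c
/* Sketch only: instead of testing an "in quotes" flag for every character,
 * run one tight loop while outside quotes and another while inside quotes,
 * switching between them on the quote character. */
static const char *
scan_csv_field(const char *p, char quote, char delim)
{
    for (;;)
    {
        /* outside quotes: stop at the delimiter or end of input */
        for (;;)
        {
            char c = *p;

            if (c == '\0' || c == delim)
                return p;
            p++;
            if (c == quote)
                break;                  /* switch to the in-quotes loop */
        }

        /* inside quotes: delimiters are ordinary data here */
        for (;;)
        {
            char c = *p;

            if (c == '\0')
                return p;               /* unterminated quoted field */
            p++;
            if (c == quote)
                break;                  /* back to the out-of-quotes loop */
        }
    }
}
```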
-
- Nov 06, 2017 (1 commit)
-
-
Committed by Heikki Linnakangas
This is mostly in preparation for changes soon to be merged from PostgreSQL 8.4, commit a77eaa6a to be more precise. Currently GPDB's ExecInsert uses the ExecSlotFetch*() functions to get the tuple from the slot, while in the upstream it makes a modifiable copy with ExecMaterializeSlot(). That's OK as the code stands, because there's always a "junk filter" that ensures that the slot doesn't point directly to an on-disk tuple. But commit a77eaa6a will change that, so we have to start being more careful.

This does fix an existing bug, namely that if you UPDATE an AO table with OIDs, the OIDs currently change (github issue #3732). Add a test case for that.

More detailed breakdown of the changes:

* In ExecInsert, create a writeable copy of the tuple when we're about to modify it, so that we don't accidentally modify an existing on-disk tuple. By calling ExecMaterializeSlot().
* In ExecInsert, track the OID of the tuple we're about to insert in a local variable when we call the BEFORE ROW triggers, because we don't have a "tuple" yet.
* Add the ExecMaterializeSlot() function, like in the upstream, because we now need it in ExecInsert. Refactor ExecFetchSlotHeapTuple to use ExecMaterializeSlot(), like in upstream.
* Cherry-pick bug fix commit 3d02cae3 from upstream. We would get that soon anyway as part of the merge, but we'll soon have test failures if we don't fix it immediately.
* Change the API of appendonly_insert() so that it takes the new OID as an argument, instead of extracting it from the passed-in MemTuple. With this change, appendonly_insert() is guaranteed not to modify the passed-in MemTuple, so we don't need the equivalent of ExecMaterializeSlot() for MemTuples.
* Also change the API of appendonly_insert() so that it returns the new OID of the inserted tuple, like heap_insert() does. Most callers ignore the return value, so this way they don't need to pass a dummy pointer argument.
* Add a test case for the case that a BEFORE ROW trigger sets the OID of a tuple we're about to insert.

This is based on earlier patches against the 8.4 merge iteration3 branch by Jacob and Max.
-
- Nov 04, 2017 (1 commit)
-
-
Committed by Abhijit Subramanya
The macro is taken from the upstream commit 40f908bd. This commit fixes issues with the CLUSTER and COPY commands where they would not generate the necessary XLOG records when streaming replication is enabled. With the correct use of XLogIsNeeded() this is now fixed. This also cleans up the XLog_CanBypassWal() and XLog_UnconvertedCanBypassWal() functions by replacing their usage with XLogIsNeeded().

Signed-off-by: Taylor Vesely <tvesely@pivotal.io>
Signed-off-by: Asim R P <apraveen@pivotal.io>
-
- Nov 02, 2017 (1 commit)
-
-
Committed by Heikki Linnakangas
The code in ExecInsert, ExecUpdate, and CopyFrom was confused about what kind of a tuple the "tuple" variable might hold at different times. In particular, if you had a trigger on an append-only table, they would pass a MemTuple to the Exec*Triggers() functions, which expect a HeapTuple. To fix, refactor the code so that it's always clear what kind of a tuple we're dealing with. The compiler will now throw warnings if they are conflated.

We cannot, in fact, support ON UPDATE or ON DELETE triggers on AO tables in a sane way. GetTupleForTrigger() is hopelessly heap-specific. We could perhaps change it to do a lookup in the append-only table's visibility map instead of looking at a heap tuple's xmin/xmax, but looking up the original tuple in an AO table would be fairly expensive anyway. As far as I can see, that never worked correctly, but let's add a check in CREATE TRIGGER to forbid that.

ON INSERT triggers now work, also on AOCS tables. There were previously checks to throw an error if an AOCS table had a trigger, but I see no reason to forbid that particular case.

Fixes github issue #3680.
-
- Oct 11, 2017 (1 commit)
-
-
Committed by Xiaoran Wang
COPY FROM ... ON SEGMENT will not check the distribution policy when the table is distributed randomly.

Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
-
- Sep 25, 2017 (1 commit)
-
-
Committed by Adam Lee
Replace popen() with popen_with_stderr(), which is also used by external web tables to collect the stderr output of the program. Since popen_with_stderr() forks a `sh` process it is almost always successful, so this commit catches errors that happen in fwrite(). Also pass the same variables as external web tables do.

Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
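A minimal sketch, using plain stdio rather than the popen_with_stderr() API, of catching the fwrite() failure instead of silently losing the data:

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/*
 * Sketch only: since launching the program via a shell almost always
 * "succeeds", the real failure shows up when writing to the pipe, so check
 * fwrite() and report the error instead of losing it.
 */
static int
write_to_program(FILE *pipe_fp, const char *buf, size_t len)
{
    if (fwrite(buf, 1, len, pipe_fp) != len)
    {
        fprintf(stderr, "could not write to COPY program: %s\n",
                strerror(errno));
        return -1;      /* caller can attach the program's captured stderr */
    }
    return 0;
}
```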
-
- Sep 20, 2017 (1 commit)
-
-
Committed by Ming LI
Because we don't know the data location of the result of a SELECT query, ON SEGMENT is forbidden.
-
- Sep 15, 2017 (1 commit)
-
-
Committed by Ming LI
-
- Sep 13, 2017 (1 commit)
-
-
Committed by Adam Lee
commit 3d009e45
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Wed Feb 27 18:17:21 2013 +0200

    Add support for piping COPY to/from an external program.

    This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and
    corresponding psql \copy syntax. Like with reading/writing files, the
    backend version is superuser-only, and in the psql version, the program
    is run in the client.

    In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if
    the stdin/stdout is quoted, it's now interpreted as a filename. For
    example, "\copy foo from 'stdin'" now reads from a file called 'stdin',
    not from standard input. Before this, there was no way to specify a
    filename called stdin, stdout, pstdin or pstdout.

    This creates a new function in pgport, wait_result_to_str(), which can
    be used to convert the exit status of a process, as returned by wait(3),
    to a human-readable string.

    Etsuro Fujita, reviewed by Amit Kapila.

Signed-off-by: Adam Lee <ali@pivotal.io>
Signed-off-by: Ming LI <mli@apache.org>
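A minimal sketch in the spirit of wait_result_to_str() (illustrative; the message wording is assumed, not the pgport implementation):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Sketch only: turn a wait(3)/pclose(3) status into a readable message so
 * COPY ... PROGRAM failures can be reported to the user. */
static char *
describe_wait_status(int status)
{
    char *buf = malloc(64);

    if (buf == NULL)
        return NULL;
    if (WIFEXITED(status))
        snprintf(buf, 64, "child process exited with exit code %d",
                 WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, 64, "child process was terminated by signal %d",
                 WTERMSIG(status));
    else
        snprintf(buf, 64, "child process exited with unrecognized status %d",
                 status);
    return buf;
}
```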
-
- Sep 08, 2017 (2 commits)
-
-
Committed by Adam Lee
The other one is <SEG_DATA_DIR>; they should keep the same style.
- Sep 04, 2017 (2 commits)
-
-
Committed by Xiaoran Wang
The same code for computing the target segment exists in both CopyFrom and CopyFromDispatch. Extract it into separate functions.

Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Daniel Gustafsson
-
- Sep 01, 2017 (2 commits)
-
-
Committed by Daniel Gustafsson
This bumps the copyright years to the appropriate years after not having been updated for some time. Also reformats existing code headers to match the upstream style to ensure consistency.
-
Committed by Heikki Linnakangas
* Use ereport() with a proper error code, rather than elog(), so that you don't get the source file name and line number in the message, and the serious-looking backtrace in the log.

* Remove the hint that advised "SET gp_enable_segment_copy_checking=off" when a row failed the check that it's being loaded to the correct segment. Ignoring the mismatch seems like a very bad idea, because if your rows are in incorrect segments, all bets are off, and you'll likely get incorrect query results when you try to query the table.
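A small usage sketch of the difference (standard PostgreSQL error-reporting API; the error code and message text here are illustrative, not the exact GPDB wording):

```c
/* Sketch only: ereport() with an errcode() gives the client a categorized
 * error without the file/line details and scary backtrace that
 * elog(ERROR, ...) produces for what looks like an internal error. */
ereport(ERROR,
        (errcode(ERRCODE_DATA_EXCEPTION),
         errmsg("value of distribution key does not belong to this segment")));

/* versus the old style it replaces: */
/* elog(ERROR, "segment id mismatch during COPY"); */
```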
-